If you are building a Cloud-based BigData solution, an HDInsight cluster in Windows Azure could be one of the first choices when looking at the possible platforms. The Hortonworks HDP distribution running on IaaS VMs is another option you may want to consider. When deciding which of the two to select, even though both solutions share the same code base underneath, there are a number of factors that may influence the final choice; the key one is _how_ the cluster is going to be used: do you need it only for temporary calculations, do you have a dependency on additional Hadoop-related tools and therefore on Java, etc. I am not going to focus on those questions here.

What is interesting for me at the moment is that HDI relies on Azure Storage for saving data. Azure Storage is cheap and scales very well. If you look at an HDP cluster running in Azure VMs, it is also using Azure Storage: all the VHDs that you attach as data drives to your VMs are stored in Azure Blob storage. The largest Azure VM can support up to 16 drives of 1 TB each, so you can easily get 16 TB of data stored in your Hadoop cluster. What to do if I need more?

Disclaimer: the following configuration works, but it is not yet officially supported by either Microsoft or Hortonworks. If you are going to use it, you do it at your own risk!

At the very early stages of the HDI validations Cindy Gross published instructions on how to connect Azure Blob storage (asv – Azure Storage Vault) to an HDI cluster. So, assuming you have an HDP cluster (either in the Cloud or on-prem) up and running and an Azure Storage Account created, let's try to make these two technologies friends.

Following Cindy's recommendation, let's look at core-site.xml and its default file system entry – the one whose description reads "Either the literal string 'local' or a host:port for NDFS." Currently we have only the default HDFS file system. In my solution I want ASV to be my default file system, so I need to change this part of core-site.xml to re-point Hadoop to Azure Storage.

In order to successfully communicate with Azure Storage, Hadoop needs to know which storage account to talk to and the key it should use to access it. Let's add this information to the configuration file. First, I will change the "default file system" config already mentioned so that it points to the asv:// endpoint of my storage account (…blob.core.windows.net). Now, the next property does not exist in core-site.xml at all – you need to add it manually: fs.…

Now, let's restart the Hadoop cluster and try to copy something from a local folder into the asv:// path:

f:\hdp\hadoop>hadoop fs -copyFromLocal c:/Test asv://
drwxr-xr-x - HDPAdmin supergroup 0 18:18 /Test
drwxr-xr-x - hadoop supergroup 0 10:04 /mapred

You can also use something like Azure Storage Explorer to check whether the data was copied into the Azure Blob.

Now, I also want to keep some data in the HDFS file system. However, if I try to launch the Hadoop Name Node console, the browser fails to connect to the Head Node. Well, no wonder – I have changed the default file system configuration and have not told the cluster how I want HDFS to be treated. There is one more configuration entry which I need to add to the core-site.xml to fix it.
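To pull the pieces together, here is a minimal sketch of what the two core-site.xml changes described earlier might look like. The property names (fs.default.name and fs.azure.account.key.…) and the asv:// URL format are assumptions borrowed from the early Azure Blob storage driver and may differ for your particular HDP/HDI build; the container, account and key values are placeholders to replace with your own.

```xml
<!-- Sketch only: property names and the asv:// URL format are assumed
     from the early Azure Blob storage (asv) driver; adjust for your build. -->

<!-- Re-point the default file system from hdfs://<namenode>:8020 to Azure Storage -->
<property>
  <name>fs.default.name</name>
  <value>asv://mycontainer@mystorageaccount.blob.core.windows.net</value>
</property>

<!-- Storage account access key; this entry does not exist by default
     and has to be added to core-site.xml manually -->
<property>
  <name>fs.azure.account.key.mystorageaccount.blob.core.windows.net</name>
  <value>YOUR_STORAGE_ACCOUNT_KEY</value>
</property>
```

After restarting the cluster services, a quick check such as hadoop fs -ls / (which now resolves against the asv:// default file system), or the -copyFromLocal test shown above, should tell you whether the settings were picked up.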