I have been closely watching the movements in big data for around two years. I tried to learn more about processing huge data sets when I was writing a series about artificial intelligence. Like any other developer, at one point I started feeling that we need more power to process big data sets than a single machine can give. Even if the machine has the processing power, it cannot hold that large a data set on its hard drive. This leads to thoughts of distributed computing. The problem is that we know the theory from college days but have no idea how to implement it or see real distributed processing in action.
The research often ends up at 'Google search indexing and how they process huge volumes of internet data'. Since their exact techniques are secret, the only practical way in is Hadoop, which itself grew out of the techniques Google unveiled in their papers.
I tried installing the original Apache Hadoop but was not able to successfully execute a MapReduce task on it. As I am working in a delivery focused company, the time available for research activities is very limited. Months later, I saw the Microsoft implementation of Hadoop on Azure. There was again a bottleneck: even though it was free, it needed a valid credit card number, and due to some self enforced rules I had decided not to use credit cards. Time passed, and Microsoft released HDInsight for Windows servers and desktops. I became happy. But there was a simple limitation: HDInsight requires a Win7 64-bit machine, and I was using 32-bit Vista at that time. Getting Win7 64-bit just for me, in the offshore team, was a big task. After approval upon approval, I finally got my machine upgraded to Win7 64-bit. Immediately I installed HDInsight and checked all the Apache services, the dashboard, HDFS and the Hadoop command line.
Then the busy project release time came again, one project after another, and the installation just sat there. Recently things settled down. The projects I am handling are in a calm state, and it's time for me to explore.
Hadoop installation issues
After this long gap, I tried to open the Hadoop dashboard web site. It simply says "Internal Server Error". I checked the app pools and the corresponding HadoopAppPool was not running. When I tried to start it, I got an error in the event viewer about authentication of the .\hadoop user under which the app pool runs. I thought the password might have expired; there is no chance anybody changed it. Sometimes setting the credentials again makes things work. But what is the password of the local hadoop user?
I looked into services.msc for all the Apache services and could see all of them stopped. I tried to start them but failed due to authentication issues. All the services are by default installed to run under the .\hadoop service account, which is created when we install Hadoop. Finding the password of the local hadoop user became essential now. Below is the error I could see in the event viewer.
Windows could not start the Apache Hadoop hiveserver2 service on Local Computer.
Error 1069: The service did not start due to a logon failure.
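The same picture is visible from PowerShell too. A quick check, assuming the services share the 'Apache Hadoop' display name prefix that services.msc shows:

# List all Apache Hadoop services and their states (display name prefix assumed).
Get-Service -DisplayName "Apache Hadoop*" | Format-Table Name, Status, DisplayName -AutoSize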
Finding the password of local Hadoop user
The search for the password of the hadoop user ended at a file located at C:\Hadoop\singlenodecreds.xml. It shows an encrypted value instead of the actual password. So how to decrypt it?

# Import the saved credentials and decrypt the password back to plain text.
$xml = Import-Clixml -Path "C:\Hadoop\singlenodecreds.xml"
$psCred = New-Object System.Management.Automation.PSCredential $xml.Username, ($xml.Password | ConvertTo-SecureString)
Write-Host $psCred.GetNetworkCredential().Password
The above PowerShell will do it. If it can be retrieved this way, why did Microsoft encrypt it in the file at all? Only the HDInsight team knows for sure, but since ConvertTo-SecureString without an explicit key uses Windows DPAPI, the value can only be decrypted on the same machine by the same account. Maybe when it is installed on multiple machines, that makes sense.
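With the password in hand, re-setting the credentials can be scripted instead of clicking through services.msc and IIS Manager. A minimal sketch, assuming the hiveserver2 service name from the event viewer error and the HadoopAppPool name from IIS; run it elevated and repeat the sc.exe lines for each stopped service:

# Re-set the logon credentials of a Hadoop service and start it.
# (The space after obj= and password= is required by sc.exe.)
sc.exe config hiveserver2 obj= ".\hadoop" password= "<decrypted password>"
sc.exe start hiveserver2

# Re-set the identity of the dashboard's app pool as well.
& "$env:windir\System32\inetsrv\appcmd.exe" set apppool "HadoopAppPool" /processModel.userName:".\hadoop" /processModel.password:"<decrypted password>"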
Did getting the password solve the problem? Absolutely not. I was able to set the credentials again, but got the same error.
Reinstalling Hadoop
Now it's time to reinstall Hadoop. Before that I googled for similar issues, and most people resolved them by either formatting the machine or installing into a fresh virtual machine. I uninstalled and reinstalled Hadoop using the Microsoft Web Platform Installer. This time all the services are running, but I cannot see the Hadoop dashboard and WebAPI applications in IIS.
Some people say that if the Hadoop dashboard does not appear, we can install it by executing the PowerShell script below.
cd C:\HadoopFeaturePackSetup\HadoopFeaturePackSetupTools
.\winpkg.ps1 ..\Packages\HadoopWebApi-winpkg.zip install -CredentialFilePath c:\Hadoop\singlenodecreds.xml
.\winpkg.ps1 ..\Packages\HadoopDashboard-winpkg.zip install -CredentialFilePath c:\Hadoop\singlenodecreds.xml
In my case I could not find either HadoopWebApi-winpkg.zip or HadoopDashboard-winpkg.zip in the C:\HadoopFeaturePackSetup\Packages folder. Is my company firewall blocking these files from getting downloaded? It seems this is the time to take a deep dive into the Web Platform Installer.
The internals of the Web Platform Installer need a full post, so let me summarize. I could see that the Web Platform Installer requests http://www.microsoft.com/web/webpi/4.5/toolsproductlist.xml to get the details about the HDInsight installer. The Id of HDInsight in the XML file is 'HDINSIGHT-PREVIEW'. When I opened the XML file I could see that the size of the installer is 30.300 MB, but when I manually download the file from http://go.microsoft.com/fwlink/?LinkId=269735 (also mentioned in the XML) it is only around 3.800 MB. Maybe the downloaded MSSingleNodeInstaller.msi is corrupted, or there is some firewall issue. But if a firewall is playing in between, I cannot touch the file itself. This is problematic and not under my control, so I posted on MSDN and Stackoverflow, and then started refreshing :)
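A small PowerShell sketch of the same check; the productId element name comes from inspecting the feed rather than any documented schema, and the download path is an assumption:

# Download the WebPI product feed and print the entry carrying the
# HDINSIGHT-PREVIEW product id (element name assumed from the feed).
$client = New-Object System.Net.WebClient
[xml]$feed = $client.DownloadString('http://www.microsoft.com/web/webpi/4.5/toolsproductlist.xml')
$feed.SelectNodes('//*[local-name()="productId"]') |
    Where-Object { $_.InnerText -eq 'HDINSIGHT-PREVIEW' } |
    ForEach-Object { $_.ParentNode.OuterXml }

# Compare the advertised size with what actually landed on disk.
(Get-Item "$env:USERPROFILE\Downloads\MSSingleNodeInstaller.msi").Length / 1MB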
Do I really need a working Hadoop dashboard to work with Hadoop?
No answers for a couple of hours. Maybe most of the Hadoop experts are on the other side of the earth, and as I am working from India, I need to wait till they wake up. But time is money. I thought of taking a chance: why not try executing a Hadoop job anyway?
The reasons behind this are:
- Microsoft created HDInsight on top of Apache Hadoop.
- Apache Hadoop is working; at least all the services are running.
- The dashboard is built in ASP.Net and may be just a portal for showing the results. I can see the results using the command line, as sketched below. There are plenty of developers out there using Hadoop without any ASP.Net dashboard.
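For example, a job can be submitted and its output read entirely from the Hadoop command line, with no dashboard involved. The file names and the examples jar below are assumptions for illustration:

# Put a local file into HDFS, run the stock wordcount sample against it,
# and read the result straight from HDFS (jar name and paths assumed).
hadoop fs -mkdir input
hadoop fs -put C:\temp\sample.txt input
hadoop jar hadoop-examples.jar wordcount input output
hadoop fs -cat output/part-r-00000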
I started coding my Hadoop job using Visual Studio and executed it. Surprisingly, I didn't get any error saying "I occurred due to lack of the Hadoop dashboard or the Hadoop WebAPI". But obviously there were errors, which I will explain in the next post.
What were the issues
The root issue is 'I didn't read the HDInsight release notes'
- The password of the hadoop user expires, and the release notes mention this along with a script to correct it (a sketch of such a fix follows this list).
- The dashboard is removed from the latest installation done via the Web Platform Installer. This is also mentioned in the release notes. If we need the details, results and status of MapReduce jobs, we need to use the PowerShell commands.
- The size of the MSSingleNodeInstaller.msi file is correct in the 4.6 version of the WebPI feed: http://www.microsoft.com/web/webpi/4.6/toolsproductlist.xml
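I won't reproduce the release-notes script here; below is only a hedged sketch of the underlying idea, marking the local hadoop account's password as never expiring via ADSI. It is not the official script from the release notes:

# Mark the local 'hadoop' account's password as never expiring, after which
# the services can be restarted (sketch only, not the release-notes script).
$user = [adsi]"WinNT://$env:COMPUTERNAME/hadoop,user"
$user.UserFlags.Value = $user.UserFlags.Value -bor 0x10000   # ADS_UF_DONT_EXPIRE_PASSWD
$user.SetInfo()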
More links to installation
- http://blog.paxcel.net/blog/hdinsight-installation-on-windows-platform/
- http://blogs.msdn.com/b/avkashchauhan/archive/2012/11/03/hadoop-adventurs-with-microsoft-hdinsight.aspx
- http://parasdoshi.com/2012/10/27/getting-started-with-hdinsight-a-k-a-microsofts-big-data-hadoop-platform-on-local-windows-machine/
- http://www.amazedsaint.com/2013/03/taming-big-data-with-c-using-hadoop-on.html
Note - The above articles were written when the dashboard existed; the dashboard is now removed. Also, you need a 64-bit machine to install Hadoop.