Installing Cygwin. After installing the prerequisite software, the next step is to install the Cygwin environment. Cygwin is a set of Unix packages ported to Microsoft Windows. It is needed to run the scripts supplied with Hadoop, because they are all written for the Unix platform. To install the Cygwin environment, follow these steps.
This is a detailed step-by-step guide for installing Hadoop on Windows, Linux or MAC. It is based on Hadoop 1.0.0, the current and first official stable release; most of it also applies to the earlier 0.20.x line (note that there was also a 0.21.0 release). Installing Hadoop on Linux / MAC is pretty straightforward. However, getting it to run on Windows can be a bit tricky. You would probably not run Hadoop on Windows in a production environment, but it can be convenient as a development environment. If you are using Linux / MAC, just skip the Windows information.
Windows installation
Hadoop can be installed on Windows using Cygwin (not intended for production environments), but there are several Cygwin installation and configuration issues to watch for.
Windows: Download and install Cygwin
Cygwin is an implementation of a set of Linux commands and applications for Windows. Download the web installer from: https://cygwin.com/setup.exe and run it.
Installer will request some information before installing:
- Installation method. Select 'Install from Internet'.
- Root Directory. The default is c:\cygwin. Accept this directory.
- Local Package Directory (the directory where install files will be downloaded). The default is c:\cygwin-packages. Accept this directory.
- Connection and download site.
- A list of available packages will be displayed. The following packages are not selected by default, so make sure to include them:
- openssh
- openssl
- tcp_wrappers
- diffutils
- Upon installation completion, it will create a Cygwin icon in the Desktop and/or Start menu. Click it to open a Cygwin window.
Configuring SSH on Windows
Hadoop requires SSH (Secure SHell) to be running. To configure it, open a Cygwin window and type:
ssh-host-config
Use the following installation options:
- Should privilege separation be used? (yes/no) no
- Do you want to install sshd as a service? yes
- Enter the value of CYGWIN for the daemon: [] ntsec
- If requested for an account name, specify: cyg_server with a password you'll remember.
Eg.
$ ssh-host-config
*** Info: Generating /etc/ssh_host_key
*** Info: Generating /etc/ssh_host_rsa_key
*** Info: Generating /etc/ssh_host_dsa_key
*** Info: Generating /etc/ssh_host_ecdsa_key
*** Info: Creating default /etc/ssh_config file
*** Info: Creating default /etc/sshd_config file
*** Info: Privilege separation is set to yes by default since OpenSSH 3.3.
*** Info: However, this requires a non-privileged account called 'sshd'.
*** Info: For more info on privilege separation read /usr/share/doc/openssh/README.privsep.
*** Query: Should privilege separation be used? (yes/no) no
*** Info: Updating /etc/sshd_config file
*** Query: Do you want to install sshd as a service?
*** Query: (Say 'no' if it is already installed as a service) (yes/no) yes
*** Query: Enter the value of CYGWIN for the daemon: [] ntsec
*** Info: On Windows Server 2003, Windows Vista, and above, the
*** Info: SYSTEM account cannot setuid to other users -- a capability
*** Info: sshd requires. You need to have or to create a privileged
*** Info: account. This script will help you do so.
*** Info: You appear to be running Windows XP 64bit, Windows 2003 Server,
*** Info: or later. On these systems, it's not possible to use the LocalSystem
*** Info: account for services that can change the user id without an
*** Info: explicit password (such as passwordless logins [e.g. public key
*** Info: authentication] via sshd).
*** Info: If you want to enable that functionality, it's required to create
*** Info: a new account with special privileges (unless a similar account
*** Info: already exists). This account is then used to run these special
*** Info: servers.
*** Info: Note that creating a new user requires that the current account
*** Info: have Administrator privileges itself.
*** Info: No privileged account could be found.
*** Info: This script plans to use 'cyg_server'.
*** Info: 'cyg_server' will only be used by registered services.
*** Query: Do you want to use a different name? (yes/no) no
*** Query: Create new privileged user account 'cyg_server'? (yes/no) yes
*** Info: Please enter a password for new user cyg_server. Please be sure
*** Info: that this password matches the password rules given on your system.
*** Info: Entering no password will exit the configuration.
*** Query: Please enter the password:
*** Query: Reenter: Enter password
*** Info: User 'cyg_server' has been created with password '####'.
*** Info: If you change the password, please remember also to change the
*** Info: password for the installed services which use (or will soon use)
*** Info: the 'cyg_server' account.
*** Info: Also keep in mind that the user 'cyg_server' needs read permissions
*** Info: on all users' relevant files for the services running as 'cyg_server'.
*** Info: In particular, for the sshd server all users' .ssh/authorized_keys
*** Info: files must have appropriate permissions to allow public key
*** Info: authentication. (Re-)running ssh-user-config for each user will set
*** Info: these permissions correctly. [Similar restrictions apply, for
*** Info: instance, for .rhosts files if the rshd server is running, etc].
*** Info: The sshd service has been installed under the 'cyg_server'
*** Info: account. To start the service now, call `net start sshd' or
*** Info: `cygrunsrv -S sshd'. Otherwise, it will start automatically
*** Info: after the next reboot.
*** Info: Host configuration finished. Have fun!
Installation script creates:
- configuration files:
- /etc/ssh_config
- /etc/ssh_host_dsa_key
- /etc/ssh_host_ecdsa_key
- /etc/ssh_host_key
- /etc/ssh_host_rsa_key
- /etc/sshd_config
- a privileged account, cyg_server.
- an sshd Windows service, using the specified account and password, listed under the name CYGWIN sshd.
IMPORTANT: Do not re-run ssh-host-config without first removing the existing files and account. The script changes access permissions on the configuration files so that they can only be accessed by the ssh services. If the sshd service, configuration files and account are not created together, the script fails to configure the file permissions and no error is reported.
Cleaning up ssh
If you run into any issue, delete the above 6 files, remove the created service using:
cygrunsrv -R sshd
and start over.
You should now be able to start the sshd service and log in using your password. However, in order to run Hadoop you need to create a server key, so that you can establish an ssh session without specifying a password. To do this, type
ssh-keygen
and accept all default options (no passphrase).
$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/AccountName/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/AccountName/.ssh/id_rsa.
Your public key has been saved in /home/AccountName/.ssh/id_rsa.pub.
The key fingerprint is:
9b:51:11:ea:c4:a4:72:fe:70:e7:dd:f1:ea:34:ac:0f AccountName@ServerName
The key's randomart image is:
Copy the generated RSA public key into the authorized_keys file, to allow logging in without a password.
cd ~/.ssh
cat id_rsa.pub >> authorized_keys
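If you want to verify the mechanics first without touching your real ~/.ssh, the same two steps can be rehearsed against a throwaway directory (the demo_ssh path below is just an arbitrary name for illustration):

```shell
# Generate a throwaway RSA key pair with no passphrase (-N "") and
# append its public half to an authorized_keys file, exactly as above.
mkdir -p demo_ssh && chmod 700 demo_ssh
ssh-keygen -q -t rsa -N "" -f demo_ssh/id_rsa
cat demo_ssh/id_rsa.pub >> demo_ssh/authorized_keys
chmod 600 demo_ssh/authorized_keys   # sshd refuses group/world-writable key files
ls -l demo_ssh
```

Note the chmod 600 on authorized_keys: sshd silently ignores key files with loose permissions, which is a common cause of still being prompted for a password.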
Try connecting locally:
ssh localhost
You should be able to connect without specifying a password.
Configuring SSH on Linux / MAC
When running on Linux / MAC, all you have to do is make sure an SSH server is running and that key-based login is set up, so that no password is requested.
Make sure the SSH server is running (ssh localhost). If it's not running, start it:
- Linux
- Start it as a service: sudo service ssh start (the service name may be ssh or sshd depending on the distribution).
- Start it as a process: /usr/sbin/sshd
(Under Windows, the service is created as 'CYGWIN sshd' and is started with net start sshd; note that the service account may not be configured correctly to read the server certificates.)
- MAC
Go to System preferences -> Internet & Wireless -> Sharing -> Enable 'Remote login' service.
You should create a server key so that ssh does not request a password every time a session is established. Verify that ssh localhost does not request a password or passphrase. If it does, create a server key and add it to the authorized_keys file.
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Download Hadoop
Current Hadoop version is 1.0.0. Hadoop is organized as 3 projects:
- Common: Common functionality to all projects (logging, utilities, etc).
- HDFS: Hadoop Distributed File System.
- MapReduce: Map-Reduce implementation. It allows performing distributed queries on the distributed file system. Explained later.
They are downloaded together from https://hadoop.apache.org/ as a single .tar.gz / .rpm / .deb file.
Unpack Hadoop to any directory. The recommended install directory is /usr/local/hadoop-1.0.0, but you can use another directory.
Configure Hadoop
There are 3 basic configuration options for Hadoop:
- Local (Standalone) Mode: All services run in a single node, with no replication.
- Pseudo-Distributed Mode: Services run in a single node, but as separate Java processes.
- Fully-Distributed Mode: Real distributed environment.
Hadoop configuration is stored in XML files located in conf/. They all share the same key-value format, stored as a sequence of:
<property>
<name>Property name</name>
<value>Property value</value>
</property>
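As a quick sanity check of this format, a one-line sed script can pull a value back out of a configuration file (the file name below is just an example, not one of Hadoop's own):

```shell
# Write a minimal Hadoop-style configuration file...
cat > core-site-example.xml <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

# ...and extract the value of its single property with sed
sed -n 's:.*<value>\(.*\)</value>.*:\1:p' core-site-example.xml
# prints hdfs://localhost:9000
```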
Pseudo-Distributed Mode is the ideal development mode. Minimum configuration files for pseudo-distributed mode are shown below:
- conf/core-site.xml:
- conf/hdfs-site.xml:
- conf/mapred-site.xml:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
</configuration>
If no paths are specified, Hadoop temporary and data files are located in the system tmp directory, so any installation should begin by defining the tmp and HDFS directories, as shown below:
- conf/core-site.xml:
- conf/hdfs-site.xml:
- conf/mapred-site.xml:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/tmp/hadoop-${user.name}</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/home/${user.name}/hdfs/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/${user.name}/hdfs/data</value>
</property>
</configuration>
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Under Windows, specify paths using the full format. Eg (the path shown is only an example, assuming Cygwin is installed under C:/cygwin):
<property>
<name>hadoop.tmp.dir</name>
<value>C:/cygwin/tmp/hadoop-${user.name}</value>
</property>
Start Hadoop
Format NameNode
Before starting Hadoop, you have to format the NameNode. This is the node containing the file system structure. To format the NameNode, run:
cd /usr/local/hadoop-1.0.0
./bin/hadoop namenode -format
Several files will be created under the directory defined for the configuration key dfs.name.dir.
Start HDFS
bin/start-dfs.sh
Check that HDFS is running by browsing to http://localhost:50070/.
A webpage should be displayed with DFS information, where you can view and browse the directory structure.
If you run into any issue, check log files under hadoop-1.0.0/logs/ for errors.
You can also browse the file system using bin/hadoop fs -ls. Type bin/hadoop fs for the complete set of commands.
Under MAC/OSX you might get an 'Unable to load realm info from SCDynamicStore' error. If you run into this issue, add the following line:
export HADOOP_OPTS='-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk'
at the beginning of conf/hadoop-env.sh
Start MapReduce (JobTracker):
bin/start-mapred.sh
Check that the JobTracker has started by browsing to http://localhost:50030/.
A page with scheduled jobs should be displayed.
Check hadoop-1.0.0/logs/ for errors.
Check HDFS and the JobTracker by opening:
- NameNode: http://localhost:50070/
- JobTracker: http://localhost:50030/
Running Hadoop on Cygwin in Windows (Single-Node Cluster)
In this document you are going to see how to set up a pseudo-distributed, single-node Hadoop (any stable 1.0.x version) cluster backed by the Hadoop Distributed File System, running on Windows (I am using Windows Vista). Run your Hadoop cluster through 10 steps.
Prerequisites
Software to be downloaded before you start these procedures. You can download all the recommended software before you get started with the steps listed below.
- Cygwin download for Windows
- Oracle Java – Java 1.6 (aka Java 6) is recommended; download the Windows executable (.exe), either x64 for a 64-bit platform or x86 for a 32-bit platform
- Hadoop 1.0.4 (direct link), Oct 2012 stable release
Single-node Hadoop cluster step-by-step instructions
1. Installing Cygwin
a) Cygwin comes with a normal setup.exe to install on Windows, but there are a few steps you need to pay attention to, so I would like to walk you through the installation step by step. Click here to download the Cygwin setup.
b) Once you start the installer, the first screen of the installation wizard appears.
c) After 4 steps from the above screen you will get a screen to select packages; in this step choose the OpenSSH package to be installed along with Cygwin.
d) The Cygwin installer proceeds, including all dependent packages required for the installation.
Now you have installed Cygwin with OpenSSH.
2. Set Environment Variables in Windows
a) Find the “My Computer” icon on the desktop, right-click on it and select the Properties item from the menu.
b) In the Properties dialog box, click the Environment Variables button under the Advanced tab.
c) When the Environment Variables dialog shows up, click the Path variable located in the System Variables box and then click the Edit button.
d) When the Edit dialog appears, append your Cygwin path to the end of the Variable value field.
(I installed Cygwin under the C: drive – c:\cygwin\bin;)
3. Setup SSH daemon
a) Open the Cygwin command prompt.
b) Execute the following command:
ssh-host-config
c) When asked if privilege separation should be used, answer no.
d) When asked if sshd should be installed as a service, answer yes.
(If it prompts for the CYGWIN environment variable, enter ntsec)
4. Start SSH daemon
a) Find the My Computer icon on your desktop, right-click on it and select Manage from the context menu.
b) Open Services and Applications in the left-hand panel, then select the Services item.
c) Find the CYGWIN sshd item in the main section and right-click on it.
d) In the property popup, set “Startup type:” to Automatic, so that it starts up when Windows starts.
5.Setup authorization keys
$ ssh-keygen
(Since we are generating keys without a password, just press Enter at each prompt. Below is the sequence of text which appears in the terminal.)
$ cd ~/.ssh
(The .ssh folder will be under the /home/<user> directory; e.g. on my system it is under my user profile “sunder”.)
c) The next step is to create an RSA key pair with an empty password. You have to enable SSH access to your local machine with this newly created key.
To test that SSH is installed, from a terminal prompt enter:
$ ssh localhost
(You will get a similar notification in the terminal)
Last login: Mon Apr 8 21:36:45 2013 from sunder-pc
Now SSH is running successfully with the keys generated.
a) Installing Java on a Windows system is an easy step-by-step process.
b) You can download the .exe Windows installation file from the Oracle JDK download page.
c) Choose your Java installation folder (eg: C:\Java\jdk1.6.0_41) and install Java.
a) Set the environment variable JAVA_HOME, as we already did for Cygwin in the instructions above – the same steps are to be followed for setting JAVA_HOME.
b) You may need to create a new variable under the User Variables / System Variables section.
a. To set JAVA_HOME in Cygwin you have to update the Java home directory in /etc/bash.bashrc.
b. Edit the $HOME/.bashrc file to set the Java home directory.
c. To set the Java home, you can see the export JAVA_HOME line in the file commented out with #. Remove the # (uncomment it) and key in your installed Java path.
(To make Windows folders recognizable you have to give two backslashes “\\” for each folder; since I installed Java under c:\java\jdk1.6.0_41, that is the path I keyed in.)
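The doubled backslashes matter because the shell consumes one level of escaping. A quick check in any Bash/Cygwin terminal (the jdk path below is illustrative, not a requirement):

```shell
# Inside double quotes, Bash turns each "\\" into a single "\",
# so the variable ends up holding the real Windows path.
export JAVA_HOME="C:\\java\\jdk1.6.0_41"
echo "$JAVA_HOME"    # prints C:\java\jdk1.6.0_41
```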
Please Note:
Since you are using Windows, you can also edit files through Windows Explorer. Whenever you edit any file used inside Cygwin through Windows (either with Notepad or WordPad), after saving the file ensure you get into the Cygwin terminal, locate the file and execute the Unix command “$ dos2unix <filename>”. This is important at all stages of execution.
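The effect of dos2unix can be seen on a small example; if dos2unix itself is unavailable, a sed one-liner that strips trailing carriage returns does the same job (the file name and its contents below are arbitrary):

```shell
# Simulate a file saved by a Windows editor (CRLF line endings)
printf 'export JAVA_HOME=/usr/java\r\n' > crlf-example.sh

# Strip the carriage returns, as dos2unix would
sed -i 's/\r$//' crlf-example.sh

# Verify: no carriage returns remain, so the shell can source the file
if ! grep -q "$(printf '\r')" crlf-example.sh; then echo "clean"; fi
```

Stray carriage returns are the usual cause of cryptic errors like `': command not found` when Hadoop's scripts are edited in Windows, which is why this conversion matters at every step.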
e. Exit any terminal and open a new terminal
f. To check that the variable is set, type this command in a terminal:
$ echo $JAVA_HOME
(The above command will display the Java directory path you set. You can also type $ java -version, or simply $ java, to see Java commands execute in the terminal.)
Now you have set the environment variable in Cygwin, i.e. JAVA_HOME.
The step-by-step instructions below will help you set up a single-node Hadoop cluster. Before we move on, read the HDFS (Hadoop Distributed File System) Architecture Guide.
a) Download a recent stable release from one of the Apache Download Mirrors.
b) Download “hadoop-<version>.tar.gz” to your desired directory.
c) From the terminal, in the directory where you downloaded the hadoop-<version>.tar.gz file, type this command:
$ tar xzf hadoop-<version>.tar.gz
d) The above command will extract the Hadoop files and folders.
e) Once you have extracted all the files, you may have to edit a few configuration files inside the <Hadoop Home> directory.
Feel free to edit any file through Windows with WordPad, but don’t forget to execute the Unix command “$ dos2unix <filename>” for all the files you open up in Windows.
f) Now edit <Hadoop Home>/conf/hadoop-env.sh to set the Java home, as you did before for the environment variable setup.
(Since I already set my JAVA_HOME in .bashrc so I gave JAVA_HOME=$JAVA_HOME)
g) And then update <Hadoop Home>/conf/core-site.xml with the below XML tag to set up the Hadoop file system property:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:50000</value>
</property>
</configuration>
h) Now update <Hadoop Home>/conf/mapred-site.xml with the below XML tag:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:50001</value>
</property>
</configuration>
i) Now update <Hadoop Home>/conf/hdfs-site.xml with the below XML tag:
<configuration>
<property>
<name>dfs.data.dir</name>
<value>/home/<user>/hadoop-dir/datadir</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/home/<user>/hadoop-dir/namedir</value>
</property>
</configuration>
Assuming you created the directory in your user profile, use your user name after /home; otherwise you can also check your folder by executing the pwd command from inside the created folder in the terminal.
Assuming you create the “data” and “name” directories under your home directory: I created a directory hadoop-dir, and inside it two directories, one for the name node and the other for the data node.
To ensure the data and name directories can be accessed by Hadoop, change the directory permissions by executing
$ chmod 755 /home/<user>/hadoop-dir/datadir
as well as
$ chmod 755 /home/<user>/hadoop-dir/namedir
i.e. if you run $ ls -l on your directory, the data and name directories should be in mode “drwxr-xr-x”, which means the owner has all three permissions while group and others have only read and execute permissions.
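A minimal sketch of the whole directory setup, using the hadoop-dir layout described above (run it from your home directory):

```shell
# Create the name and data directories and give them mode 755
# (drwxr-xr-x: owner rwx, group and others r-x)
mkdir -p hadoop-dir/namedir hadoop-dir/datadir
chmod 755 hadoop-dir/namedir hadoop-dir/datadir
ls -ld hadoop-dir/namedir hadoop-dir/datadir
```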
10. HDFS Format
Before starting your cluster you need to format HDFS by running the below command from <Hadoop-Home-Dir>/bin:
$ ./hadoop namenode -format
11. Copy a local file to HDFS
To copy a local file to HDFS, execute this command from <Hadoop-Home-Dir>/bin in the terminal.
Eg: if I have a sample.txt file in the path /home/<user>/Example,
then I have to execute the command from <Hadoop-Home-Dir>/bin:
$ ./hadoop dfs -copyFromLocal /home/<user>/Example/sample.txt /
This command will copy the local file into the HDFS root directory.
12. Start the cluster and browse HDFS through the web interface
Starting the Hadoop cluster is done by executing this command from <Hadoop-Home-Dir>/bin:
$ ./start-all.sh
This will start up a NameNode, a DataNode, a JobTracker and a TaskTracker on your machine.
To stop the cluster, run
$ ./stop-all.sh
to stop all the daemons running on your machine.
To understand more, see Getting Started With Hadoop.
Browse the web interface for the NameNode and the JobTracker; by default they are available at:
NameNode – http://localhost:50070/
JobTracker – http://localhost:50030/
If you face problems running your cluster, especially with the DataNode daemon not starting up:
- Stop the cluster ($ ./stop-all.sh)
- Update the value of namespaceID in your data node, e.g. in the <datanode dir>/current/VERSION file, to match the namespaceID in the current NameNode VERSION file
- Restart the cluster
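The namespaceID fix can be rehearsed on mock VERSION files first (the real ones live under <name dir>/current and <datanode dir>/current; the IDs below are made up):

```shell
# Mock VERSION files with mismatched namespaceIDs
mkdir -p name/current data/current
echo 'namespaceID=123456' > name/current/VERSION
echo 'namespaceID=999999' > data/current/VERSION

# Copy the NameNode's namespaceID into the DataNode's VERSION file
NSID=$(sed -n 's/^namespaceID=//p' name/current/VERSION)
sed -i "s/^namespaceID=.*/namespaceID=$NSID/" data/current/VERSION
cat data/current/VERSION    # now matches the NameNode
```

The mismatch typically appears after re-running the format command while old DataNode data is still on disk, which is why aligning the two IDs (or wiping the data directory) lets the DataNode start again.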