Hey there, folks! Today, I’m going to walk you through the process of configuring Hadoop in a Windows operating system. So, grab a cup of coffee and let’s dive right in!
Checking if Java is Installed
Before we get started with Hadoop, it’s important to check if Java is already installed on your machine. Don’t worry, it’s a piece of cake! Open up your command prompt and type in “java” (without quotes), then hit Enter. If you see a message saying “Java is not recognized as an internal or external command,” it means you need to install Java.
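A slightly more informative check is “java -version”, which prints the installed version when Java is present and shows the same “not recognized” error when it isn’t:

```
java -version
```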
To install Java, we’ll need to head over to Oracle’s website. Don’t worry, I’ve got your back! Just fire up your web browser and search for “Oracle Java download.” Click on the official link, and once you’re there, scroll down and select Java 8. Hadoop is built and tested against Java 8 (Hadoop 3.x also supports Java 11); brand-new releases like Java 20 aren’t supported and can cause cryptic errors later on.
Since I’m using a 64-bit operating system, I’ll go ahead and download the appropriate version. If you’re using a 32-bit system, make sure to choose the corresponding option. Once you’ve made your selection, accept the license agreement and hit that download button.
Now, Oracle might ask you to log in. If you already have an account, just provide your credentials and sign in. If not, don’t fret! You can quickly create an account with a few simple steps. Once you’re all set, the download will begin.
Installing Java – Step by Step
Alright, let’s install Java, shall we? Head to your local disk drive C and create a new folder called “Java.” We need this folder for the installation process. Once you’ve done that, double-click on the downloaded Java file to start the installation wizard.
Click “Next” on the first screen, and then “Next” again on the subsequent screen. Here’s where things get interesting! Change the path to Java and set it to the newly created “Java” folder under your C drive. Once you’ve made the change, click “Next” to proceed.
Hit that “Install” button and sit back while Java works its magic. Once the installation is complete, click on “Close” to wrap things up. Great job! Java is now installed on your machine.
Configuring the Environment for Java
But hold on a second! We still need to configure the environment so that Java is detected correctly. Don’t worry, it’s not as complicated as it sounds. Let’s hop back to the search bar and type in “environment variables.” Open up the “Edit environment variables” option.
In the User Variables section, click on “New” to create a new variable. Enter “JAVA_HOME” (all caps) as the variable name, and for the value, paste the path to the JDK installation folder itself, for example “C:\Java\jdk1.8.0_xxx” (your folder name will carry the exact version you installed). Note that JAVA_HOME should point to the JDK root folder, not its “bin” subdirectory.
Now, let’s tackle the system “Path” variable. Find it, click on “Edit,” and add a new entry pointing to the JDK’s “bin” directory, i.e. “%JAVA_HOME%\bin”. Click “OK” to save the changes.
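Once you’re done, the two variables should look something like this (the JDK folder name here is just an example; yours will carry the version you actually installed):

```
JAVA_HOME = C:\Java\jdk1.8.0_371        (the JDK root folder, not its bin subfolder)
Path      = ...;%JAVA_HOME%\bin         (a new entry appended to the existing Path)
```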
To ensure that everything takes effect, open a new command prompt and type in “java.” Voila! It’s working like a charm. You can even check the Java version just to be sure. Congratulations, you’ve successfully installed and configured Java!
Now that we have Java up and running, it’s time to move on to the main event—installing Hadoop. Exciting, right? Let’s get started!
Head back to your web browser and search for “Apache Hadoop.” Click on the official link to navigate to the Hadoop website. Once you’re there, find the download section and select the binary option. We’ll go with an older version that’s known for its stability.
Click on the corresponding link to initiate the download. Great job so far!
Unzipping the Hadoop File
Before we dive into the Hadoop installation, we need to make sure we have the right tools. We’ll need WinRAR to unzip the downloaded file. If you don’t have it already, go ahead and install it. Once it’s installed, double click on the Hadoop file you just downloaded and extract it to your C drive.
Now, rename the extracted folder as “Hadoop.” Looking good!
Editing Configuration Files
Alright, folks, we’re making progress! It’s time to roll up our sleeves and edit some configuration files. But don’t worry, I’ll guide you through every step of the way.
Navigate to the Hadoop folder, then open up the “etc\hadoop” directory. Inside that directory, locate the “hadoop-env.cmd” file and give it a right-click. We want to edit this bad boy: find the line that sets JAVA_HOME and point it at your JDK folder, the same path we used for the JAVA_HOME variable. Save it, then open the “core-site.xml” file in the same directory.
In the configuration section of each XML file, we need to add a property with a name and a value. But hang on, we’ll need to repeat this process multiple times. So, I suggest copying the following details into a notepad for quick reference:
Property Name: fs.defaultFS Value: hdfs://localhost:9000
Now, in the “core-site.xml” file, insert the property details we just noted between the <configuration> tags. In the name section, type in “fs.defaultFS” (note the capital FS at the end), and in the value section, enter “hdfs://localhost:9000”. Save the changes and close the file.
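For reference, here’s roughly what the finished core-site.xml should contain (the file already ships with empty <configuration> tags; you just add the property inside them):

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```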
Next up, we need to edit the “hdfs-site.xml” file, which holds the HDFS-specific settings. Again, we’ll add a property with a name and a value. Ready?
Here are the details:
Property Name: dfs.replication Value: 1
We also need to create two more properties for the name node and data node paths. Hold on tight, we’re almost there!
Creating Name Node and Data Node Folders
Back in the Hadoop folder, create a new folder called “data.” Inside that folder, create two more folders named “namenode” and “datanode” (no spaces in the names, since paths with spaces tend to trip Hadoop up on Windows). These folders will store the metadata and block data for HDFS.
Once you’ve done that, copy the path of the name node folder. Now, let’s head back to the XML file. In the property name section, enter “dfs.namenode.name.dir,” and for the value, paste the path of the name node folder.
Similarly, do the same for the data node. Paste the data node path in the value field, and set the property name as “dfs.datanode.data.dir.”
Make sure to save the changes and close the file. We’re almost there!
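Putting the three properties together, hdfs-site.xml should end up looking something like this (the paths assume you extracted Hadoop to C:\hadoop and created the folders as above; adjust them to wherever your namenode and datanode folders actually live):

```xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>C:\hadoop\data\namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>C:\hadoop\data\datanode</value>
  </property>
</configuration>
```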
Configuring MapReduce and YARN
Alright, folks, we’re in the home stretch! We just need to tweak a couple more files to ensure everything works smoothly.
Go back to the Hadoop folder, open up the “etc” directory, and find the “mapred-site.xml” file. Edit it and add the following property:
Property Name: mapreduce.framework.name Value: yarn
Save and close the file. You’re doing amazing!
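For reference, mapred-site.xml ends up with just this one property, telling MapReduce to run on top of YARN:

```xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```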
Finally, let’s tackle the “yarn-site.xml” file. Add the following property:
Name: yarn.nodemanager.aux-services Value: mapreduce_shuffle
But wait, there’s more! We need to create one more property. Hang in there, we’re almost done!
Property Name: yarn.nodemanager.aux-services.mapreduce_shuffle.class Value: org.apache.hadoop.mapred.ShuffleHandler
Save the changes and close the file. Fantastic job, everyone!
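And here’s what the finished yarn-site.xml looks like with both properties in place, wiring the shuffle service into the node manager:

```xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
```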
Fixing the Bin Folder
Oh, snap! We’re almost ready to rock and roll, but we need to fix a little hiccup. Trust me, it’s an easy fix!
Head back to your Hadoop folder and delete the “bin” folder. Don’t worry, we’re going to replace it in no time.
Now, here’s what you need to do: download a replacement “bin” folder built for Windows, one that contains “winutils.exe” and “hadoop.dll” matching your Hadoop version. Community-maintained “winutils” builds for the various Hadoop releases are available on GitHub.
Once you’ve downloaded it, extract the “bin” folder and place it inside your Hadoop folder. Excellent work, team!
But hold on, we’re not out of the woods just yet. There’s one last thing we need to take care of.
Resolving Missing DLL Files
Uh-oh, it looks like we’ve hit a small roadblock. But fear not, I’ll guide you through it like a seasoned pro!
Run the “winutils.exe” file located in the bin folder. If you encounter a system error saying that a .dll file is missing, don’t panic. We can fix this.
Hop back to your web browser and search for the missing DLL file. In my case, it’s “msvcr120.dll,” which is part of the Microsoft Visual C++ runtime. Download the 64-bit version if you’re using a 64-bit system, or the 32-bit version if you’re on a 32-bit system, and only from a source you trust; installing the matching Visual C++ Redistributable (covered below) is the safer route.
Once you’ve downloaded it, navigate to your system32 folder. To get there, go to “My PC,” then your local disk, followed by the Windows folder. Scroll down until you find the “System32” folder. Inside that folder, paste the DLL file you just downloaded. Voila!
Now, run the “winutils.exe” file again. No more pesky error pop-ups! You’re a rockstar!
Installing Visual Studio Redistributable
Hold on tight, folks, we’re almost there! We just have one final step to tackle.
Go back to your web browser and search for “Microsoft Visual C++ Redistributable” (Microsoft’s “latest supported” download page is tagged msvc-170). Click on the official Microsoft link and grab the 64-bit installer if that’s what you need, or the 32-bit one for 32-bit systems. Note that “msvcr120.dll” specifically ships with the Visual C++ 2013 Redistributable, so install that version if that’s the DLL you were missing.
Once the download is complete, run the installation file. Agree to the terms and conditions, and let it work its magic. Almost done!
Formatting the Name Node
Alright, team, it’s time to format the name node, which initializes the metadata directory we configured earlier. Open up your trusty command prompt as an administrator, change into your Hadoop “bin” folder (or add “C:\hadoop\bin” and “C:\hadoop\sbin” to your Path, just like we did for Java), and type the following command:
hdfs namenode -format
Once it’s done, you should see a reassuring message saying the storage directory has been successfully formatted. Woohoo!
Launching the Hadoop Cluster
Ladies and gentlemen, we’ve arrived at the grand finale—the moment we’ve all been waiting for. It’s time to launch the Hadoop cluster!
In your command prompt, navigate to Hadoop’s “sbin” directory, which holds the start-up scripts, and run the DFS start script:
cd C:\hadoop\sbin
start-dfs.cmd
This will start the name node and data node.
Next, open another command prompt window as an administrator and navigate to the same “sbin” directory. Then, type the following commands:
cd C:\hadoop\sbin
start-yarn.cmd
This will start the YARN resource manager and node manager.
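Before celebrating, you can double-check that all four daemons actually came up with the JDK’s jps tool, which lists the running Java processes. You should see NameNode, DataNode, ResourceManager, and NodeManager in the list, each next to its process ID:

```
jps
```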
And there you have it, folks—your Hadoop cluster is up and running! Congratulations on a job well done!
To make sure everything is working smoothly, let’s run a quick test. Open up a web browser and type in “http://localhost:9870” to access the Hadoop web interface (that’s the port for Hadoop 3.x; older 2.x releases use “http://localhost:50070” instead). You should see a beautiful dashboard with all the Hadoop details.
To run a sample MapReduce program, head to your command prompt, change into your Hadoop folder, and type the following command. Adjust the “2.7.0” in the jar’s filename to match the Hadoop release you installed, and note that the grep example expects an “input” directory in HDFS with some files already copied into it:
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar grep input output 'dfs[a-z.]+'
Sit back and relax while Hadoop works its magic. Once it’s done, check the output folder to see the results.
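If you’d rather peek at the results from the command line, the hdfs CLI can list and print them (this assumes the job wrote to the “output” directory in HDFS, as in the command above):

```
hdfs dfs -ls output
hdfs dfs -cat output/*
```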
And that, my friends, wraps up our step-by-step guide to configuring Hadoop in a Windows operating system. You did an amazing job following along!
Remember, Hadoop is a powerful tool for big data processing, and now you have it up and running on your Windows machine. Explore its capabilities, experiment with different configurations, and have fun unleashing the power of Hadoop!