Building a Distributed Hadoop Environment on CentOS
First, a disclaimer: I have no desire to reinvent the wheel. If you want to set up a Hadoop environment, there are plenty of detailed step-by-step guides online already, and I won't repeat every command here.
Second, I should say that I'm a beginner too, and not very familiar with Hadoop. But I wanted to actually build a working environment and see it for myself, and I'm glad I did. When the wordcount job ran, I was genuinely struck by how well Hadoop has packaged distributed computing: even someone with no distributed-systems background can get a distributed cluster running with nothing more than a bit of configuration.
Alright, down to business.
Things you should know before setting up a Hadoop environment:
1. Hadoop runs on Linux, so you need a Linux operating system.
2. You need a cluster to run Hadoop on, for example several Linux machines that can reach each other on a LAN.
3. For the cluster nodes to access each other, you need passwordless SSH login between them.
4. Hadoop runs on the JVM, which means you need a Java JDK installed and JAVA_HOME configured.
5. Hadoop's components are configured through XML files. After downloading Hadoop from the official site and extracting it, edit the corresponding configuration files under etc/hadoop inside the extracted directory.
As the saying goes, sharpen your tools before you start. Here is the software used in this setup:
1. VirtualBox: several Linux machines have to be simulated somehow, and with limited hardware I simply created a few virtual machines in VirtualBox.
2. CentOS: a CentOS 7 ISO image, loaded into VirtualBox and installed.
3. SecureCRT: an SSH client for remote access to the Linux guests.
4. WinSCP: for transferring files between Windows and Linux.
5. JDK for Linux: downloaded from the Oracle site; extract and configure.
6. Hadoop 2.7.3: available from the Apache site.
With that out of the way, the walkthrough below comes in three parts.
Preparing the Linux environment
Configuring IP addresses
So that the host can talk to the VMs and the VMs can talk to each other, set the CentOS network adapter in VirtualBox to Host-Only mode and assign static IPs by hand; note that each VM's gateway must match the IP of the host-only network on the host machine. After configuring the IP, restart the network service for the change to take effect. Three Linux machines are set up this way (figure omitted; they use 192.168.56.101 through 192.168.56.103).
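For reference, a minimal sketch of what the static-IP setup can look like on CentOS 7. The interface name enp0s8 and the exact file contents are assumptions; substitute whatever name your host-only adapter actually gets:

# /etc/sysconfig/network-scripts/ifcfg-enp0s8  (host-only adapter on hadoop01)
TYPE=Ethernet
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.56.101
NETMASK=255.255.255.0
GATEWAY=192.168.56.1

# restart the network service so the new address takes effect
systemctl restart network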
Configuring hostnames
For 192.168.56.101, set the hostname to hadoop01, and list the cluster's IPs and hostnames in the hosts file. Repeat the equivalent steps on the other two machines.
[root@hadoop01 ~]# cat /etc/sysconfig/network
# Created by anaconda
NETWORKING=yes
HOSTNAME=hadoop01
[root@hadoop01 ~]# cat /etc/hosts
127.0.0.1      localhost localhost.localdomain localhost4 localhost4.localdomain4
::1            localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.56.101 hadoop01
192.168.56.102 hadoop02
192.168.56.103 hadoop03
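On CentOS 7, instead of editing /etc/sysconfig/network by hand, the hostname can also be set with hostnamectl. A small sketch for hadoop01 (run the equivalent on the other two nodes):

hostnamectl set-hostname hadoop01   # persists across reboots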
Permanently disabling the firewall
service iptables stop (two notes: 1. the firewall comes back after the next reboot, so a command that disables it permanently is needed; 2. since this is CentOS 7, the commands to disable it are the following)
systemctl stop firewalld.service     # stop firewalld now
systemctl disable firewalld.service  # keep firewalld from starting at boot
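A quick check that the firewall is really stopped and will not come back after a reboot:

systemctl status firewalld.service       # should show inactive (dead)
systemctl is-enabled firewalld.service   # should print disabled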
Disabling the SELinux protection system
Change SELINUX to disabled, then reboot the machine for the configuration to take effect.
[root@hadoop02 ~]# cat /etc/sysconfig/selinux
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of three two values:
#     targeted - Targeted processes are protected,
#     minimum - Modification of targeted policy. Only selected processes are protected.
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted
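The same edit can be scripted rather than done in an editor. A sketch, assuming CentOS 7, where /etc/sysconfig/selinux is a symlink to /etc/selinux/config:

sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
reboot   # the change only takes effect after a reboot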
Passwordless SSH across the cluster
First, generate an SSH key pair:
ssh-keygen -t rsa
Then copy the public key to all three machines:
ssh-copy-id 192.168.56.101
ssh-copy-id 192.168.56.102
ssh-copy-id 192.168.56.103
After this, if the hadoop01 machine wants to log in to hadoop02, it only takes:
ssh hadoop02
Configuring the JDK
Create three folders under /home:
tools: for installer packages
softwares: for installed software
data: for data
Upload the downloaded Linux JDK to /home/tools on hadoop01 with WinSCP.
Extract the JDK into softwares:
<pre name="code" class="plain">tar -zxf jdk-7u76-linux-x64.tar.gz -C /home/softwares
The JDK home directory is now /home/softwares/jdk.x.x.x; paste that path into /etc/profile and set JAVA_HOME there:
export JAVA_HOME=/home/softwares/jdk1.8.0_111
export PATH=$PATH:$JAVA_HOME/bin
Save the change and run source /etc/profile for it to take effect.
Check whether the Java JDK is installed correctly:
java -version
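If java is not found or the version looks wrong, confirm that the profile change actually took effect (the path is the one configured above):

echo $JAVA_HOME   # expect /home/softwares/jdk1.8.0_111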
The files set up on this node can now be copied to the other nodes:
scp -r /home/* root@192.168.56.10X:/home
Hadoop cluster installation
The cluster is planned as follows:
Node 101 serves as the HDFS NameNode, node 102 as the YARN ResourceManager, and node 103 as the SecondaryNameNode. All three nodes (listed in the slaves file below) run a DataNode and a NodeManager. The JobHistoryServer is started on node 101 and the WebAppProxyServer on node 102.
Download hadoop-2.7.3
并將其放在/home/softwares文件夾中。由于hadoop需要JDK的安裝環(huán)境,所以首先配置/etc/hadoop/hadoop-env.sh的JAVA_HOME
(PS: in hindsight, the JDK version I used feels newer than necessary.)
Next, edit the XML files for the corresponding Hadoop components one by one.
Edit core-site.xml to:
specify the NameNode address
set Hadoop's cache (temporary) directory
enable Hadoop's trash mechanism
<configuration>
<property>
<name>fsdefaultFS</name>
<value>hdfs://101:8020</value>
</property>
<property>
<name>hadooptmpdir</name>
<value>/home/softwares/hadoop-3/data/tmp</value>
</property>
<property>
<name>fstrashinterval</name>
<value>10080</value>
</property>
</configuration>
hdfs-site.xml
set the number of replicas
disable permission checking
set the HTTP access address
set the SecondaryNameNode address
<configuration>
<property>
<name>dfsreplication</name>
<value>3</value>
</property>
<property>
<name>dfspermissionsenabled</name>
<value>false</value>
</property>
<property>
<name>dfsnamenodehttp-address</name>
<value>101:50070</value>
</property>
<property>
<name>dfsnamenodesecondaryhttp-address</name>
<value>103:50090</value>
</property>
</configuration>
Rename mapred-site.xml.template to mapred-site.xml.
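The rename is a one-liner from the Hadoop home directory (using cp rather than mv keeps the original template around):

cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml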
set the MapReduce framework to yarn, so jobs are scheduled through YARN
specify the JobHistory server address
specify the JobHistory web port
enable uber mode, an optimization for small MapReduce jobs
<configuration>
<property>
<name>mapreduceframeworkname</name>
<value>yarn</value>
</property>
<property>
<name>mapreducejobhistoryaddress</name>
<value>101:10020</value>
</property>
<property>
<name>mapreducejobhistorywebappaddress</name>
<value>101:19888</value>
</property>
<property>
<name>mapreducejobubertaskenable</name>
<value>true</value>
</property>
</configuration>
Edit yarn-site.xml to:
set the MapReduce auxiliary service to shuffle
designate node 102 as the ResourceManager
configure the web application proxy on node 102
enable YARN log aggregation
set how long YARN keeps aggregated logs
set the NodeManager memory: 8 GB
set the NodeManager CPU: 8 cores
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarnnodemanageraux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarnresourcemanagerhostname</name>
<value>102</value>
</property>
<property>
<name>yarnweb-proxyaddress</name>
<value>102:8888</value>
</property>
<property>
<name>yarnlog-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarnlog-aggregationretain-seconds</name>
<value>604800</value>
</property>
<property>
<name>yarnnodemanagerresourcememory-mb</name>
<value>8192</value>
</property>
<property>
<name>yarnnodemanagerresourcecpu-vcores</name>
<value>8</value>
</property>
</configuration>
Configuring slaves
This specifies the compute nodes, i.e. the nodes that run a DataNode and a NodeManager:
192.168.56.101
192.168.56.102
192.168.56.103
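The file lives at etc/hadoop/slaves under the Hadoop home directory; a heredoc is a convenient way to write it (hostnames would work just as well here, since /etc/hosts maps them):

cat > etc/hadoop/slaves <<EOF
192.168.56.101
192.168.56.102
192.168.56.103
EOF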
First format HDFS on the NameNode node, i.e. run on node 101:
Enter the Hadoop home directory: cd /home/softwares/hadoop-2.7.3
Run the hadoop script in the bin directory: bin/hadoop namenode -format
The format only succeeded if the output contains "successfully formatted". (PS: the screenshot for this step was borrowed from another article; figure omitted.)
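As a side note, hadoop namenode -format still works in 2.7.x but is deprecated; the current equivalent is:

bin/hdfs namenode -format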
Once all of the above is configured, copy it to the other machines.
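A sketch of that copy step with scp (assuming the same /home/softwares layout on every node and that passwordless SSH is already in place):

for host in hadoop02 hadoop03; do
    scp -r /home/softwares/hadoop-2.7.3 root@$host:/home/softwares/
done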
Testing the Hadoop environment
Run the corresponding scripts from the Hadoop home directory.
The jps command (Java Virtual Machine Process Status tool) shows the running Java processes.
Start HDFS on the NameNode machine, node 101:
[root@hadoop01 hadoop-2.7.3]# sbin/start-dfs.sh
Java HotSpot(TM) Client VM warning: You have loaded library /home/softwares/hadoop-2.7.3/lib/native/libhadoop.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
16/11/07 16:49:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [hadoop01]
hadoop01: starting namenode, logging to /home/softwares/hadoop-2.7.3/logs/hadoop-root-namenode-hadoop01.out
192.168.56.102: starting datanode, logging to /home/softwares/hadoop-2.7.3/logs/hadoop-root-datanode-hadoop02.out
192.168.56.103: starting datanode, logging to /home/softwares/hadoop-2.7.3/logs/hadoop-root-datanode-hadoop03.out
192.168.56.101: starting datanode, logging to /home/softwares/hadoop-2.7.3/logs/hadoop-root-datanode-hadoop01.out
Starting secondary namenodes [hadoop03]
hadoop03: starting secondarynamenode, logging to /home/softwares/hadoop-2.7.3/logs/hadoop-root-secondarynamenode-hadoop03.out
Running jps on node 101 now shows that the NameNode and DataNode have started:
[root@hadoop01 hadoop-2.7.3]# jps
7826 Jps
7270 DataNode
7052 NameNode
Running jps on nodes 102 and 103 shows that their DataNodes have started (and the SecondaryNameNode on 103):
[root@hadoop02 bin]# jps
4260 DataNode
4488 Jps
[root@hadoop03 ~]# jps
6436 SecondaryNameNode
6750 Jps
6191 DataNode
Starting YARN
On node 102, run:
[root@hadoop02 hadoop-2.7.3]# sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/softwares/hadoop-2.7.3/logs/yarn-root-resourcemanager-hadoop02.out
192.168.56.101: starting nodemanager, logging to /home/softwares/hadoop-2.7.3/logs/yarn-root-nodemanager-hadoop01.out
192.168.56.103: starting nodemanager, logging to /home/softwares/hadoop-2.7.3/logs/yarn-root-nodemanager-hadoop03.out
192.168.56.102: starting nodemanager, logging to /home/softwares/hadoop-2.7.3/logs/yarn-root-nodemanager-hadoop02.out
Check each node with jps:
[root@hadoop02 hadoop-2.7.3]# jps
4641 ResourceManager
4260 DataNode
4765 NodeManager
5165 Jps
[root@hadoop01 hadoop-2.7.3]# jps
7270 DataNode
8375 Jps
7976 NodeManager
7052 NameNode
[root@hadoop03 ~]# jps
6915 NodeManager
6436 SecondaryNameNode
7287 Jps
6191 DataNode
Start the JobHistoryServer and the web proxy daemon on their respective nodes:
[root@hadoop01 hadoop-2.7.3]# sbin/mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /home/softwares/hadoop-2.7.3/logs/mapred-root-historyserver-hadoop01.out
[root@hadoop01 hadoop-2.7.3]# jps
8624 Jps
7270 DataNode
7976 NodeManager
8553 JobHistoryServer
7052 NameNode
[root@hadoop02 hadoop-2.7.3]# sbin/yarn-daemon.sh start proxyserver
starting proxyserver, logging to /home/softwares/hadoop-2.7.3/logs/yarn-root-proxyserver-hadoop02.out
[root@hadoop02 hadoop-2.7.3]# jps
4641 ResourceManager
4260 DataNode
5367 WebAppProxyServer
5402 Jps
4765 NodeManager
On the hadoop01 node, i.e. node 101, the cluster status can be checked in a browser (figure omitted; the NameNode web UI configured above listens at http://192.168.56.101:50070).
Uploading a file to HDFS
[root@hadoop01 hadoop-2.7.3]# bin/hdfs dfs -put /etc/profile /profile
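Before running the job, it is worth confirming the upload landed:

bin/hdfs dfs -ls /   # /profile should appear in the listing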
Running the wordcount program
[root@hadoop01 hadoop-2.7.3]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /profile /fll_out
Java HotSpot(TM) Client VM warning: You have loaded library /home/softwares/hadoop-2.7.3/lib/native/libhadoop.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
16/11/07 17:17:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/11/07 17:17:12 INFO client.RMProxy: Connecting to ResourceManager at /192.168.56.102:8032
16/11/07 17:17:18 INFO input.FileInputFormat: Total input paths to process : 1
16/11/07 17:17:19 INFO mapreduce.JobSubmitter: number of splits:1
16/11/07 17:17:19 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1478509135878_0001
16/11/07 17:17:20 INFO impl.YarnClientImpl: Submitted application application_1478509135878_0001
16/11/07 17:17:20 INFO mapreduce.Job: The url to track the job: http://192.168.56.102:8888/proxy/application_1478509135878_0001/
16/11/07 17:17:20 INFO mapreduce.Job: Running job: job_1478509135878_0001
16/11/07 17:18:34 INFO mapreduce.Job: Job job_1478509135878_0001 running in uber mode : true
16/11/07 17:18:35 INFO mapreduce.Job: map 0% reduce 0%
16/11/07 17:18:43 INFO mapreduce.Job: map 100% reduce 0%
16/11/07 17:18:50 INFO mapreduce.Job: map 100% reduce 100%
16/11/07 17:18:55 INFO mapreduce.Job: Job job_1478509135878_0001 completed successfully
16/11/07 17:18:59 INFO mapreduce.Job: Counters: 52
File System Counters
FILE: Number of bytes read=4264
FILE: Number of bytes written=6412
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=3940
HDFS: Number of bytes written=261673
HDFS: Number of read operations=35
HDFS: Number of large read operations=0
HDFS: Number of write operations=8
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=8246
Total time spent by all reduces in occupied slots (ms)=7538
TOTAL_LAUNCHED_UBERTASKS=2
NUM_UBER_SUBMAPS=1
NUM_UBER_SUBREDUCES=1
Total time spent by all map tasks (ms)=8246
Total time spent by all reduce tasks (ms)=7538
Total vcore-milliseconds taken by all map tasks=8246
Total vcore-milliseconds taken by all reduce tasks=7538
Total megabyte-milliseconds taken by all map tasks=8443904
Total megabyte-milliseconds taken by all reduce tasks=7718912
Map-Reduce Framework
Map input records=78
Map output records=256
Map output bytes=2605
Map output materialized bytes=2116
Input split bytes=99
Combine input records=256
Combine output records=156
Reduce input groups=156
Reduce shuffle bytes=2116
Reduce input records=156
Reduce output records=156
Spilled Records=312
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=870
CPU time spent (ms)=1970
Physical memory (bytes) snapshot=243326976
Virtual memory (bytes) snapshot=2666557440
Total committed heap usage (bytes)=256876544
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1829
File Output Format Counters
Bytes Written=1487
Check the job's run status through the YARN web UI in the browser (figure omitted).
Viewing the final word-count result
You can browse the HDFS file system in the browser (figure omitted), or read the output directly from the shell:
[root@hadoop01 hadoop-2.7.3]# bin/hdfs dfs -cat /fll_out/part-r-00000
Java HotSpot(TM) Client VM warning: You have loaded library /home/softwares/hadoop-2.7.3/lib/native/libhadoop.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
16/11/07 17:29:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
!= 1
"$-" 1
"$2" 1
"$EUID" 2
"$HISTCONTROL" 1
"$i" 3
"${-#*i}" 1
"0" 1
":${PATH}:" 1
"`id 2
"after" 1
"ignorespace" 1
# 13
$UID 1
&& 1
() 1
*) 1
*:"$1":*) 1
-f 1
-gn`" 1
-gt 1
-r 1
-ru` 1
-u` 1
-un`" 2
-x 1
-z 1
2
/etc/bashrc 1
/etc/profile 1
/etc/profile.d/ 1
/etc/profile.d/*.sh 1
/usr/bin/id 1
/usr/local/sbin 2
/usr/sbin 2
/usr/share/doc/setup-*/uidgid 1
002 1
022 1
199 1
200 1
2>/dev/null` 1
; 3
;; 1
= 4
>/dev/null 1
By 1
Current 1
EUID=`id 1
Functions 1
HISTCONTROL 1
HISTCONTROL=ignoreboth 1
HISTCONTROL=ignoredups 1
HISTSIZE 1
HISTSIZE=1000 1
HOSTNAME 1
HOSTNAME=`/usr/bin/hostname 1
It's 2
JAVA_HOME=/home/softwares/jdk1.8.0_111 1
LOGNAME 1
LOGNAME=$USER 1
MAIL 1
MAIL="/var/spool/mail/$USER" 1
NOT 1
PATH 1
PATH=$1:$PATH 1
PATH=$PATH:$1 1
PATH=$PATH:$JAVA_HOME/bin 1
Path 1
System 1
This 1
UID=`id 1
USER 1
USER="`id 1
You 1
[ 9
] 3
]; 6
a 2
after 2
aliases 1
and 2
are 1
as 1
better 1
case 1
change 1
changes 1
check 1
could 1
create 1
custom 1
custom.sh 1
default, 1
do 1
doing 1
done 1
else 5
environment 1
environment, 1
esac 1
export 5
fi 8
file 2
for 5
future 1
get 1
go 1
good 1
i 2
idea 1
if 8
in 6
is 1
it 1
know 1
ksh 1
login 2
make 1
manipulation 1
merging 1
much 1
need 1
pathmunge 6
prevent 1
programs, 1
reservation 1
reserved 1
script 1
set 1
sets 1
setup 1
shell 2
startup 1
system 1
the 1
then 8
this 2
threshold 1
to 5
uid/gids 1
uidgid 1
umask 3
unless 1
unset 2
updates 1
validity 1
want 1
we 1
what 1
wide 1
will 1
workaround 1
you 2
your 1
{ 1
} 1
This output confirms that the Hadoop cluster is set up correctly.
That's the whole article. I hope it helps with your learning, and thank you for supporting 腳本之家.