
Automated Hadoop Installation and Running in Single-Node Mode

This article uses a shell script to automate the installation and configuration of Hadoop. The operating system is CentOS, the Hadoop version is 1.x and the JDK version is 1.7; other versions have not been tested and may have unknown bugs.

Hadoop installation script

The installation has three steps: install the JDK, install Hadoop, and then configure passwordless SSH login (optional). [1]
#!/bin/bash
# Usage:   automated Hadoop setup script
# History:
#   20140425  annhe  basic functionality

# Hadoop version
hadoop_version=1.2.1
# JDK version. Oracle offers no direct download link; supply the rpm yourself and set the version here
jdk_version=7u51
# Hadoop download mirror, defaults to BIT (Beijing Institute of Technology)
mirrors=mirror.bit.edu.cn
# OS architecture
os=`uname -a | awk '{print $13}'`

# Check if user is root
if [ $(id -u) != 0 ]; then
    printf "Error: you must be root to run this script!\n"
    exit 1
fi

# Check that this is CentOS
cat /etc/issue | grep -i centos && r=0 || r=1
if [ $r -eq 1 ]; then
    echo "This script can only run on CentOS!"
    exit 1
fi

# Packages
hadoop_file=hadoop-$hadoop_version-1.$os.rpm
if [ "$os"x = "x86_64"x ]; then
    jdk_file=jdk-$jdk_version-linux-x64.rpm
else
    jdk_file=jdk-$jdk_version-linux-i586.rpm
fi

function install() {
    # Remove any previously installed versions
    rpm -qa | grep hadoop
    rpm -e hadoop
    rpm -qa | grep jdk
    rpm -e jdk
    # Restore the /etc/profile backup if one exists
    [ -f /etc/profile.bak ] && mv /etc/profile.bak /etc/profile
    # Fetch the packages
    if [ ! -f $hadoop_file ]; then
        wget http://$mirrors/apache/hadoop/common/stable1/$hadoop_file && r=0 || r=1
        [ $r -eq 1 ] && { echo "Download error, please check your mirrors or your network ... exit"; exit 1; }
    fi
    [ ! -f $jdk_file ] && { echo "$jdk_file not found! Please download it yourself ... exit"; exit 1; }
    # Install
    rpm -ivh $jdk_file && r=0 || r=1
    if [ $r -eq 1 ]; then
        echo "$jdk_file install failed, please verify your rpm file ... exit"
        exit 1
    fi
    rpm -ivh $hadoop_file && r=0 || r=1
    if [ $r -eq 1 ]; then
        echo "$hadoop_file install failed, please verify your rpm file ... exit"
        exit 1
    fi
    # Back up /etc/profile
    cp /etc/profile /etc/profile.bak
    # Configure the Java environment variables
    # (the JAVA_HOME path below is an assumption; /usr/java/default is the symlink the Oracle JDK rpm creates)
    cat >> /etc/profile <<"EOF"
export JAVA_HOME=/usr/java/default
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH
EOF
    source /etc/profile
}

# Configure passwordless SSH login (optional); the key options are a typical choice
function sshlogin() {
    ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    chmod 644 ~/.ssh/authorized_keys
}

install  2>&1 | tee -a hadoop_install.log
sshlogin 2>&1 | tee -a hadoop_install.log

# A reboot is needed after changing HADOOP_CLIENT_OPTS
shutdown -r now
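Saved as, say, hadoop_install.sh (the filename is an assumption), with the Oracle JDK rpm already placed in the same directory, the script would be run roughly like this; note that it reboots the machine when it finishes:

chmod +x hadoop_install.sh
./hadoop_install.sh          # output is also appended to hadoop_install.log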
Running the bundled example in single-node mode

By default, Hadoop is configured to run in non-distributed mode, as a single Java process. This is very helpful for debugging.

Create the test text files:
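The commands below are run from the job's working directory (here /root/hadoop); if the input directory does not exist yet, create it first:

[root@linux hadoop]# mkdir -p input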
[root@linux hadoop]# echo hello world >input/hello.txt
[root@linux hadoop]# echo hello hadoop >input/hadoop.txt
Run the wordcount example:
[root@linux hadoop]# hadoop jar /usr/share/hadoop/hadoop-examples-1.2.1.jar wordcount input output
14/04/26 02:56:23 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/04/26 02:56:23 INFO input.FileInputFormat: Total input paths to process : 2
14/04/26 02:56:24 WARN snappy.LoadSnappy: Snappy native library not loaded
14/04/26 02:56:24 INFO mapred.JobClient: Running job: job_local275273933_0001
14/04/26 02:56:24 INFO mapred.LocalJobRunner: Waiting for map tasks
14/04/26 02:56:24 INFO mapred.LocalJobRunner: Starting task: attempt_local275273933_0001_m_000000_0
14/04/26 02:56:25 INFO util.ProcessTree: setsid exited with exit code 0
14/04/26 02:56:25 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@7e86fe3a
14/04/26 02:56:25 INFO mapred.MapTask: Processing split: file:/root/hadoop/input/hadoop.txt:0+13
14/04/26 02:56:25 INFO mapred.MapTask: io.sort.mb = 100
14/04/26 02:56:25 INFO mapred.MapTask: data buffer = 79691776/99614720
14/04/26 02:56:25 INFO mapred.MapTask: record buffer = 262144/327680
14/04/26 02:56:25 INFO mapred.MapTask: Starting flush of map output
14/04/26 02:56:25 INFO mapred.MapTask: Finished spill 0
14/04/26 02:56:25 INFO mapred.Task: Task:attempt_local275273933_0001_m_000000_0 is done. And is in the process of commiting
14/04/26 02:56:25 INFO mapred.LocalJobRunner:
14/04/26 02:56:25 INFO mapred.Task: Task 'attempt_local275273933_0001_m_000000_0' done.
14/04/26 02:56:25 INFO mapred.LocalJobRunner: Finishing task: attempt_local275273933_0001_m_000000_0
14/04/26 02:56:25 INFO mapred.LocalJobRunner: Starting task: attempt_local275273933_0001_m_000001_0
14/04/26 02:56:25 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@16ed889d
14/04/26 02:56:25 INFO mapred.MapTask: Processing split: file:/root/hadoop/input/hello.txt:0+12
14/04/26 02:56:25 INFO mapred.MapTask: io.sort.mb = 100
14/04/26 02:56:25 INFO mapred.MapTask: data buffer = 79691776/99614720
14/04/26 02:56:25 INFO mapred.MapTask: record buffer = 262144/327680
14/04/26 02:56:25 INFO mapred.MapTask: Starting flush of map output
14/04/26 02:56:25 INFO mapred.MapTask: Finished spill 0
14/04/26 02:56:25 INFO mapred.Task: Task:attempt_local275273933_0001_m_000001_0 is done. And is in the process of commiting
14/04/26 02:56:25 INFO mapred.LocalJobRunner:
14/04/26 02:56:25 INFO mapred.Task: Task 'attempt_local275273933_0001_m_000001_0' done.
14/04/26 02:56:25 INFO mapred.LocalJobRunner: Finishing task: attempt_local275273933_0001_m_000001_0
14/04/26 02:56:25 INFO mapred.LocalJobRunner: Map task executor complete.
14/04/26 02:56:25 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@42701c57
14/04/26 02:56:25 INFO mapred.LocalJobRunner:
14/04/26 02:56:25 INFO mapred.Merger: Merging 2 sorted segments
14/04/26 02:56:25 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 53 bytes
14/04/26 02:56:25 INFO mapred.LocalJobRunner:
14/04/26 02:56:25 INFO mapred.Task: Task:attempt_local275273933_0001_r_000000_0 is done. And is in the process of commiting
14/04/26 02:56:25 INFO mapred.LocalJobRunner:
14/04/26 02:56:25 INFO mapred.Task: Task attempt_local275273933_0001_r_000000_0 is allowed to commit now
14/04/26 02:56:25 INFO output.FileOutputCommitter: Saved output of task 'attempt_local275273933_0001_r_000000_0' to output
14/04/26 02:56:25 INFO mapred.LocalJobRunner: reduce > reduce
14/04/26 02:56:25 INFO mapred.Task: Task 'attempt_local275273933_0001_r_000000_0' done.
14/04/26 02:56:25 INFO mapred.JobClient:  map 100% reduce 100%
14/04/26 02:56:25 INFO mapred.JobClient: Job complete: job_local275273933_0001
14/04/26 02:56:25 INFO mapred.JobClient: Counters: 20
14/04/26 02:56:25 INFO mapred.JobClient:   File Output Format Counters
14/04/26 02:56:25 INFO mapred.JobClient:     Bytes Written=37
14/04/26 02:56:25 INFO mapred.JobClient:   FileSystemCounters
14/04/26 02:56:25 INFO mapred.JobClient:     FILE_BYTES_READ=429526
14/04/26 02:56:25 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=586463
14/04/26 02:56:25 INFO mapred.JobClient:   File Input Format Counters
14/04/26 02:56:25 INFO mapred.JobClient:     Bytes Read=25
14/04/26 02:56:25 INFO mapred.JobClient:   Map-Reduce Framework
14/04/26 02:56:25 INFO mapred.JobClient:     Reduce input groups=3
14/04/26 02:56:25 INFO mapred.JobClient:     Map output materialized bytes=61
14/04/26 02:56:25 INFO mapred.JobClient:     Combine output records=4
14/04/26 02:56:25 INFO mapred.JobClient:     Map input records=2
14/04/26 02:56:25 INFO mapred.JobClient:     Reduce shuffle bytes=0
14/04/26 02:56:25 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
14/04/26 02:56:25 INFO mapred.JobClient:     Reduce output records=3
14/04/26 02:56:25 INFO mapred.JobClient:     Spilled Records=8
14/04/26 02:56:25 INFO mapred.JobClient:     Map output bytes=41
14/04/26 02:56:25 INFO mapred.JobClient:     CPU time spent (ms)=0
14/04/26 02:56:25 INFO mapred.JobClient:     Total committed heap usage (bytes)=480915456
14/04/26 02:56:25 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
14/04/26 02:56:25 INFO mapred.JobClient:     Combine input records=4
14/04/26 02:56:25 INFO mapred.JobClient:     Map output records=4
14/04/26 02:56:25 INFO mapred.JobClient:     SPLIT_RAW_BYTES=197
14/04/26 02:56:25 INFO mapred.JobClient:     Reduce input records=
The result:
[root@linux hadoop]# cat output/*
hadoop  1
hello   2
world   1
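One practical note when repeating the run: Hadoop refuses to start a job whose output directory already exists, even in local mode, so the previous output has to be removed first:

[root@linux hadoop]# rm -rf output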
Running a self-written WordCount

package net.annhe.wordcount;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.*;
import org.apache.hadoop.mapreduce.lib.output.*;
import org.apache.hadoop.util.*;

public class WordCount extends Configured implements Tool {

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public int run(String[] args) throws Exception {
        Job job = new Job(getConf());
        job.setJarByClass(WordCount.class);
        job.setJobName("wordcount");

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        boolean success = job.waitForCompletion(true);
        return success ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int ret = ToolRunner.run(new WordCount(), args);
        System.exit(ret);
    }
}
Compile:
javac -classpath /usr/share/hadoop/hadoop-core-1.2.1.jar -d . WordCount.java
Package into a jar:
jar -cvf wordcount.jar -C demo/ .
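Before running it, the contents of the jar can be listed to confirm the class files ended up under the package path (in the text above, javac wrote to . while the jar was built from demo/; whichever directory actually holds the net/annhe/wordcount tree is what -C should point at):

jar -tf wordcount.jar
# expect net/annhe/wordcount/WordCount.class plus the inner WordCount$Map and WordCount$Reduce classes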
Run:
hadoop jar wordcount.jar net.annhe.wordcount.WordCount input/ out
The result:
[root@linux hadoop]# cat out/*
hadoop  1
hello   2
world   1
Problems encountered

1. Out of memory

The virtual machine had only 180 MB of memory, and running the example program failed with:
java.lang.Exception: java.lang.OutOfMemoryError: Java heap space
Solution:
Give the virtual machine more memory, and edit /etc/hadoop/hadoop-env.sh, changing:
export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS"   # changed to 512m
The JVM was originally started with a maximum heap of 128 MB, so some of Hadoop's bundled examples fail with an out-of-memory error; the heap size can be raised here. If the default is sufficient, there is no need to change it. [2]
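In the spirit of the installation script above, this edit can also be applied non-interactively; a minimal sketch, assuming hadoop-env.sh still contains the stock line export HADOOP_CLIENT_OPTS="-Xmx128m $HADOOP_CLIENT_OPTS":

sed -i 's/-Xmx128m/-Xmx512m/' /etc/hadoop/hadoop-env.sh
grep HADOOP_CLIENT_OPTS /etc/hadoop/hadoop-env.sh   # verify the change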
2. Referencing a class that lives in a package

A class declared inside a package must be invoked by its fully qualified name, e.g. net.annhe.wordcount.WordCount above. [3]
3. Compiling a class that lives in a package

Such a class has to be compiled with the package directory structure, i.e. with the -d option. A Java class file should end up inside its package: given

package abc;
public class Ls { ... }

abc is the package of class Ls, so compiling must create the corresponding abc directory. That is what javac's -d option does, e.g. javac -d . Ls.java. Note the spaces between javac and -d, between -d and the ., and between the . and Ls.java. [4]
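Applied to the WordCount above, the effect of -d is easy to see (the listing below simply shows the directories javac derives from the package declaration):

javac -classpath /usr/share/hadoop/hadoop-core-1.2.1.jar -d . WordCount.java
find . -name '*.class'
# ./net/annhe/wordcount/WordCount.class
# ./net/annhe/wordcount/WordCount$Map.class
# ./net/annhe/wordcount/WordCount$Reduce.class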
References

[1]. 陆嘉桓. Hadoop实战 (Hadoop in Action), 2nd edition. 机械工业出版社.
[2]. OSChina blog: http://my.oschina.net/mynote/blog/93340
[3]. CSDN blog: http://blog.csdn.net/xw13106209/article/details/6861855
[4]. Baidu Zhidao: http://zhidao.baidu.com/link?url=nd1bwmygb_5a05jntd9vgznwgtmjmckf1v6dhvnm1efnuhl6kbqyvrewtcumy7kyp5f66r2bumcifcnpqnydd_
This article is released under a Creative Commons license; when reposting, please attribute it with a link.
Permalink: http://www.annhe.net/article-2672.html