Most of the writeups online install Eclipse on Linux to develop Hadoop applications, but many Java programmers are not that familiar with Linux and prefer to develop Hadoop programs on Windows. After some experimentation, here is a summary of how to develop Hadoop code with Eclipse on Windows.
1. Download the dedicated Hadoop Eclipse plugin jar
The Hadoop version is 2.3.0 and the cluster is set up on CentOS 6.x. The plugin can be downloaded from http://download.csdn.net/detail/mchdba/8267181; the jar is named hadoop-eclipse-plugin-2.3.0 and works with the Hadoop 2.x series.
2. Put the plugin jar into the eclipse/plugins directory
For future convenience I put as many of the jars in as possible, as shown in the figure below:
3. Restart Eclipse and configure the Hadoop installation directory
If the plugin installed successfully, opening Window-->Preferences shows a Hadoop Map/Reduce entry in the left pane; click it and set the Hadoop installation path in the right pane.
4. Configure Map/Reduce Locations
Open Window-->Open Perspective-->Other,
select Map/Reduce and click OK. A Map/Reduce Locations tab appears at the bottom right, as shown in the figure below:
Click the Map/Reduce Locations tab, then click the small elephant icon on the right to open the Hadoop location configuration window:
Enter any Location Name you like. Configure Map/Reduce Master and DFS Master so that Host and Port match the settings in core-site.xml.
Look up the setting in core-site.xml:
fs.default.name = hdfs://name01:9000
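Expanded into its usual XML form, that core-site.xml entry looks like the following (host name01 and port 9000 are the values from this setup; fs.default.name is the older 1.x property name, which Hadoop 2.x still accepts as a deprecated alias of fs.defaultFS):

```xml
<property>
  <name>fs.default.name</name>
  <value>hdfs://name01:9000</value>
</property>
```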
Configure the dialog accordingly:
Click the Finish button to close the window. Then click DFS Locations-->myhadoop (the Location Name configured in the previous step) on the left. If you can see user, the installation succeeded; drilling in, however, shows an error: Error: Permission denied: user=root, access=READ_EXECUTE, inode="/tmp":hadoop:supergroup:drwx------, as shown in the figure below:
This looks like a permissions problem: make the hadoop user the owner of all the Hadoop-related directories under /tmp/ and grant them 777 permissions.
cd /tmp/
chmod 777 /tmp/
chown -R hadoop:hadoop /tmp/hsperfdata_root
After that, reconnecting and reopening DFS Locations displays normally.
Map/Reduce Master (this is the address of the Hadoop cluster's Map/Reduce service; it should match the mapred.job.tracker setting in mapred-site.xml).
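For reference, the corresponding mapred-site.xml entry on a 1.x-style configuration would look roughly like this; the port 9001 below is a common JobTracker default, not a value taken from this cluster, so check your own mapred-site.xml:

```xml
<property>
  <name>mapred.job.tracker</name>
  <value>name01:9001</value>
</property>
```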
(1) If clicking produces the error:
An internal error occurred during: "Connecting to DFS hadoopname01".
java.net.UnknownHostException: name01
then set the Host field directly to the IP address 192.168.52.128 instead, and the location opens normally, as shown in the figure below:
5. Create the wordcount project
File-->New-->Project, choose Map/Reduce Project, and enter the project name wordcount.
Create a new class named WordCount in the wordcount project. If you hit the error: Invalid Hadoop Runtime specified; please click 'Configure Hadoop install directory' or fill in library location input field, the cause is a bad directory choice: the root directory e:\hadoop cannot be used; switching to e:\u\hadoop\ works, as shown below:
Click Next through the remaining steps and then Finish to complete the project. The Eclipse console then prints the following:
14-12-9 4:03:10 PM: Eclipse is running in a JRE, but a JDK is required
Some Maven plugins may not work when importing projects or updating source folders.
14-12-9 4:03:13 PM: Refreshing [/wordcount/pom.xml]
14-12-9 4:03:14 PM: Refreshing [/wordcount/pom.xml]
14-12-9 4:03:14 PM: Refreshing [/wordcount/pom.xml]
14-12-9 4:03:14 PM: Updating index central|http://repo1.maven.org/maven2
14-12-9 4:04:10 PM: Updated index for central|http://repo1.maven.org/maven2
6. Import the lib jars:
all the jars under /hadoop-2.3.0/share/hadoop/common, plus all the jars in its lib subdirectory.
7. The code, including the environment configuration needed to submit a MapReduce job directly from Eclipse, is as follows:
package wc;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class W2 {

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The original listing is truncated here after "System.setProperty(";
        // a common pattern when submitting from Windows is to set the HDFS user first,
        // e.g. System.setProperty("HADOOP_USER_NAME", "hadoop") -- adjust to your setup.
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        // Standard WordCount job wiring (the rest of main is reconstructed from the
        // stock Hadoop WordCount example, since the original listing breaks off here).
        Job job = new Job(conf, "word count");
        job.setJarByClass(W2.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
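Before submitting to a cluster, the tokenize-and-count logic of the mapper and reducer can be sanity-checked with plain Java and no Hadoop dependencies. This is just an illustrative sketch: the LocalWordCount class and its wordCount helper are ours, not part of any Hadoop API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

public class LocalWordCount {
    // Mirrors TokenizerMapper + IntSumReducer: split on whitespace, sum counts per token.
    public static Map<String, Integer> wordCount(String text) {
        Map<String, Integer> counts = new HashMap<>();
        StringTokenizer itr = new StringTokenizer(text); // same tokenizer the mapper uses
        while (itr.hasMoreTokens()) {
            String word = itr.nextToken();
            counts.merge(word, 1, Integer::sum);         // the reducer's summing step
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints each distinct token with its count -- the pairs the reducer would emit.
        System.out.println(wordCount("hello hadoop hello eclipse"));
    }
}
```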
8. Running
8.1 Create the input directory on HDFS
[hadoop@name01 hadoop-2.3.0]$ hadoop fs -ls /
[hadoop@name01 hadoop-2.3.0]$ hadoop fs -mkdir input
mkdir: `input': no such file or directory
[hadoop@name01 hadoop-2.3.0]$
PS: hadoop fs needs the directory's full path to create it.
If the Apache Hadoop version is 0.x or 1.x:
bin/hadoop fs -mkdir /in
bin/hadoop fs -put /home/du/input /in
If the Apache Hadoop version is 2.x:
bin/hdfs dfs -mkdir -p /in
bin/hdfs dfs -put /home/du/input /in
For vendor distributions of Hadoop, such as Cloudera CDH, IBM BigInsights, or Hortonworks HDP, the first form works as well. Take care to create directories with their full path; note that the HDFS root directory is /.
8.2 Copy the local README.txt into the HDFS input directory (/data/input here)
[hadoop@name01 hadoop-2.3.0]$ find . -name README.txt
./share/doc/hadoop/common/README.txt
[hadoop@name01 ~]$ hadoop fs -copyFromLocal ./src/hadoop-2.3.0/share/doc/hadoop/common/README.txt /data/input
[hadoop@name01 ~]$
[hadoop@name01 ~]$ hadoop fs -ls /
8.3 After the Hadoop run finishes, check the output:
2014-12-16 15:34:01,303 INFO [main] configuration.deprecation (configuration.java:warnonceifdeprecated(996)) - session.id is deprecated. instead, use dfs.metrics.session-id
2014-12-16 15:34:01,309 INFO [main] jvm.jvmmetrics (jvmmetrics.java:init(76)) - initializing jvm metrics with processname=jobtracker, sessionid=
2014-12-16 15:34:02,047 INFO [main] input.fileinputformat (fileinputformat.java:liststatus(287)) - total input paths to process : 1
2014-12-16 15:34:02,120 INFO [main] mapreduce.jobsubmitter (jobsubmitter.java:submitjobinternal(396)) - number of splits:1
2014-12-16 15:34:02,323 INFO [main] mapreduce.jobsubmitter (jobsubmitter.java:printtokens(479)) - submitting tokens for job: job_local1764589720_0001
2014-12-16 15:34:02,367 WARN [main] conf.configuration (configuration.java:loadproperty(2345)) - file:/tmp/hadoop-hadoop/mapred/staging/hadoop1764589720/.staging/job_local1764589720_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; ignoring.
2014-12-16 15:34:02,368 WARN [main] conf.configuration (configuration.java:loadproperty(2345)) - file:/tmp/hadoop-hadoop/mapred/staging/hadoop1764589720/.staging/job_local1764589720_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; ignoring.
2014-12-16 15:34:02,682 WARN [main] conf.configuration (configuration.java:loadproperty(2345)) - file:/tmp/hadoop-hadoop/mapred/local/localrunner/hadoop/job_local1764589720_0001/job_local1764589720_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; ignoring.
2014-12-16 15:34:02,682 WARN [main] conf.configuration (configuration.java:loadproperty(2345)) - file:/tmp/hadoop-hadoop/mapred/local/localrunner/hadoop/job_local1764589720_0001/job_local1764589720_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; ignoring.
2014-12-16 15:34:02,703 INFO [main] mapreduce.job (job.java:submit(1289)) - the url to track the job: http://localhost:8080/
2014-12-16 15:34:02,704 INFO [main] mapreduce.job (job.java:monitorandprintjob(1334)) - running job: job_local1764589720_0001
2014-12-16 15:34:02,707 INFO [thread-4] mapred.localjobrunner (localjobrunner.java:createoutputcommitter(471)) - outputcommitter set in config null
2014-12-16 15:34:02,719 INFO [thread-4] mapred.localjobrunner (localjobrunner.java:createoutputcommitter(489)) - outputcommitter is org.apache.hadoop.mapreduce.lib.output.fileoutputcommitter
2014-12-16 15:34:02,853 INFO [thread-4] mapred.localjobrunner (localjobrunner.java:runtasks(448)) - waiting for map tasks
2014-12-16 15:34:02,857 INFO [localjobrunner map task executor #0] mapred.localjobrunner (localjobrunner.java:run(224)) - starting task: attempt_local1764589720_0001_m_000000_0
2014-12-16 15:34:02,919 INFO [localjobrunner map task executor #0] util.procfsbasedprocesstree (procfsbasedprocesstree.java:isavailable(129)) - procfsbasedprocesstree currently is supported only on linux.
2014-12-16 15:34:03,281 INFO [localjobrunner map task executor #0] mapred.task (task.java:initialize(581)) - using resourcecalculatorprocesstree : org.apache.hadoop.yarn.util.windowsbasedprocesstree@2e1022ec
2014-12-16 15:34:03,287 INFO [localjobrunner map task executor #0] mapred.maptask (maptask.java:runnewmapper(733)) - processing split: hdfs://192.168.52.128:9000/data/input/readme.txt:0+1366
2014-12-16 15:34:03,304 INFO [localjobrunner map task executor #0] mapred.maptask (maptask.java:createsortingcollector(388)) - map output collector class = org.apache.hadoop.mapred.maptask$mapoutputbuffer
2014-12-16 15:34:03,340 INFO [localjobrunner map task executor #0] mapred.maptask (maptask.java:setequator(1181)) - (equator) 0 kvi 26214396(104857584)
2014-12-16 15:34:03,341 INFO [localjobrunner map task executor #0] mapred.maptask (maptask.java:init(975)) - mapreduce.task.io.sort.mb: 100
2014-12-16 15:34:03,341 INFO [localjobrunner map task executor #0] mapred.maptask (maptask.java:init(976)) - soft limit at 83886080
2014-12-16 15:34:03,341 INFO [localjobrunner map task executor #0] mapred.maptask (maptask.java:init(977)) - bufstart = 0; bufvoid = 104857600
2014-12-16 15:34:03,341 INFO [localjobrunner map task executor #0] mapred.maptask (maptask.java:init(978)) - kvstart = 26214396; length = 6553600
2014-12-16 15:34:03,708 INFO [main] mapreduce.job (job.java:monitorandprintjob(1355)) - job job_local1764589720_0001 running in uber mode : false
2014-12-16 15:34:03,710 INFO [main] mapreduce.job (job.java:monitorandprintjob(1362)) - map 0% reduce 0%
2014-12-16 15:34:04,121 INFO [localjobrunner map task executor #0] mapred.localjobrunner (localjobrunner.java:statusupdate(591)) -
2014-12-16 15:34:04,128 INFO [localjobrunner map task executor #0] mapred.maptask (maptask.java:flush(1435)) - starting flush of map output
2014-12-16 15:34:04,128 INFO [localjobrunner map task executor #0] mapred.maptask (maptask.java:flush(1453)) - spilling map output
2014-12-16 15:34:04,128 INFO [localjobrunner map task executor #0] mapred.maptask (maptask.java:flush(1454)) - bufstart = 0; bufend = 2055; bufvoid = 104857600
2014-12-16 15:34:04,128 INFO [localjobrunner map task executor #0] mapred.maptask (maptask.java:flush(1456)) - kvstart = 26214396(104857584); kvend = 26213684(104854736); length = 713/6553600
2014-12-16 15:34:04,179 INFO [localjobrunner map task executor #0] mapred.maptask (maptask.java:sortandspill(1639)) - finished spill 0
2014-12-16 15:34:04,194 INFO [localjobrunner map task executor #0] mapred.task (task.java:done(995)) - task:attempt_local1764589720_0001_m_000000_0 is done. and is in the process of committing
2014-12-16 15:34:04,207 INFO [localjobrunner map task executor #0] mapred.localjobrunner (localjobrunner.java:statusupdate(591)) - map
2014-12-16 15:34:04,208 INFO [localjobrunner map task executor #0] mapred.task (task.java:senddone(1115)) - task 'attempt_local1764589720_0001_m_000000_0' done.
2014-12-16 15:34:04,208 INFO [localjobrunner map task executor #0] mapred.localjobrunner (localjobrunner.java:run(249)) - finishing task: attempt_local1764589720_0001_m_000000_0
2014-12-16 15:34:04,208 INFO [thread-4] mapred.localjobrunner (localjobrunner.java:runtasks(456)) - map task executor complete.
2014-12-16 15:34:04,211 INFO [thread-4] mapred.localjobrunner (localjobrunner.java:runtasks(448)) - waiting for reduce tasks
2014-12-16 15:34:04,211 INFO [pool-6-thread-1] mapred.localjobrunner (localjobrunner.java:run(302)) - starting task: attempt_local1764589720_0001_r_000000_0
2014-12-16 15:34:04,221 INFO [pool-6-thread-1] util.procfsbasedprocesstree (procfsbasedprocesstree.java:isavailable(129)) - procfsbasedprocesstree currently is supported only on linux.
2014-12-16 15:34:04,478 INFO [pool-6-thread-1] mapred.task (task.java:initialize(581)) - using resourcecalculatorprocesstree : org.apache.hadoop.yarn.util.windowsbasedprocesstree@36154615
2014-12-16 15:34:04,483 INFO [pool-6-thread-1] mapred.reducetask (reducetask.java:run(362)) - using shuffleconsumerplugin: org.apache.hadoop.mapreduce.task.reduce.shuffle@e2b02a3
2014-12-16 15:34:04,500 INFO [pool-6-thread-1] reduce.mergemanagerimpl (mergemanagerimpl.java:(193)) - mergermanager: memorylimit=949983616, maxsingleshufflelimit=237495904, mergethreshold=626989184, iosortfactor=10, memtomemmergeoutputsthreshold=10
2014-12-16 15:34:04,503 INFO [eventfetcher for fetching map completion events] reduce.eventfetcher (eventfetcher.java:run(61)) - attempt_local1764589720_0001_r_000000_0 thread started: eventfetcher for fetching map completion events
2014-12-16 15:34:04,543 INFO [localfetcher#1] reduce.localfetcher (localfetcher.java:copymapoutput(140)) - localfetcher#1 about to shuffle output of map attempt_local1764589720_0001_m_000000_0 decomp: 1832 len: 1836 to memory
2014-12-16 15:34:04,548 INFO [localfetcher#1] reduce.inmemorymapoutput (inmemorymapoutput.java:shuffle(100)) - read 1832 bytes from map-output for attempt_local1764589720_0001_m_000000_0
2014-12-16 15:34:04,553 INFO [localfetcher#1] reduce.mergemanagerimpl (mergemanagerimpl.java:closeinmemoryfile(307)) - closeinmemoryfile -> map-output of size: 1832, inmemorymapoutputs.size() -> 1, commitmemory -> 0, usedmemory -> 1832
2014-12-16 15:34:04,564 INFO [eventfetcher for fetching map completion events] reduce.eventfetcher (eventfetcher.java:run(76)) - eventfetcher is interrupted.. returning
2014-12-16 15:34:04,566 INFO [pool-6-thread-1] mapred.localjobrunner (localjobrunner.java:statusupdate(591)) - 1 / 1 copied.
2014-12-16 15:34:04,566 INFO [pool-6-thread-1] reduce.mergemanagerimpl (mergemanagerimpl.java:finalmerge(667)) - finalmerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
2014-12-16 15:34:04,585 INFO [pool-6-thread-1] mapred.merger (merger.java:merge(589)) - merging 1 sorted segments
2014-12-16 15:34:04,585 INFO [pool-6-thread-1] mapred.merger (merger.java:merge(688)) - down to the last merge-pass, with 1 segments left of total size: 1823 bytes
2014-12-16 15:34:04,605 INFO [pool-6-thread-1] reduce.mergemanagerimpl (mergemanagerimpl.java:finalmerge(742)) - merged 1 segments, 1832 bytes to disk to satisfy reduce memory limit
2014-12-16 15:34:04,605 INFO [pool-6-thread-1] reduce.mergemanagerimpl (mergemanagerimpl.java:finalmerge(772)) - merging 1 files, 1836 bytes from disk
2014-12-16 15:34:04,606 INFO [pool-6-thread-1] reduce.mergemanagerimpl (mergemanagerimpl.java:finalmerge(787)) - merging 0 segments, 0 bytes from memory into reduce
2014-12-16 15:34:04,607 INFO [pool-6-thread-1] mapred.merger (merger.java:merge(589)) - merging 1 sorted segments
2014-12-16 15:34:04,608 INFO [pool-6-thread-1] mapred.merger (merger.java:merge(688)) - down to the last merge-pass, with 1 segments left of total size: 1823 bytes
2014-12-16 15:34:04,608 INFO [pool-6-thread-1] mapred.localjobrunner (localjobrunner.java:statusupdate(591)) - 1 / 1 copied.
2014-12-16 15:34:04,643 INFO [pool-6-thread-1] configuration.deprecation (configuration.java:warnonceifdeprecated(996)) - mapred.skip.on is deprecated. instead, use mapreduce.job.skiprecords
2014-12-16 15:34:04,714 INFO [main] mapreduce.job (job.java:monitorandprintjob(1362)) - map 100% reduce 0%
2014-12-16 15:34:04,842 INFO [pool-6-thread-1] mapred.task (task.java:done(995)) - task:attempt_local1764589720_0001_r_000000_0 is done. and is in the process of committing
2014-12-16 15:34:04,850 INFO [pool-6-thread-1] mapred.localjobrunner (localjobrunner.java:statusupdate(591)) - 1 / 1 copied.
2014-12-16 15:34:04,850 INFO [pool-6-thread-1] mapred.task (task.java:commit(1156)) - task attempt_local1764589720_0001_r_000000_0 is allowed to commit now
2014-12-16 15:34:04,881 INFO [pool-6-thread-1] output.fileoutputcommitter (fileoutputcommitter.java:committask(439)) - saved output of task 'attempt_local1764589720_0001_r_000000_0' to hdfs://192.168.52.128:9000/data/output/_temporary/0/task_local1764589720_0001_r_000000
2014-12-16 15:34:04,884 INFO [pool-6-thread-1] mapred.localjobrunner (localjobrunner.java:statusupdate(591)) - reduce > reduce
2014-12-16 15:34:04,884 INFO [pool-6-thread-1] mapred.task (task.java:senddone(1115)) - task 'attempt_local1764589720_0001_r_000000_0' done.
2014-12-16 15:34:04,885 INFO [pool-6-thread-1] mapred.localjobrunner (localjobrunner.java:run(325)) - finishing task: attempt_local1764589720_0001_r_000000_0
2014-12-16 15:34:04,885 INFO [thread-4] mapred.localjobrunner (localjobrunner.java:runtasks(456)) - reduce task executor complete.
2014-12-16 15:34:05,714 INFO [main] mapreduce.job (job.java:monitorandprintjob(1362)) - map 100% reduce 100%
2014-12-16 15:34:05,714 INFO [main] mapreduce.job (job.java:monitorandprintjob(1373)) - job job_local1764589720_0001 completed successfully
2014-12-16 15:34:05,733 INFO [main] mapreduce.job (job.java:monitorandprintjob(1380)) - counters: 38
file system counters
file: number of bytes read=34542
file: number of bytes written=470650
file: number of read operations=0
file: number of large read operations=0
file: number of write operations=0
hdfs: number of bytes read=2732
hdfs: number of bytes written=1306
hdfs: number of read operations=15
hdfs: number of large read operations=0
hdfs: number of write operations=4
map-reduce framework
map input records=31
map output records=179
map output bytes=2055
map output materialized bytes=1836
input split bytes=113
combine input records=179
combine output records=131
reduce input groups=131
reduce shuffle bytes=1836
reduce input records=131
reduce output records=131
spilled records=262
shuffled maps=1
failed shuffles=0
merged map outputs=1
gc time elapsed (ms)=13
cpu time spent (ms)=0
physical memory (bytes) snapshot=0
virtual memory (bytes) snapshot=0
total committed heap usage (bytes)=440664064
shuffle errors
bad_id=0
connection=0
io_error=0
wrong_length=0
wrong_map=0
wrong_reduce=0
file input format counters
bytes read=1366
file output format counters
bytes written=1306
