
Hadoop 2.4.1 Getting Started Example: MaxTemperature

Note: everything below works the same on the 2.x and 1.x lines; it has been tested on 2.4.1 and on 1.2.0.
I. Preparation
1. Set up a pseudo-distributed Hadoop environment. See the official documentation, or http://blog.csdn.net/jediael_lu/article/details/38637277
2. Prepare a data file, sample.txt, with the following contents:
123456798676231190101234567986762311901012345679867623119010123456798676231190101234561+00121534567890356
123456798676231190101234567986762311901012345679867623119010123456798676231190101234562+01122934567890456
123456798676231190201234567986762311901012345679867623119010123456798676231190101234562+02120234567893456
123456798676231190401234567986762311901012345679867623119010123456798676231190101234561+00321234567803456
123456798676231190101234567986762311902012345679867623119010123456798676231190101234561+00429234567903456
123456798676231190501234567986762311902012345679867623119010123456798676231190101234561+01021134568903456
123456798676231190201234567986762311902012345679867623119010123456798676231190101234561+01124234578903456
123456798676231190301234567986762311905012345679867623119010123456798676231190101234561+04121234678903456
123456798676231190301234567986762311905012345679867623119010123456798676231190101234561+00821235678903456
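Each record uses a fixed-width layout that the mapper below depends on: the year sits at offsets 15-19, the signed air temperature at 87-92, and the quality code at 92-93. As a quick stand-alone check of that slicing on the first sample line (RecordCheck is an illustrative helper, not part of the article's code):

```java
public class RecordCheck {

    // First record of sample.txt, split only to keep the source line short.
    static final String LINE =
            "123456798676231190101234567986762311901012345679867623119010"
            + "123456798676231190101234561+00121534567890356";

    // Year: fixed-width field at offsets 15-19.
    static String year(String line) {
        return line.substring(15, 19);
    }

    // Temperature: skip a leading '+', since Integer.parseInt rejects it
    // in older JDKs (the same rule the mapper applies).
    static int temperature(String line) {
        return line.charAt(87) == '+'
                ? Integer.parseInt(line.substring(88, 92))
                : Integer.parseInt(line.substring(87, 92));
    }

    public static void main(String[] args) {
        // prints: 1901 12
        System.out.println(year(LINE) + " " + temperature(LINE));
    }
}
```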
II. Writing the Code
1. Create the Mapper
package org.jediael.hadoopdemo.maxtemperature;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final int MISSING = 9999;

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String year = line.substring(15, 19);
        int airTemperature;
        if (line.charAt(87) == '+') { // parseInt doesn't like leading plus signs
            airTemperature = Integer.parseInt(line.substring(88, 92));
        } else {
            airTemperature = Integer.parseInt(line.substring(87, 92));
        }
        String quality = line.substring(92, 93);
        if (airTemperature != MISSING && quality.matches("[01459]")) {
            context.write(new Text(year), new IntWritable(airTemperature));
        }
    }
}
2. Create the Reducer
package org.jediael.hadoopdemo.maxtemperature;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxTemperatureReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int maxValue = Integer.MIN_VALUE;
        for (IntWritable value : values) {
            maxValue = Math.max(maxValue, value.get());
        }
        context.write(key, new IntWritable(maxValue));
    }
}
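The reduce step is simply a running maximum over the values grouped under each key. The same fold in plain Java, fed the two readings the 1901 group receives from the sample data (MaxFold is a hypothetical helper, not part of the job):

```java
import java.util.Arrays;
import java.util.List;

public class MaxFold {

    // Same logic as the reducer: keep the largest value seen so far.
    static int max(List<Integer> values) {
        int maxValue = Integer.MIN_VALUE;
        for (int v : values) {
            maxValue = Math.max(maxValue, v);
        }
        return maxValue;
    }

    public static void main(String[] args) {
        // 1901 contributes temperatures 12 and 42 after quality filtering.
        System.out.println(max(Arrays.asList(12, 42))); // prints 42
    }
}
```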
3. Create the main method
package org.jediael.hadoopdemo.maxtemperature;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxTemperature {

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: MaxTemperature <input path> <output path>");
            System.exit(-1);
        }
        Job job = new Job(); // deprecated in 2.x; Job.getInstance() is preferred
        job.setJarByClass(MaxTemperature.class);
        job.setJobName("Max temperature");

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setMapperClass(MaxTemperatureMapper.class);
        job.setReducerClass(MaxTemperatureReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
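Before submitting, the whole pipeline can be sanity-checked locally with plain Java, no Hadoop required. LocalMaxTemperature below is a hypothetical checker, not part of the article's code: it applies the mapper's parsing and filtering rules and the reducer's per-year maximum to the nine sample lines, and its result agrees with the job output shown in section III.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LocalMaxTemperature {

    // The nine sample.txt records from above, hard-coded for a local check.
    static final List<String> SAMPLE = List.of(
        "123456798676231190101234567986762311901012345679867623119010123456798676231190101234561+00121534567890356",
        "123456798676231190101234567986762311901012345679867623119010123456798676231190101234562+01122934567890456",
        "123456798676231190201234567986762311901012345679867623119010123456798676231190101234562+02120234567893456",
        "123456798676231190401234567986762311901012345679867623119010123456798676231190101234561+00321234567803456",
        "123456798676231190101234567986762311902012345679867623119010123456798676231190101234561+00429234567903456",
        "123456798676231190501234567986762311902012345679867623119010123456798676231190101234561+01021134568903456",
        "123456798676231190201234567986762311902012345679867623119010123456798676231190101234561+01124234578903456",
        "123456798676231190301234567986762311905012345679867623119010123456798676231190101234561+04121234678903456",
        "123456798676231190301234567986762311905012345679867623119010123456798676231190101234561+00821235678903456");

    // Mapper's parsing/filtering rules followed by the reducer's maximum.
    static Map<String, Integer> maxByYear(List<String> lines) {
        Map<String, Integer> max = new TreeMap<>();
        for (String line : lines) {
            String year = line.substring(15, 19);
            int t = line.charAt(87) == '+'
                    ? Integer.parseInt(line.substring(88, 92))
                    : Integer.parseInt(line.substring(87, 92));
            String quality = line.substring(92, 93);
            if (t != 9999 && quality.matches("[01459]")) {
                max.merge(year, t, Math::max); // keep the larger reading per year
            }
        }
        return max;
    }

    public static void main(String[] args) {
        // prints: {1901=42, 1902=212, 1903=412, 1904=32, 1905=102}
        System.out.println(maxByYear(SAMPLE));
    }
}
```

Note that one of the nine records (quality code 2) is dropped by the filter, which is why the job counters later report 9 map input records but only 8 map output records.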
4. Export the classes as maxtemp.jar and upload the jar to the server where the job will run.
III. Running the Program
1. Upload sample.txt to HDFS
hadoop fs -put sample.txt /
2. Run the program
export HADOOP_CLASSPATH=maxtemp.jar
hadoop org.jediael.hadoopdemo.maxtemperature.MaxTemperature /sample.txt output10
Note: the output directory must not already exist, or the job will fail when it tries to create it.
3. Check the results
(1) The results
[jediael@jediael44 code]$  hadoop fs -cat output10/*
14/07/09 14:51:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
1901    42
1902    212
1903    412
1904    32
1905    102
(2) Runtime output
[jediael@jediael44 code]$ hadoop org.jediael.hadoopdemo.maxtemperature.MaxTemperature /sample.txt output10
14/07/09 14:50:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/07/09 14:50:41 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/07/09 14:50:42 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
14/07/09 14:50:43 INFO input.FileInputFormat: Total input paths to process : 1
14/07/09 14:50:43 INFO mapreduce.JobSubmitter: number of splits:1
14/07/09 14:50:44 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1404888618764_0001
14/07/09 14:50:44 INFO impl.YarnClientImpl: Submitted application application_1404888618764_0001
14/07/09 14:50:44 INFO mapreduce.Job: The url to track the job: http://jediael44:8088/proxy/application_1404888618764_0001/
14/07/09 14:50:44 INFO mapreduce.Job: Running job: job_1404888618764_0001
14/07/09 14:50:57 INFO mapreduce.Job: Job job_1404888618764_0001 running in uber mode : false
14/07/09 14:50:57 INFO mapreduce.Job:  map 0% reduce 0%
14/07/09 14:51:05 INFO mapreduce.Job:  map 100% reduce 0%
14/07/09 14:51:15 INFO mapreduce.Job:  map 100% reduce 100%
14/07/09 14:51:15 INFO mapreduce.Job: Job job_1404888618764_0001 completed successfully
14/07/09 14:51:16 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=94
                FILE: Number of bytes written=185387
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=1051
                HDFS: Number of bytes written=43
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=5812
                Total time spent by all reduces in occupied slots (ms)=7023
                Total time spent by all map tasks (ms)=5812
                Total time spent by all reduce tasks (ms)=7023
                Total vcore-seconds taken by all map tasks=5812
                Total vcore-seconds taken by all reduce tasks=7023
                Total megabyte-seconds taken by all map tasks=5951488
                Total megabyte-seconds taken by all reduce tasks=7191552
        Map-Reduce Framework
                Map input records=9
                Map output records=8
                Map output bytes=72
                Map output materialized bytes=94
                Input split bytes=97
                Combine input records=0
                Combine output records=0
                Reduce input groups=5
                Reduce shuffle bytes=94
                Reduce input records=8
                Reduce output records=5
                Spilled Records=16
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=154
                CPU time spent (ms)=1450
                Physical memory (bytes) snapshot=303112192
                Virtual memory (bytes) snapshot=1685733376
                Total committed heap usage (bytes)=136515584
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=954
        File Output Format Counters
                Bytes Written=43