MRUnit Usage Tips

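To test the Hadoop components and MapReduce programs you write, there are generally three approaches:
1. Use the hadoop-eclipse plugin to debug MapReduce programs. However, newer Hadoop releases no longer ship this plugin.
2. Configure JVM options to debug Hadoop components remotely. This works well for reading the Hadoop source code, but it is rather cumbersome for remotely debugging MapReduce jobs.
For details, see: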
http://blog.javachen.com/hadoop/2013/08/01/remote-debug-hadoop/
http://zhangjie.me/eclipse-debug-hadoop/
3. In the end, I chose MRUnit as my main tool for developing and debugging MapReduce applications.
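MRUnit overview
MRUnit is a Java library for unit testing MapReduce code. It is released by Apache and can be downloaded from: http://mrunit.apache.org/general/downloads.html
The MRUnit testing framework is built on JUnit, which makes it easy to test map and reduce programs. It works with Hadoop 0.20, 0.23.x, 1.0.x, and 2.x.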
Let's walk through the official MRUnit example, SMS CDR (call details record) analysis.
The records look like this:
cdrid;cdrtype;phone1;phone2;sms status code
655209;1;796764372490213;804422938115889;6
353415;0;356857119806206;287572231184798;4
835699;1;252280313968413;889717902341635;0
The task is to find all records whose cdrtype is 1, together with their status codes (sms status code).
The map output should be:
6, 1
0, 1
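As a quick sanity check of the expected map output, the filtering rule can be traced in plain Java with no Hadoop dependency. This is only a sketch: the class `CdrMapTrace` and its methods `mapRecord` and `mapAll` are names made up for illustration and are not part of MRUnit or Hadoop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CdrMapTrace {

    // Hypothetical helper (not an MRUnit or Hadoop API): applies the map rule
    // to one record and returns "status, 1", or null for non-SMS records.
    static String mapRecord(String record) {
        String[] fields = record.split(";");
        if (Integer.parseInt(fields[1]) == 1) { // cdrtype 1 marks an SMS CDR
            return fields[4] + ", 1";           // status code with a count of 1
        }
        return null;
    }

    // Runs mapRecord over every record and keeps only the emitted pairs.
    static List<String> mapAll(List<String> records) {
        List<String> out = new ArrayList<>();
        for (String record : records) {
            String pair = mapRecord(record);
            if (pair != null) {
                out.add(pair);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The three sample data records (header line left out on purpose).
        List<String> records = Arrays.asList(
                "655209;1;796764372490213;804422938115889;6",
                "353415;0;356857119806206;287572231184798;4",
                "835699;1;252280313968413;889717902341635;0");
        for (String pair : mapAll(records)) {
            System.out.println(pair);
        }
    }
}
```

The header line is deliberately not passed in: its second field is the text `cdrtype`, so `Integer.parseInt` would throw a `NumberFormatException` on it. A real job would need to skip or guard against that line.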
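The mapper code is as follows:
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SMSCDRMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private Text status = new Text();
    private final static IntWritable addOne = new IntWritable(1);

    /**
     * Returns the SMS status code and its count
     */
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws java.io.IOException, InterruptedException {
        // 655209;1;796764372490213;804422938115889;6 is the sample record format
        String[] line = value.toString().split(";");
        // If the record is an SMS CDR (cdrtype == 1), emit (status code, 1)
        if (Integer.parseInt(line[1]) == 1) {
            status.set(line[4]);
            context.write(status, addOne);
        }
    }
}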
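The reducer adds up the counts for each status code. The code is as follows:
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SMSCDRReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws java.io.IOException, InterruptedException {
        // Sum all the 1s emitted by the mapper for this status code
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(key, new IntWritable(sum));
    }
}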
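The grouping and summing that the framework and the reducer perform together can also be traced without Hadoop. The sketch below takes the status codes emitted by the map step, groups them by key, and adds a count of 1 per occurrence. `CdrReduceTrace` and `sumByStatus` are illustrative names only, and the `TreeMap` merely imitates the fact that reducer input arrives sorted by key.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CdrReduceTrace {

    // Hypothetical helper (not an MRUnit or Hadoop API): counts how many
    // times each status code was emitted by the map step.
    static Map<String, Integer> sumByStatus(List<String> statuses) {
        Map<String, Integer> totals = new TreeMap<>(); // keys kept in sorted order
        for (String status : statuses) {
            totals.merge(status, 1, Integer::sum);     // add 1 for this occurrence
        }
        return totals;
    }

    public static void main(String[] args) {
        // Two records with status 6 and one with status 0 (made-up input).
        Map<String, Integer> totals = sumByStatus(Arrays.asList("6", "0", "6"));
        System.out.println(totals); // prints {0=1, 6=2}
    }
}
```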
The MRUnit test class is as follows:
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.apache.hadoop.mrunit.mapreduce.MapReduceDriver;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.junit.Before;
import org.junit.Test;

public class SMSCDRMapperReducerTest {

    MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;
    ReduceDriver<Text, IntWritable, Text, IntWritable> reduceDriver;
    MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable> mapReduceDriver;

    @Before
    public void setUp() {
        SMSCDRMapper mapper = new SMSCDRMapper();
        SMSCDRReducer reducer = new SMSCDRReducer();
        mapDriver = MapDriver.newMapDriver(mapper);
        reduceDriver = ReduceDriver.newReduceDriver(reducer);
        mapReduceDriver = MapReduceDriver.newMapReduceDriver(mapper, reducer);
    }

    // "throws IOException" is declared because runTest() throws it in MRUnit 1.x
    @Test
    public void testMapper() throws IOException {
        // One SMS record (cdrtype 1) with status code 6 should yield (6, 1)
        mapDriver.withInput(new LongWritable(), new Text(
                "655209;1;796764372490213;804422938115889;6"));
        mapDriver.withOutput(new Text("6"), new IntWritable(1));
        mapDriver.runTest();
    }

    @Test
    public void testReducer() throws IOException {
        // Two counts of 1 for status code 6 should be summed to (6, 2)
        List<IntWritable> values = new ArrayList<IntWritable>();
        values.add(new IntWritable(1));
        values.add(new IntWritable(1));
        reduceDriver.withInput(new Text("6"), values);
        reduceDriver.withOutput(new Text("6"), new IntWritable(2));
        reduceDriver.runTest();
    }
}
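Anyone who has used JUnit will know how to run the code above, so I won't go over that here.
MRUnit can test a single mapper, a single reducer, a complete MapReduce job, or a chain of several MapReduce jobs.
For details, see the official documentation: MRUnit Tutorial
Reference: http://www.cnblogs.com/gpcuster/archive/2009/10/04/1577921.html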
Original article: MRUnit Usage Tips. Thanks to the original author for sharing.
