Hadoop MapReduce multi-table join

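Suppose we have the following two files. One is a table that maps each company to an address ID; the other maps each address ID to an address name. Each record carries a prefix (a: or b:) that marks its table, and the code splits records on a tab, so the two fields of a record are assumed to be tab-separated (company names may contain spaces).
Table 1:
[plain]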
a:beijing red star 1
a:shenzhen thunder 3
a:guangzhou honda 2
a:beijing rising 1
a:guangzhou development bank 2
a:tencent 3
a:back of beijing 1
Table 2:
[plain]
b:1 beijing
b:2 guangzhou
b:3 shenzhen
b:4 xian
The MapReduce code is as follows:
[plain]
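import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class MTJoin {
    // Tags that record which table a value came from.
    private static final Text typeA = new Text("a:");
    private static final Text typeB = new Text("b:");
    private static Log log = LogFactory.getLog(MTJoin.class);

    public static class Map extends Mapper<Object, Text, Text, MapWritable> {
        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String valueStr = value.toString();
            String type = valueStr.substring(0, 2);   // "a:" or "b:"
            String content = valueStr.substring(2);   // record without the prefix
            log.info(content);
            if (type.equals("a:")) {
                // Table 1: <company name>\t<address ID>
                String[] contentArray = content.split("\t");
                String city = contentArray[0];     // company name
                String address = contentArray[1];  // address ID
                MapWritable map = new MapWritable();
                map.put(typeA, new Text(city));
                context.write(new Text(address), map);
            } else if (type.equals("b:")) {
                // Table 2: <address ID>\t<address name>
                String[] contentArray = content.split("\t");
                String adrNum = contentArray[0];
                String adrName = contentArray[1];
                MapWritable map = new MapWritable();
                map.put(typeB, new Text(adrName));
                context.write(new Text(adrNum), map);
            }
        }
    }

    public static class Reduce extends Reducer<Text, MapWritable, Text, Text> {
        @Override
        public void reduce(Text key, Iterable<MapWritable> values, Context context)
                throws IOException, InterruptedException {
            List<Text> cityList = new ArrayList<Text>();  // company names for this address ID
            List<Text> adrList = new ArrayList<Text>();   // address names for this address ID
            for (MapWritable map : values) {
                if (map.containsKey(typeA)) {
                    cityList.add((Text) map.get(typeA));
                } else if (map.containsKey(typeB)) {
                    adrList.add((Text) map.get(typeB));
                }
            }
            // Cartesian product of the two lists.
            for (int i = 0; i < cityList.size(); i++) {
                for (int j = 0; j < adrList.size(); j++) {
                    context.write(cityList.get(i), adrList.get(j));
                }
            }
        }
    }
}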
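The principle is simple. The map phase emits the address ID as the key, so records from both tables that share an address ID arrive at the same reduce call. In the reduce phase, the company names go into one list and the address names into another; the Cartesian product of the two lists is the join result. An address ID with no matching company (4, xian) produces no output.
The output of the job is as follows: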
[plain]
beijing red star beijing
beijing rising beijing
back of beijing beijing
guangzhou honda guangzhou
guangzhou development bank guangzhou
shenzhen thunder shenzhen
tencent shenzhen
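
To try the join logic without a cluster, the principle described above can be imitated in plain Java. This is only a rough sketch, not part of the original job: the class name ReduceSideJoinSketch and the method join are hypothetical, a pair of sorted maps stands in for the shuffle, and the input fields are assumed to be tab-separated as in the mapper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java imitation of the reduce-side join; no Hadoop required.
// Class and method names are hypothetical and chosen for illustration.
class ReduceSideJoinSketch {

    // Joins tagged records ("a:<company>\t<id>" and "b:<id>\t<name>") on the address ID.
    static List<String> join(List<String> lines) {
        // "Map" + "shuffle": group values by address ID, one map per source table.
        Map<String, List<String>> companies = new TreeMap<>();
        Map<String, List<String>> addresses = new TreeMap<>();
        for (String line : lines) {
            String type = line.substring(0, 2);
            String[] fields = line.substring(2).split("\t");
            if (type.equals("a:")) {
                companies.computeIfAbsent(fields[1], k -> new ArrayList<>()).add(fields[0]);
            } else if (type.equals("b:")) {
                addresses.computeIfAbsent(fields[0], k -> new ArrayList<>()).add(fields[1]);
            }
        }
        // "Reduce": for each address ID, emit the Cartesian product of the two lists.
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : companies.entrySet()) {
            List<String> names =
                addresses.getOrDefault(e.getKey(), Collections.<String>emptyList());
            for (String company : e.getValue()) {
                for (String name : names) {
                    out.add(company + "\t" + name);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
            "a:beijing red star\t1",
            "a:shenzhen thunder\t3",
            "a:guangzhou honda\t2",
            "a:beijing rising\t1",
            "a:guangzhou development bank\t2",
            "a:tencent\t3",
            "a:back of beijing\t1",
            "b:1\tbeijing",
            "b:2\tguangzhou",
            "b:3\tshenzhen",
            "b:4\txian");
        for (String row : join(input)) {
            System.out.println(row);
        }
    }
}
```

Because the sketch keeps every value for a key in memory, just as the reducer does with its two lists, it also shows the main cost of this approach: an address ID shared by a very large number of companies would need all of them buffered before the product is written.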
