When migrating Oracle stored procedures to the Hive platform, you will inevitably run into differences in table-join syntax.
This article compares the join syntax of Oracle and Hive in detail, covering inner joins, left and right outer joins, full outer joins, and Cartesian products.
1. Creating the tables
Oracle:
create table a
(
a1 number(10),
a2 varchar2(50)
);
create table b
(
b1 number(10),
b2 varchar2(50)
);
Hive:
create table if not exists a (
a1 string,
a2 string)
comment 'table a'
row format delimited
fields terminated by '|'
lines terminated by '\n'
stored as textfile
tblproperties ( 'created_at'='2014-04-28','creator'='henry' );
(Table b is created the same way, with columns b1 string and b2 string.)
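The row format clause above tells Hive how to split each line of the text file into columns when the table is read. As a small illustration in plain Python (not Hive code), a line of a '|'-delimited textfile maps to the string columns of table a like this:

```python
# Illustration in plain Python (not Hive code): how one line of a
# '|'-delimited textfile maps to the columns of table a (a1 string, a2 string).

def parse_row(line: str, delimiter: str = "|"):
    """Split one text line into string columns, as Hive's
    'fields terminated by' clause declares."""
    return line.rstrip("\n").split(delimiter)

lines = "1|x\n2|y\n3|z\n".splitlines()
rows = [parse_row(l) for l in lines]
print(rows)  # [['1', 'x'], ['2', 'y'], ['3', 'z']]
```

Note that every column comes back as a string; Hive applies the declared column types only at query time (schema-on-read), which is why the Hive table uses string where Oracle used number(10).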
2. Inserting data
Oracle:
insert into a(a1,a2) values(1,'x');
insert into a(a1,a2) values(2,'y');
insert into a(a1,a2) values(3,'z');
insert into b(b1,b2) values(1,'x');
insert into b(b1,b2) values(2,'y');
insert into b(b1,b2) values(4,'z');
Hive:
At the time of writing, Hive had no row-level INSERT ... VALUES statement (it was added in Hive 0.14), so the equivalent data is loaded from delimited text files instead:
hive (default)> load data local inpath './data1' into table a;
copying data from file:/home/hadoop/roger/sql/renguihe/data
copying file: file:/home/hadoop/roger/sql/renguihe/data
loading data to table default.a
table default.a stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 12, raw_data_size: 0]
ok
time taken: 1.961 seconds
hive (default)> load data local inpath './data2' into table b;
copying data from file:/home/hadoop/roger/sql/renguihe/data
copying file: file:/home/hadoop/roger/sql/renguihe/data
loading data to table default.b
table default.b stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 12, raw_data_size: 0]
ok
time taken: 0.392 seconds
The contents of the data1 file are:
1|x
2|y
3|z
The contents of the data2 file are:
1|x
2|y
4|z
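As the load logs above suggest, load data local inpath does not parse or transform the file; it simply copies it into the table's directory under the warehouse, and the rows only get interpreted when the table is queried. A rough Python analogy, with made-up stand-in paths (not Hive internals):

```python
# Rough analogy for LOAD DATA LOCAL INPATH (stand-in paths, not Hive internals):
# the local file is copied into the table's warehouse directory as-is;
# it is parsed later, at query time (schema-on-read).
import shutil
import tempfile
from pathlib import Path

warehouse = Path(tempfile.mkdtemp())     # stand-in for /user/hive/warehouse
table_dir = warehouse / "a"              # every file in here is part of table a
table_dir.mkdir()

local = warehouse / "data1"              # stand-in for the local ./data1 file
local.write_text("1|x\n2|y\n3|z\n")

shutil.copy(local, table_dir / "data1")  # the "load" is a plain file copy
print((table_dir / "data1").read_text())
```

This is why the load finishes in well under two seconds even for large files: no MapReduce job runs, only a file copy.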
3. Equi-join
Oracle:
select * from a,b where a.a1 = b.b1;
or equivalently:
select * from a join b on a.a1 = b.b1;
Both queries return the same result: the rows (1, x, 1, x) and (2, y, 2, y).
Hive:
select * from a join b on a.a1 = b.b1;
Note that in Hive the join condition cannot be written as a comma-separated FROM list with a WHERE clause; it must use JOIN ... ON.
The execution log and result are as follows:
hive (default)> select * from a join b on a.a1 = b.b1;
total mapreduce jobs = 1
setting hadoop_user_name hadoop
execution log at: /tmp/hadoop/.log
2014-04-29 09:13:27 starting to launch local task to process map join; maximum memory = 1908932608
2014-04-29 09:13:27 processing rows: 3 hashtable size: 3 memory usage: 110981704 rate: 0.058
2014-04-29 09:13:27 dump the hashtable into file: file:/tmp/hadoop/hive_2014-04-29_09-13-25_273_8486588204512196396/-local-10002/hashtable-stage-3/mapjoin-mapfile00--.hashtable
2014-04-29 09:13:27 upload 1 file to: file:/tmp/hadoop/hive_2014-04-29_09-13-25_273_8486588204512196396/-local-10002/hashtable-stage-3/mapjoin-mapfile00--.hashtable file size: 438
2014-04-29 09:13:27 end of local task; time taken: 0.339 sec.
execution completed successfully
mapred local task succeeded . convert the join into mapjoin
launching job 1 out of 1
number of reduce tasks is set to 0 since there's no reduce operator
starting job = job_201404251509_0131, tracking url = ip:50030/jobdetails.jsp?jobid=job_201404251509_0131
kill command = /home/hadoop/package/hadoop-1.0.4/libexec/../bin/hadoop job -kill job_201404251509_0131
hadoop job information for stage-3: number of mappers: 1; number of reducers: 0
2014-04-29 09:13:39,979 stage-3 map = 0%, reduce = 0%
2014-04-29 09:13:46,025 stage-3 map = 100%, reduce = 0%, cumulative cpu 1.59 sec
2014-04-29 09:13:47,034 stage-3 map = 100%, reduce = 0%, cumulative cpu 1.59 sec
2014-04-29 09:13:48,044 stage-3 map = 100%, reduce = 0%, cumulative cpu 1.59 sec
2014-04-29 09:13:49,052 stage-3 map = 100%, reduce = 0%, cumulative cpu 1.59 sec
2014-04-29 09:13:50,061 stage-3 map = 100%, reduce = 0%, cumulative cpu 1.59 sec
2014-04-29 09:13:51,069 stage-3 map = 100%, reduce = 0%, cumulative cpu 1.59 sec
2014-04-29 09:13:52,077 stage-3 map = 100%, reduce = 100%, cumulative cpu 1.59 sec
mapreduce total cumulative cpu time: 1 seconds 590 msec
ended job = job_201404251509_0131
mapreduce jobs launched:
job 0: map: 1 cumulative cpu: 1.59 sec hdfs read: 211 hdfs write: 16 success
total mapreduce cpu time spent: 1 seconds 590 msec
ok
a1 a2 b1 b2
1 x 1 x
2 y 2 y
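The equi-join keeps only the rows whose keys appear in both tables, so the row (3, z) from a and (4, z) from b are dropped. The log above also shows Hive converting the join into a map join: it builds an in-memory hash table from the smaller table and probes it from the mappers. A small Python sketch of that semantics (illustrative only, not Hive code), using the data1/data2 rows:

```python
# Illustrative Python sketch of the equi-join a.a1 = b.b1 (not Hive code),
# mimicking Hive's map join: hash the smaller table, probe with the other.

a = [(1, "x"), (2, "y"), (3, "z")]  # contents of data1
b = [(1, "x"), (2, "y"), (4, "z")]  # contents of data2

# Build phase: hash table on b keyed by the join column b1.
hash_b = {}
for b1, b2 in b:
    hash_b.setdefault(b1, []).append((b1, b2))

# Probe phase: each row of a looks up matching b rows by a1.
result = [(a1, a2, b1, b2)
          for a1, a2 in a
          for b1, b2 in hash_b.get(a1, [])]

print(result)  # [(1, 'x', 1, 'x'), (2, 'y', 2, 'y')]
```

Because the probe side never needs to be sorted or shuffled, the job runs with mappers only, which matches the "number of reduce tasks is set to 0" line in the log.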
