
Alex's Hadoop Beginner Tutorial, Lesson 9: Exporting from HBase or Hive to MySQL with Sqoop1

Today we'll look at how to use Sqoop to export data from HBase or Hive into MySQL. One thing to know up front:
Sqoop currently cannot export data directly from HBase to MySQL. You have to go through Hive with two tables: an external table mapped onto the HBase table, and a plain native Hive table stored on HDFS. First copy the data from the external table into the native table (used as a temporary staging table), then use Sqoop to export the staging table's contents to MySQL.
Data preparation
Create an empty table in MySQL
create table `employee` ( `rowkey` int(11) not null, `id` int(11) not null, `name` varchar(20) not null, primary key (`id`) ) engine=myisam default charset=utf8;
Note: people habitually name the Hive column that maps to the HBase rowkey "key", so it is tempting to create a matching "key" column in the MySQL table as well. However, KEY is a reserved word in MySQL, and a column with that name will make the generated insert statements fail. That is why the column is called rowkey here.
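The Sqoop command later in this post connects to a database called sqoop_test (see the JDBC URL in the export step). If that database does not exist yet on your MySQL server, create it first and build the employee table inside it; a minimal sketch, assuming a local MySQL instance and the utf8 charset used above:

mysql> create database if not exists sqoop_test default character set utf8;
mysql> use sqoop_test;

Then run the create table `employee` statement from above inside this database.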
Create the employee table in HBase
Create the employee table and insert some data:
hbase(main):005:0> create 'employee','info'
0 row(s) in 0.4740 seconds
=> Hbase::Table - employee
hbase(main):006:0> put 'employee',1,'info:id',1
0 row(s) in 0.2080 seconds
hbase(main):008:0> scan 'employee'
ROW                COLUMN+CELL
 1                 column=info:id, timestamp=1417591291730, value=1
1 row(s) in 0.0610 seconds
hbase(main):009:0> put 'employee',1,'info:name','peter'
0 row(s) in 0.0220 seconds
hbase(main):010:0> scan 'employee'
ROW                COLUMN+CELL
 1                 column=info:id, timestamp=1417591291730, value=1
 1                 column=info:name, timestamp=1417591321072, value=peter
1 row(s) in 0.0450 seconds
hbase(main):011:0> put 'employee',2,'info:id',2
0 row(s) in 0.0370 seconds
hbase(main):012:0> put 'employee',2,'info:name','paul'
0 row(s) in 0.0180 seconds
hbase(main):013:0> scan 'employee'
ROW                COLUMN+CELL
 1                 column=info:id, timestamp=1417591291730, value=1
 1                 column=info:name, timestamp=1417591321072, value=peter
 2                 column=info:id, timestamp=1417591500179, value=2
 2                 column=info:name, timestamp=1417591512075, value=paul
2 row(s) in 0.0440 seconds
Create the Hive external table
Hive has both native (managed) tables and external tables. A native table is stored as plain files on HDFS, while an external table is backed by another framework such as HBase. Let's now create a Hive external table backed by the employee HBase table we just created.
hive> create external table h_employee(key int, id int, name string)
    > stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > with serdeproperties ("hbase.columns.mapping" = ":key,info:id,info:name")
    > tblproperties ("hbase.table.name" = "employee");
OK
Time taken: 0.324 seconds
hive> select * from h_employee;
OK
1	1	peter
2	2	paul
Time taken: 1.129 seconds, Fetched: 2 row(s)
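If Hive complains that it cannot find the HBase storage handler class, or cannot reach HBase at all, the handler and HBase client jars have to be on Hive's classpath and Hive has to know where your ZooKeeper quorum is. A rough sketch of how the CLI can be started in that case; the jar paths and the hostname hadoop01 are assumptions for a CDH-style install, adjust them to your own layout:

$ hive --auxpath /usr/lib/hive/lib/hive-hbase-handler.jar,/usr/lib/hbase/hbase-client.jar \
       -hiveconf hbase.zookeeper.quorum=hadoop01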
Create the native Hive table
This native table is only used as a temporary staging table for the export, so it is named h_employee_export, and the fields are separated by commas ('\054' is the octal escape for the comma character):
create table h_employee_export(key int, id int, name string)
row format delimited fields terminated by '\054';

Import data from the source Hive table into the staging table
Step one: copy the data from h_employee (the HBase-backed external table) into h_employee_export (the native Hive table).
hive> insert overwrite table h_employee_export select * from h_employee;
hive> select * from h_employee_export;
OK
1	1	peter
2	2	paul
Time taken: 0.359 seconds, Fetched: 2 row(s)
Let's look at what the underlying text file actually looks like:
$ hdfs dfs -cat /user/hive/warehouse/h_employee_export/000000_0
1,1,peter
2,2,paul
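As an aside (not part of the original walkthrough): the two staging steps above, creating h_employee_export and then filling it with insert overwrite, can usually be collapsed into a single CREATE TABLE ... AS SELECT. A sketch, assuming the same table names:

hive> create table h_employee_export
    > row format delimited fields terminated by ','
    > as select * from h_employee;

Keeping the two steps separate, as this post does, has the small advantage that the same insert overwrite can be re-run later without dropping and recreating the table.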

Export the data from Hive to MySQL
$ sqoop export --connect jdbc:mysql://localhost:3306/sqoop_test --username root --password root --table employee --m 1 --export-dir /user/hive/warehouse/h_employee_export/
Warning: /usr/lib/sqoop/../hive-hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
14/12/05 08:49:35 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4-cdh5.0.1
14/12/05 08:49:35 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/12/05 08:49:35 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
14/12/05 08:49:35 INFO tool.CodeGenTool: Beginning code generation
14/12/05 08:49:36 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employee` AS t LIMIT 1
14/12/05 08:49:36 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employee` AS t LIMIT 1
14/12/05 08:49:36 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-wlsuser/compile/d16eb4166baf6a1e885d7df0e2638685/employee.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
14/12/05 08:49:39 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-wlsuser/compile/d16eb4166baf6a1e885d7df0e2638685/employee.jar
14/12/05 08:49:39 INFO mapreduce.ExportJobBase: Beginning export of employee
14/12/05 08:49:41 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/12/05 08:49:43 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
14/12/05 08:49:43 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
14/12/05 08:49:43 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/12/05 08:49:43 INFO client.RMProxy: Connecting to ResourceManager at hadoop01/192.111.78.111:8032
14/12/05 08:49:45 INFO input.FileInputFormat: Total input paths to process : 1
14/12/05 08:49:45 INFO input.FileInputFormat: Total input paths to process : 1
14/12/05 08:49:45 INFO mapreduce.JobSubmitter: number of splits:1
14/12/05 08:49:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1406097234796_0037
14/12/05 08:49:46 INFO impl.YarnClientImpl: Submitted application application_1406097234796_0037
14/12/05 08:49:46 INFO mapreduce.Job: The url to track the job: http://hadoop01:8088/proxy/application_1406097234796_0037/
14/12/05 08:49:46 INFO mapreduce.Job: Running job: job_1406097234796_0037
14/12/05 08:49:59 INFO mapreduce.Job: Job job_1406097234796_0037 running in uber mode : false
14/12/05 08:49:59 INFO mapreduce.Job:  map 0% reduce 0%
14/12/05 08:50:10 INFO mapreduce.Job:  map 100% reduce 0%
14/12/05 08:50:10 INFO mapreduce.Job: Job job_1406097234796_0037 completed successfully
14/12/05 08:50:10 INFO mapreduce.Job: Counters: 30
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=99761
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=166
                HDFS: Number of bytes written=0
                HDFS: Number of read operations=4
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=0
        Job Counters
                Launched map tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=8805
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=8805
                Total vcore-seconds taken by all map tasks=8805
                Total megabyte-seconds taken by all map tasks=9016320
        Map-Reduce Framework
                Map input records=2
                Map output records=2
                Input split bytes=144
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=97
                CPU time spent (ms)=1360
                Physical memory (bytes) snapshot=167555072
                Virtual memory (bytes) snapshot=684212224
                Total committed heap usage (bytes)=148897792
        File Input Format Counters
                Bytes Read=0
        File Output Format Counters
                Bytes Written=0
14/12/05 08:50:10 INFO mapreduce.ExportJobBase: Transferred 166 bytes in 27.0676 seconds (6.1328 bytes/sec)
14/12/05 08:50:10 INFO mapreduce.ExportJobBase: Exported 2 records.
Note
The log contains this line:
14/12/05 08:49:46 INFO mapreduce.Job: The url to track the job: http://hadoop01:8088/proxy/application_1406097234796_0037/
It means you can open that URL in a browser to watch how the job is doing. If the job has been stuck for a long time without finishing, something has gone wrong, and the same page shows the detailed error logs.
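Two optional tweaks to the export command; they are not used in the original post, but both are standard sqoop export options. -P makes Sqoop prompt for the password instead of taking it on the command line (which the log above warns about), and --input-fields-terminated-by tells Sqoop which delimiter the staging files use if you picked something other than the default comma:

$ sqoop export --connect jdbc:mysql://localhost:3306/sqoop_test --username root -P \
    --table employee --m 1 \
    --export-dir /user/hive/warehouse/h_employee_export/ \
    --input-fields-terminated-by ','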
Check the result
mysql> select * from employee;
+--------+----+-------+
| rowkey | id | name  |
+--------+----+-------+
|      1 |  1 | peter |
|      2 |  2 | paul  |
+--------+----+-------+
2 rows in set (0.00 sec)

mysql>
The export succeeded.
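One last note that the post does not cover: employee has a primary key on id, so re-running the same export against a table that already holds these rows will abort with duplicate-key errors. Either truncate the MySQL table before re-running, or let Sqoop turn existing rows into updates with the standard --update-key / --update-mode options; a sketch using the same connection settings as above:

$ sqoop export --connect jdbc:mysql://localhost:3306/sqoop_test --username root --password root \
    --table employee --m 1 \
    --export-dir /user/hive/warehouse/h_employee_export/ \
    --update-key id --update-mode allowinsert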