Sqoop is a tool for moving data between Hadoop (Hive, HBase) and relational databases: it can import data from a relational database (e.g. MySQL, Oracle, PostgreSQL) into HDFS, and it can also export data from HDFS back into a relational database. Sqoop is now a top-level Apache project; the current releases are 1.4.4 and Sqoop2 1.99.3. This article uses version 1.4.4 to walk through basic installation and configuration plus a few simple demos:

- Installation and configuration
- Preparing test data
- Importing data into HDFS
- Importing data into Hive
- Importing data into HBase

[1] Installation and Configuration

Pick the Sqoop 1.4.4 release: sqoop-1.4.4.bin__hadoop-2.0.4-alpha.tar.gz

1.1 Download, then unpack and link it:

```
tar -zxvf sqoop-1.4.4.bin__hadoop-2.0.4-alpha.tar.gz -C /usr/local/
cd /usr/local
ln -s sqoop-1.4.4.bin__hadoop-2.0.4-alpha sqoop
```
1.2 Configure environment variables: `vi ~/.bash_profile`:

```
# Sqoop, added by micmiu.com
export SQOOP_HOME=/usr/local/sqoop
export PATH=$SQOOP_HOME/bin:$PATH
```
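After editing, reload the profile and make sure the `sqoop` launcher resolves. A quick sanity check, assuming the tarball was unpacked as above:

```
# Reload the profile so SQOOP_HOME and PATH take effect in the current shell
source ~/.bash_profile

# Should print "Sqoop 1.4.4" plus build info if the launcher is on the PATH
sqoop version
```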
1.3 Configure Sqoop parameters: copy conf/sqoop-env-template.sh to a new file conf/sqoop-env.sh, then edit it (`vi conf/sqoop-env.sh`) and point each variable at its actual location:

```
# Set Hadoop-specific environment variables here.

# Set path to where bin/hadoop is available
#export HADOOP_COMMON_HOME=

# Set path to where hadoop-*-core.jar is available
#export HADOOP_MAPRED_HOME=

# Set the path to where bin/hbase is available
#export HBASE_HOME=

# Set the path to where bin/hive is available
#export HIVE_HOME=
```
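If these variables are not already exported in your login shell, you would uncomment and fill them in here; a hypothetical example (the paths are assumptions matching the layout used in this article):

```
export HADOOP_COMMON_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=/usr/local/hadoop
export HBASE_HOME=/usr/local/hbase
export HIVE_HOME=/usr/local/hive
```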
PS: because my current user's default environment already exports the relevant variables, the file needs no further changes in my case:

```
# Hadoop
export HADOOP_PREFIX=/usr/local/hadoop
export HADOOP_HOME=${HADOOP_PREFIX}
export PATH=$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_YARN_HOME=${HADOOP_PREFIX}
# Native path
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_PREFIX}/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib/native"
# Hadoop end

# Hive
export HIVE_HOME=/usr/local/hive
export PATH=$HIVE_HOME/bin:$PATH

# HBase
export HBASE_HOME=/usr/local/hbase
export PATH=$HBASE_HOME/bin:$PATH
# Added by micmiu.com
```
1.4 Driver JAR: the demos below use MySQL, so the MySQL JDBC driver jar must be copied into the $SQOOP_HOME/lib directory (a sketch of this appears after the test data below).

[2] Preparing Test Data

Using MySQL as the example: host 192.168.6.77 (hostname: Master.Hadoop), database: test, user: root, password: micmiu. Prepare two test tables, demo_blog with a primary key and demo_log without one:

```
CREATE TABLE `demo_blog` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `blog` varchar(100) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
```
```
CREATE TABLE `demo_log` (
  `operator` varchar(16) NOT NULL,
  `log` varchar(100) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
```
Insert the test data:

```
INSERT INTO demo_blog (id, blog) VALUES (1, 'micmiu.com');
INSERT INTO demo_blog (id, blog) VALUES (2, 'ctosun.com');
INSERT INTO demo_blog (id, blog) VALUES (3, 'baby.micmiu.com');
INSERT INTO demo_log (operator, log) VALUES ('micmiu', 'create');
INSERT INTO demo_log (operator, log) VALUES ('micmiu', 'update');
INSERT INTO demo_log (operator, log) VALUES ('michael', 'edit');
INSERT INTO demo_log (operator, log) VALUES ('michael', 'delete');
```
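Before running the first import, it is worth confirming that the JDBC driver from step 1.4 is in place and that Sqoop can reach the database. A minimal sketch, assuming a driver jar has already been downloaded (the connector filename below is only an example; use whatever version you have):

```
# Copy the MySQL JDBC driver into Sqoop's lib directory
cp mysql-connector-java-5.1.30-bin.jar /usr/local/sqoop/lib/

# Sanity check: list the tables in the test database;
# demo_blog and demo_log should both appear
sqoop list-tables --connect jdbc:mysql://192.168.6.77/test --username root --password micmiu
```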
[3] Importing Data into HDFS

3.1 Importing a table with a primary key

For example, to import the table demo_blog (which has a primary key) into HDFS, run:

```
sqoop import --connect jdbc:mysql://192.168.6.77/test --username root --password micmiu --table demo_blog
```
The run looks like this (SLF4J binding notices, configuration.deprecation messages, and full counters abbreviated with "..."):

```
$ sqoop import --connect jdbc:mysql://192.168.6.77/test --username root --password micmiu --table demo_blog
Warning: /usr/lib/hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
14/04/09 09:58:43 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/04/09 09:58:43 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
14/04/09 09:58:43 INFO tool.CodeGenTool: Beginning code generation
14/04/09 09:58:43 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `demo_blog` AS t LIMIT 1
14/04/09 09:58:44 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/e8fd26a5bca5b7f51cdb03bf847ce389/demo_blog.jar
14/04/09 09:58:44 WARN manager.MySQLManager: It looks like you are importing from mysql.
14/04/09 09:58:44 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
14/04/09 09:58:44 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
14/04/09 09:58:44 INFO mapreduce.ImportJobBase: Beginning import of demo_blog
...
14/04/09 09:58:45 INFO client.RMProxy: Connecting to ResourceManager at Master.Hadoop/192.168.6.77:8032
14/04/09 09:58:47 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`id`), MAX(`id`) FROM `demo_blog`
14/04/09 09:58:47 INFO mapreduce.JobSubmitter: number of splits:3
14/04/09 09:58:47 INFO impl.YarnClientImpl: Submitted application application_1396936838233_0001 to ResourceManager at Master.Hadoop/192.168.6.77:8032
14/04/09 09:58:47 INFO mapreduce.Job: The url to track the job: http://Master.Hadoop:8088/proxy/application_1396936838233_0001/
14/04/09 09:58:47 INFO mapreduce.Job: Running job: job_1396936838233_0001
14/04/09 09:59:00 INFO mapreduce.Job: Job job_1396936838233_0001 running in uber mode : false
14/04/09 09:59:00 INFO mapreduce.Job:  map 0% reduce 0%
14/04/09 09:59:14 INFO mapreduce.Job:  map 33% reduce 0%
14/04/09 09:59:16 INFO mapreduce.Job:  map 67% reduce 0%
14/04/09 09:59:19 INFO mapreduce.Job:  map 100% reduce 0%
14/04/09 09:59:19 INFO mapreduce.Job: Job job_1396936838233_0001 completed successfully
14/04/09 09:59:19 INFO mapreduce.Job: Counters: 27
	File System Counters
		HDFS: Number of bytes read=295
		HDFS: Number of bytes written=44
	Job Counters
		Launched map tasks=3
	Map-Reduce Framework
		Map input records=3
		Map output records=3
	...
14/04/09 09:59:19 INFO mapreduce.ImportJobBase: Transferred 44 bytes in 34.454 seconds (1.2771 bytes/sec)
14/04/09 09:59:19 INFO mapreduce.ImportJobBase: Retrieved 3 records.
```
Verify the data imported to HDFS:

```
$ hdfs dfs -ls /user/hadoop/demo_blog
Found 4 items
-rw-r--r--   3 hadoop supergroup          0 2014-04-09 09:59 /user/hadoop/demo_blog/_SUCCESS
-rw-r--r--   3 hadoop supergroup         13 2014-04-09 09:59 /user/hadoop/demo_blog/part-m-00000
-rw-r--r--   3 hadoop supergroup         13 2014-04-09 09:59 /user/hadoop/demo_blog/part-m-00001
-rw-r--r--   3 hadoop supergroup         18 2014-04-09 09:59 /user/hadoop/demo_blog/part-m-00002
[hadoop@master ~]$ hdfs dfs -cat /user/hadoop/demo_blog/part-m-0000*
1,micmiu.com
2,ctosun.com
3,baby.micmiu.com
```
PS: under the default settings the import path on HDFS is /user/<username>/<tablename>/(files); since my current user is hadoop, the actual path here is /user/hadoop/demo_blog/(files). To choose a different parent directory, add the --warehouse-dir parameter, for example:

```
sqoop import --connect jdbc:mysql://Master.Hadoop/test --username root --password micmiu --table demo_blog --warehouse-dir /user/micmiu/sqoop
```
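Note that --warehouse-dir only sets the parent directory and the table name is still appended under it. If the files should land in one exact directory instead, Sqoop's --target-dir option can be used; a sketch (the path is arbitrary):

```
# Write the part files directly into /user/micmiu/blog rather than <parent>/demo_blog
sqoop import --connect jdbc:mysql://192.168.6.77/test --username root --password micmiu \
  --table demo_blog --target-dir /user/micmiu/blog
```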
3.2 Importing a table without a primary key

For example, to import the table demo_log (no primary key) into HDFS, run:

```
sqoop import --connect jdbc:mysql://192.168.6.77/test --username root --password micmiu --table demo_log --warehouse-dir /user/micmiu/sqoop --split-by operator
```
PS: importing a table without a primary key requires adding either --split-by <column> or -m 1.

The run (abbreviated as before):

```
$ sqoop import --connect jdbc:mysql://192.168.6.77/test --username root --password micmiu --table demo_log --warehouse-dir /user/micmiu/sqoop --split-by operator
Warning: /usr/lib/hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
14/04/09 15:02:06 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/04/09 15:02:06 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
14/04/09 15:02:06 INFO tool.CodeGenTool: Beginning code generation
14/04/09 15:02:07 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/dddc1bcdba30515f95a2d604f22e4fe9/demo_log.jar
14/04/09 15:02:07 INFO mapreduce.ImportJobBase: Beginning import of demo_log
...
14/04/09 15:02:08 INFO client.RMProxy: Connecting to ResourceManager at Master.Hadoop/192.168.6.77:8032
14/04/09 15:02:10 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`operator`), MAX(`operator`) FROM `demo_log`
14/04/09 15:02:10 WARN db.TextSplitter: Generating splits for a textual index column.
14/04/09 15:02:10 WARN db.TextSplitter: If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.
14/04/09 15:02:10 WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.
14/04/09 15:02:10 INFO mapreduce.JobSubmitter: number of splits:4
14/04/09 15:02:10 INFO impl.YarnClientImpl: Submitted application application_1396936838233_0003 to ResourceManager at Master.Hadoop/192.168.6.77:8032
14/04/09 15:02:10 INFO mapreduce.Job: Running job: job_1396936838233_0003
14/04/09 15:02:17 INFO mapreduce.Job: Job job_1396936838233_0003 running in uber mode : false
14/04/09 15:02:17 INFO mapreduce.Job:  map 0% reduce 0%
14/04/09 15:02:28 INFO mapreduce.Job:  map 25% reduce 0%
14/04/09 15:02:30 INFO mapreduce.Job:  map 50% reduce 0%
14/04/09 15:02:33 INFO mapreduce.Job:  map 100% reduce 0%
14/04/09 15:02:33 INFO mapreduce.Job: Job job_1396936838233_0003 completed successfully
14/04/09 15:02:33 INFO mapreduce.Job: Counters: 27
	File System Counters
		HDFS: Number of bytes read=516
		HDFS: Number of bytes written=56
	Job Counters
		Launched map tasks=4
	Map-Reduce Framework
		Map input records=4
		Map output records=4
	...
14/04/09 15:02:33 INFO mapreduce.ImportJobBase: Transferred 56 bytes in 25.2746 seconds (2.2157 bytes/sec)
14/04/09 15:02:33 INFO mapreduce.ImportJobBase: Retrieved 4 records.
```
Verify the imported data:

```
$ hdfs dfs -ls /user/micmiu/sqoop/demo_log
Found 5 items
-rw-r--r--   3 hadoop supergroup          0 2014-04-09 15:02 /user/micmiu/sqoop/demo_log/_SUCCESS
-rw-r--r--   3 hadoop supergroup         28 2014-04-09 15:02 /user/micmiu/sqoop/demo_log/part-m-00000
-rw-r--r--   3 hadoop supergroup          0 2014-04-09 15:02 /user/micmiu/sqoop/demo_log/part-m-00001
-rw-r--r--   3 hadoop supergroup          0 2014-04-09 15:02 /user/micmiu/sqoop/demo_log/part-m-00002
-rw-r--r--   3 hadoop supergroup         28 2014-04-09 15:02 /user/micmiu/sqoop/demo_log/part-m-00003
$ hdfs dfs -cat /user/micmiu/sqoop/demo_log/part-m-0000*
michael,edit
michael,delete
micmiu,create
micmiu,update
```
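For comparison, the -m 1 variant mentioned above sidesteps splitting by running a single map task, so no split column is required; the trade-off is that the import is not parallel and produces a single part file. A sketch (the sqoop_m1 directory is arbitrary, chosen so it does not collide with the run above):

```
sqoop import --connect jdbc:mysql://192.168.6.77/test --username root --password micmiu \
  --table demo_log --warehouse-dir /user/micmiu/sqoop_m1 -m 1
```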
[4] Importing Data into Hive

For example, to import the table demo_blog into Hive, add the --hive-import parameter:

```
sqoop import --connect jdbc:mysql://192.168.6.77/test --username root --password micmiu --table demo_blog --warehouse-dir /user/sqoop --hive-import --create-hive-table
```
The run looks like this (abbreviated):

```
$ sqoop import --connect jdbc:mysql://192.168.6.77/test --username root --password micmiu --table demo_blog --warehouse-dir /user/sqoop --hive-import --create-hive-table
Warning: /usr/lib/hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
14/04/09 10:44:21 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/04/09 10:44:21 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
14/04/09 10:44:21 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
...
14/04/09 10:44:22 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/c071f02ecad006293202fd2c2fad0dce/demo_blog.jar
14/04/09 10:44:22 INFO mapreduce.ImportJobBase: Beginning import of demo_blog
14/04/09 10:44:23 INFO client.RMProxy: Connecting to ResourceManager at Master.Hadoop/192.168.6.77:8032
14/04/09 10:44:25 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`id`), MAX(`id`) FROM `demo_blog`
14/04/09 10:44:25 INFO mapreduce.JobSubmitter: number of splits:3
14/04/09 10:44:25 INFO impl.YarnClientImpl: Submitted application application_1396936838233_0002 to ResourceManager at Master.Hadoop/192.168.6.77:8032
14/04/09 10:44:25 INFO mapreduce.Job: Running job: job_1396936838233_0002
14/04/09 10:44:33 INFO mapreduce.Job:  map 0% reduce 0%
14/04/09 10:44:46 INFO mapreduce.Job:  map 67% reduce 0%
14/04/09 10:44:48 INFO mapreduce.Job:  map 100% reduce 0%
14/04/09 10:44:49 INFO mapreduce.Job: Job job_1396936838233_0002 completed successfully
...
14/04/09 10:44:49 INFO mapreduce.ImportJobBase: Transferred 44 bytes in 26.0401 seconds (1.6897 bytes/sec)
14/04/09 10:44:49 INFO mapreduce.ImportJobBase: Retrieved 3 records.
14/04/09 10:44:49 INFO hive.HiveImport: Loading uploaded data into Hive
...
14/04/09 10:44:53 INFO hive.HiveImport: Logging initialized using configuration in file:/usr/local/hive-0.13.0-bin/conf/hive-log4j.properties
14/04/09 10:44:57 INFO hive.HiveImport: OK
14/04/09 10:44:57 INFO hive.HiveImport: Time taken: 0.773 seconds
14/04/09 10:44:57 INFO hive.HiveImport: Loading data to table default.demo_blog
14/04/09 10:44:57 INFO hive.HiveImport: Table default.demo_blog stats: [numFiles=4, numRows=0, totalSize=44, rawDataSize=0]
14/04/09 10:44:57 INFO hive.HiveImport: OK
14/04/09 10:44:57 INFO hive.HiveImport: Time taken: 0.25 seconds
14/04/09 10:44:57 INFO hive.HiveImport: Hive import complete.
14/04/09 10:44:57 INFO hive.HiveImport: Export directory is empty, removing it.
```
Verify the imported data in the Hive CLI:

```
hive> show tables;
OK
demo_blog
hbase_table_1
hbase_table_2
hbase_table_3
micmiu_blog
micmiu_hx_master
pokes
xflow_dstip
Time taken: 0.073 seconds, Fetched: 8 row(s)
hive> select * from demo_blog;
OK
1	micmiu.com
2	ctosun.com
3	baby.micmiu.com
Time taken: 0.506 seconds, Fetched: 3 row(s)
```
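By default the Hive table takes the source table's name in the default database. To control the target table name, or to replace its contents on a re-run, Sqoop's --hive-table and --hive-overwrite options can be added; a sketch (micmiu_demo_blog is an arbitrary name):

```
sqoop import --connect jdbc:mysql://192.168.6.77/test --username root --password micmiu \
  --table demo_blog --hive-import --hive-table micmiu_demo_blog --hive-overwrite
```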
[5] Importing Data into HBase

This demo imports the table demo_blog into HBase, naming the HBase table demo_sqoop2hbase:

```
sqoop import --connect jdbc:mysql://192.168.6.77/test --username root --password micmiu --table demo_blog --hbase-table demo_sqoop2hbase --hbase-create-table --hbase-row-key id --column-family url
```
The run (abbreviated):

```
$ sqoop import --connect jdbc:mysql://192.168.6.77/test --username root --password micmiu --table demo_blog --hbase-table demo_sqoop2hbase --hbase-create-table --hbase-row-key id --column-family url
Warning: /usr/lib/hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
14/04/09 16:23:38 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/04/09 16:23:38 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
14/04/09 16:23:40 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/85408c854ee8fba75bbb2458e5e25093/demo_blog.jar
14/04/09 16:23:40 INFO mapreduce.ImportJobBase: Beginning import of demo_blog
...
14/04/09 16:23:41 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=slave6.hadoop:2181,slave5.hadoop:2181,slave7.hadoop:2181 sessionTimeout=90000 watcher=hconnection-0x57c8b24d, quorum=slave6.hadoop:2181,slave5.hadoop:2181,slave7.hadoop:2181, baseZNode=/hbase
14/04/09 16:23:41 INFO zookeeper.ClientCnxn: Session establishment complete on server slave5.hadoop/192.168.8.205:2181, sessionid = 0x453fecb6c50009, negotiated timeout = 90000
14/04/09 16:23:41 INFO mapreduce.HBaseImportJob: Creating missing HBase table demo_sqoop2hbase
14/04/09 16:23:42 INFO client.RMProxy: Connecting to ResourceManager at Master.Hadoop/192.168.6.77:8032
14/04/09 16:23:47 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`id`), MAX(`id`) FROM `demo_blog`
14/04/09 16:23:47 INFO mapreduce.JobSubmitter: number of splits:3
14/04/09 16:23:47 INFO impl.YarnClientImpl: Submitted application application_1396936838233_0005 to ResourceManager at Master.Hadoop/192.168.6.77:8032
14/04/09 16:23:47 INFO mapreduce.Job: Running job: job_1396936838233_0005
14/04/09 16:23:55 INFO mapreduce.Job:  map 0% reduce 0%
14/04/09 16:24:05 INFO mapreduce.Job:  map 33% reduce 0%
14/04/09 16:24:12 INFO mapreduce.Job:  map 100% reduce 0%
14/04/09 16:24:12 INFO mapreduce.Job: Job job_1396936838233_0005 completed successfully
...
14/04/09 16:24:12 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 29.7126 seconds (0 bytes/sec)
14/04/09 16:24:12 INFO mapreduce.ImportJobBase: Retrieved 3 records.
```

(The rows are written directly into HBase rather than to files, which is why the job reports 0 bytes written to HDFS.)
Verify the imported data in the HBase shell:

```
hbase(main):009:0> list
TABLE
demo_sqoop2hbase
table_02
table_03
test_table
xyz
5 row(s) in 0.0310 seconds

=> ["demo_sqoop2hbase", "table_02", "table_03", "test_table", "xyz"]
hbase(main):010:0> scan 'demo_sqoop2hbase'
ROW        COLUMN+CELL
 1         column=url:blog, timestamp=1397031850700, value=micmiu.com
 2         column=url:blog, timestamp=1397031844106, value=ctosun.com
 3         column=url:blog, timestamp=1397031849888, value=baby.micmiu.com
3 row(s) in 0.0730 seconds
hbase(main):011:0> describe 'demo_sqoop2hbase'
DESCRIPTION                                                            ENABLED
 'demo_sqoop2hbase', {NAME => 'url', DATA_BLOCK_ENCODING => 'NONE', BL true
 OOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRE
 SSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELET
 ED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOC
 KCACHE => 'true'}
1 row(s) in 0.0580 seconds
hbase(main):012:0>
```

This confirms the import into HBase succeeded.
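As mentioned in the introduction, Sqoop can also go the opposite direction and export HDFS files back into a relational table, although that is not demonstrated in this walkthrough. A minimal sketch, assuming demo_blog_copy is a hypothetical, pre-created, empty MySQL table with the same schema as demo_blog:

```
# Export the comma-delimited files produced in section 3.1 back into MySQL
sqoop export --connect jdbc:mysql://192.168.6.77/test --username root --password micmiu \
  --table demo_blog_copy --export-dir /user/hadoop/demo_blog
```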
With that, this article has demonstrated the three basic scenarios of migrating MySQL data into HDFS, Hive, and HBase.

Reference: http://sqoop.apache.org/docs/1.4.4/SqoopUserGuide.html

----- EOF @Michael Sun -----