Sqoop is a tool for moving data between Hadoop (Hive, HBase) and relational databases: it can import data from a relational database (e.g. MySQL, Oracle, PostgreSQL) into HDFS, and it can also export data from HDFS back into a relational database. Sqoop is now a top-level Apache project. The current releases are 1.4.4 and Sqoop2 1.99.3; this article uses version 1.4.4 to walk through the basic installation, configuration, and a few simple demos.

Contents:
- Installation and configuration
- Preparing test data
- Importing data into HDFS
- Importing data into Hive
- Importing data into HBase

[1] Installation and Configuration

Choose the Sqoop 1.4.4 release: sqoop-1.4.4.bin__hadoop-2.0.4-alpha.tar.gz

1.1 Download, extract, and create a symlink:

tar -zxvf sqoop-1.4.4.bin__hadoop-2.0.4-alpha.tar.gz -C /usr/local/
cd /usr/local
ln -s sqoop-1.4.4.bin__hadoop-2.0.4-alpha sqoop
1.2 Environment variables: edit ~/.bash_profile (vi ~/.bash_profile) and add:

# sqoop, added by micmiu.com
export SQOOP_HOME=/usr/local/sqoop
export PATH=$SQOOP_HOME/bin:$PATH
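With the profile reloaded, a quick sanity check (purely illustrative, assuming the paths above) is to ask Sqoop for its version:

source ~/.bash_profile
# should report Sqoop 1.4.4 plus build information
sqoop version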
1.3 Sqoop configuration: copy conf/sqoop-env-template.sh to conf/sqoop-env.sh, then edit it (vi conf/sqoop-env.sh) and point each variable at the actual installation:

# Set Hadoop-specific environment variables here.
# Set path to where bin/hadoop is available
#export HADOOP_COMMON_HOME=
# Set path to where hadoop-*-core.jar is available
#export HADOOP_MAPRED_HOME=
# Set the path to where bin/hbase is available
#export HBASE_HOME=
# Set the path to where bin/hive is available
#export HIVE_HOME=
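If you do need to fill these in, values matching the /usr/local layout used throughout this article would look roughly like the following (these paths are assumptions based on this article's setup, not part of the template itself; adjust to your own installation):

export HADOOP_COMMON_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=/usr/local/hadoop
export HBASE_HOME=/usr/local/hbase
export HIVE_HOME=/usr/local/hive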
PS: since my current user's login environment already defines the relevant variables, this file needs no changes in my case. My profile already contains:

# hadoop
export HADOOP_PREFIX=/usr/local/hadoop
export HADOOP_HOME=${HADOOP_PREFIX}
export PATH=$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_YARN_HOME=${HADOOP_PREFIX}
# native path
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_PREFIX}/lib/native
export HADOOP_OPTS=-Djava.library.path=$HADOOP_PREFIX/lib/native
# hadoop end

# hive
export HIVE_HOME=/usr/local/hive
export PATH=$HIVE_HOME/bin:$PATH

# hbase
export HBASE_HOME=/usr/local/hbase
export PATH=$HBASE_HOME/bin:$PATH
# added by micmiu.com
1.4 JDBC driver jar: the demos below use MySQL, so the MySQL JDBC driver jar must be copied into Sqoop's lib directory ($SQOOP_HOME/lib). A sample copy command and a quick connectivity check are sketched after the test data below.

[2] Preparing Test Data

Using MySQL as the example source: host 192.168.6.77 (hostname: master.hadoop), database: test, user: root, password: micmiu.

Create two test tables, one with a primary key (demo_blog) and one without (demo_log):

create table `demo_blog` (
  `id` int(11) not null auto_increment,
  `blog` varchar(100) not null,
  primary key (`id`)
) engine=myisam default charset=utf8;
create table `demo_log` (
  `operator` varchar(16) not null,
  `log` varchar(100) not null
) engine=myisam default charset=utf8;
Insert some test data:

insert into demo_blog (id, blog) values (1, 'micmiu.com');
insert into demo_blog (id, blog) values (2, 'ctosun.com');
insert into demo_blog (id, blog) values (3, 'baby.micmiu.com');
insert into demo_log (operator, log) values ('micmiu', 'create');
insert into demo_log (operator, log) values ('micmiu', 'update');
insert into demo_log (operator, log) values ('michael', 'edit');
insert into demo_log (operator, log) values ('michael', 'delete');
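Before running any import, make sure the MySQL connector jar is in place and Sqoop can reach the database. A minimal sketch (the connector file name is only an example; use whichever version you downloaded):

# copy the JDBC driver into Sqoop's lib directory
cp mysql-connector-java-5.1.*-bin.jar /usr/local/sqoop/lib/
# quick connectivity check: should list the databases on 192.168.6.77, including test
sqoop list-databases --connect jdbc:mysql://192.168.6.77 --username root --password micmiu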
[3] Importing Data into HDFS

3.1 Importing a table with a primary key

To import the table demo_blog (which has a primary key) into HDFS, run the following command:

sqoop import --connect jdbc:mysql://192.168.6.77/test --username root --password micmiu --table demo_blog
The run looks like this:

$ sqoop import --connect jdbc:mysql://192.168.6.77/test --username root --password micmiu --table demo_blog
warning: /usr/lib/hcatalog does not exist! hcatalog jobs will fail.
please set $hcat_home to the root of your hcatalog installation.
14/04/09 09:58:43 warn tool.basesqooptool: setting your password on the command-line is insecure. consider using -p instead.
14/04/09 09:58:43 info manager.mysqlmanager: preparing to use a mysql streaming resultset.
14/04/09 09:58:43 info tool.codegentool: beginning code generation
14/04/09 09:58:43 info manager.sqlmanager: executing sql statement: select t.* from `demo_blog` as t limit 1
14/04/09 09:58:43 info manager.sqlmanager: executing sql statement: select t.* from `demo_blog` as t limit 1
14/04/09 09:58:43 info orm.compilationmanager: hadoop_mapred_home is /usr/local/hadoop
note: /tmp/sqoop-hadoop/compile/e8fd26a5bca5b7f51cdb03bf847ce389/demo_blog.java uses or overrides a deprecated api.
note: recompile with -xlint:deprecation for details.
14/04/09 09:58:44 info orm.compilationmanager: writing jar file: /tmp/sqoop-hadoop/compile/e8fd26a5bca5b7f51cdb03bf847ce389/demo_blog.jar
14/04/09 09:58:44 warn manager.mysqlmanager: it looks like you are importing from mysql.
14/04/09 09:58:44 warn manager.mysqlmanager: this transfer can be faster! use the --direct
14/04/09 09:58:44 warn manager.mysqlmanager: option to exercise a mysql-specific fast path.
14/04/09 09:58:44 info manager.mysqlmanager: setting zero datetime behavior to converttonull (mysql)
14/04/09 09:58:44 info mapreduce.importjobbase: beginning import of demo_blog
slf4j: class path contains multiple slf4j bindings.
slf4j: found binding in [jar:file:/usr/local/hadoop-2.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/staticloggerbinder.class]
slf4j: found binding in [jar:file:/usr/local/hbase-0.98.0-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/staticloggerbinder.class]
slf4j: see http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
slf4j: actual binding is of type [org.slf4j.impl.log4jloggerfactory]
14/04/09 09:58:44 info configuration.deprecation: mapred.jar is deprecated. instead, use mapreduce.job.jar
14/04/09 09:58:45 info configuration.deprecation: mapred.map.tasks is deprecated. instead, use mapreduce.job.maps
14/04/09 09:58:45 info client.rmproxy: connecting to resourcemanager at master.hadoop/192.168.6.77:8032
14/04/09 09:58:47 info db.datadrivendbinputformat: boundingvalsquery: select min(`id`), max(`id`) from `demo_blog`
14/04/09 09:58:47 info mapreduce.jobsubmitter: number of splits:3
14/04/09 09:58:47 info configuration.deprecation: mapred.job.classpath.files is deprecated. instead, use mapreduce.job.classpath.files
14/04/09 09:58:47 info configuration.deprecation: user.name is deprecated. instead, use mapreduce.job.user.name
14/04/09 09:58:47 info configuration.deprecation: mapred.cache.files.filesizes is deprecated. instead, use mapreduce.job.cache.files.filesizes
14/04/09 09:58:47 info configuration.deprecation: mapred.cache.files is deprecated. instead, use mapreduce.job.cache.files
14/04/09 09:58:47 info configuration.deprecation: mapred.reduce.tasks is deprecated. instead, use mapreduce.job.reduces
14/04/09 09:58:47 info configuration.deprecation: mapred.output.value.class is deprecated. instead, use mapreduce.job.output.value.class
14/04/09 09:58:47 info configuration.deprecation: mapreduce.map.class is deprecated. instead, use mapreduce.job.map.class
14/04/09 09:58:47 info configuration.deprecation: mapred.job.name is deprecated. instead, use mapreduce.job.name
14/04/09 09:58:47 info configuration.deprecation: mapreduce.inputformat.class is deprecated. instead, use mapreduce.job.inputformat.class
14/04/09 09:58:47 info configuration.deprecation: mapred.output.dir is deprecated. instead, use mapreduce.output.fileoutputformat.outputdir
14/04/09 09:58:47 info configuration.deprecation: mapreduce.outputformat.class is deprecated. instead, use mapreduce.job.outputformat.class
14/04/09 09:58:47 info configuration.deprecation: mapred.cache.files.timestamps is deprecated. instead, use mapreduce.job.cache.files.timestamps
14/04/09 09:58:47 info configuration.deprecation: mapred.output.key.class is deprecated. instead, use mapreduce.job.output.key.class
14/04/09 09:58:47 info configuration.deprecation: mapred.working.dir is deprecated. instead, use mapreduce.job.working.dir
14/04/09 09:58:47 info mapreduce.jobsubmitter: submitting tokens for job: job_1396936838233_0001
14/04/09 09:58:47 info impl.yarnclientimpl: submitted application application_1396936838233_0001 to resourcemanager at master.hadoop/192.168.6.77:8032
14/04/09 09:58:47 info mapreduce.job: the url to track the job: http://master.hadoop:8088/proxy/application_1396936838233_0001/
14/04/09 09:58:47 info mapreduce.job: running job: job_1396936838233_0001
14/04/09 09:59:00 info mapreduce.job: job job_1396936838233_0001 running in uber mode : false
14/04/09 09:59:00 info mapreduce.job:  map 0% reduce 0%
14/04/09 09:59:14 info mapreduce.job:  map 33% reduce 0%
14/04/09 09:59:16 info mapreduce.job:  map 67% reduce 0%
14/04/09 09:59:19 info mapreduce.job:  map 100% reduce 0%
14/04/09 09:59:19 info mapreduce.job: job job_1396936838233_0001 completed successfully
14/04/09 09:59:19 info mapreduce.job: counters: 27
    file system counters
        file: number of bytes read=0
        file: number of bytes written=271866
        file: number of read operations=0
        file: number of large read operations=0
        file: number of write operations=0
        hdfs: number of bytes read=295
        hdfs: number of bytes written=44
        hdfs: number of read operations=12
        hdfs: number of large read operations=0
        hdfs: number of write operations=6
    job counters
        launched map tasks=3
        other local map tasks=3
        total time spent by all maps in occupied slots (ms)=43032
        total time spent by all reduces in occupied slots (ms)=0
    map-reduce framework
        map input records=3
        map output records=3
        input split bytes=295
        spilled records=0
        failed shuffles=0
        merged map outputs=0
        gc time elapsed (ms)=590
        cpu time spent (ms)=6330
        physical memory (bytes) snapshot=440934400
        virtual memory (bytes) snapshot=3882573824
        total committed heap usage (bytes)=160563200
    file input format counters
        bytes read=0
    file output format counters
        bytes written=44
14/04/09 09:59:19 info mapreduce.importjobbase: transferred 44 bytes in 34.454 seconds (1.2771 bytes/sec)
14/04/09 09:59:19 info mapreduce.importjobbase: retrieved 3 records.
Verify the data on HDFS:

$ hdfs dfs -ls /user/hadoop/demo_blog
Found 4 items
-rw-r--r--   3 hadoop supergroup          0 2014-04-09 09:59 /user/hadoop/demo_blog/_SUCCESS
-rw-r--r--   3 hadoop supergroup         13 2014-04-09 09:59 /user/hadoop/demo_blog/part-m-00000
-rw-r--r--   3 hadoop supergroup         13 2014-04-09 09:59 /user/hadoop/demo_blog/part-m-00001
-rw-r--r--   3 hadoop supergroup         18 2014-04-09 09:59 /user/hadoop/demo_blog/part-m-00002
[hadoop@master ~]$ hdfs dfs -cat /user/hadoop/demo_blog/part-m-0000*
1,micmiu.com
2,ctosun.com
3,baby.micmiu.com
PS: with the default settings the data lands under /user/<username>/<tablename>/(files) on HDFS; since my current user is hadoop, the actual path here is /user/hadoop/demo_blog/(files). To customize the location, add the --warehouse-dir parameter, for example:

sqoop import --connect jdbc:mysql://master.hadoop/test --username root --password micmiu --table demo_blog --warehouse-dir /user/micmiu/sqoop
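Note that --warehouse-dir only sets the parent directory; Sqoop still creates a sub-directory named after the table underneath it. To control the final directory exactly, the --target-dir option can be used instead, roughly like this (the directory name is arbitrary):

sqoop import --connect jdbc:mysql://master.hadoop/test --username root --password micmiu --table demo_blog --target-dir /user/micmiu/sqoop/demo_blog_import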
3.2 Importing a table without a primary key

To import the table demo_log (no primary key) into HDFS, run:

sqoop import --connect jdbc:mysql://192.168.6.77/test --username root --password micmiu --table demo_log --warehouse-dir /user/micmiu/sqoop --split-by operator
PS: importing a table without a primary key requires either --split-by <column> or -m 1 (a single-mapper variant is sketched after the verification output below). The run looks like this:

$ sqoop import --connect jdbc:mysql://192.168.6.77/test --username root --password micmiu --table demo_log --warehouse-dir /user/micmiu/sqoop --split-by operator
warning: /usr/lib/hcatalog does not exist! hcatalog jobs will fail.
please set $hcat_home to the root of your hcatalog installation.
14/04/09 15:02:06 warn tool.basesqooptool: setting your password on the command-line is insecure. consider using -p instead.
14/04/09 15:02:06 info manager.mysqlmanager: preparing to use a mysql streaming resultset.
14/04/09 15:02:06 info tool.codegentool: beginning code generation
14/04/09 15:02:06 info manager.sqlmanager: executing sql statement: select t.* from `demo_log` as t limit 1
14/04/09 15:02:06 info manager.sqlmanager: executing sql statement: select t.* from `demo_log` as t limit 1
14/04/09 15:02:06 info orm.compilationmanager: hadoop_mapred_home is /usr/local/hadoop
note: /tmp/sqoop-hadoop/compile/dddc1bcdba30515f95a2d604f22e4fe9/demo_log.java uses or overrides a deprecated api.
note: recompile with -xlint:deprecation for details.
14/04/09 15:02:07 info orm.compilationmanager: writing jar file: /tmp/sqoop-hadoop/compile/dddc1bcdba30515f95a2d604f22e4fe9/demo_log.jar
14/04/09 15:02:07 warn manager.mysqlmanager: it looks like you are importing from mysql.
14/04/09 15:02:07 warn manager.mysqlmanager: this transfer can be faster! use the --direct
14/04/09 15:02:07 warn manager.mysqlmanager: option to exercise a mysql-specific fast path.
14/04/09 15:02:07 info manager.mysqlmanager: setting zero datetime behavior to converttonull (mysql)
14/04/09 15:02:07 info mapreduce.importjobbase: beginning import of demo_log
slf4j: class path contains multiple slf4j bindings.
slf4j: found binding in [jar:file:/usr/local/hadoop-2.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/staticloggerbinder.class]
slf4j: found binding in [jar:file:/usr/local/hbase-0.98.0-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/staticloggerbinder.class]
slf4j: see http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
slf4j: actual binding is of type [org.slf4j.impl.log4jloggerfactory]
14/04/09 15:02:07 info configuration.deprecation: mapred.jar is deprecated. instead, use mapreduce.job.jar
14/04/09 15:02:08 info configuration.deprecation: mapred.map.tasks is deprecated. instead, use mapreduce.job.maps
14/04/09 15:02:08 info client.rmproxy: connecting to resourcemanager at master.hadoop/192.168.6.77:8032
14/04/09 15:02:10 info db.datadrivendbinputformat: boundingvalsquery: select min(`operator`), max(`operator`) from `demo_log`
14/04/09 15:02:10 warn db.textsplitter: generating splits for a textual index column.
14/04/09 15:02:10 warn db.textsplitter: if your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.
14/04/09 15:02:10 warn db.textsplitter: you are strongly encouraged to choose an integral split column.
14/04/09 15:02:10 info mapreduce.jobsubmitter: number of splits:4
14/04/09 15:02:10 info configuration.deprecation: mapred.job.classpath.files is deprecated. instead, use mapreduce.job.classpath.files
14/04/09 15:02:10 info configuration.deprecation: user.name is deprecated. instead, use mapreduce.job.user.name
14/04/09 15:02:10 info configuration.deprecation: mapred.cache.files.filesizes is deprecated. instead, use mapreduce.job.cache.files.filesizes
14/04/09 15:02:10 info configuration.deprecation: mapred.cache.files is deprecated. instead, use mapreduce.job.cache.files
14/04/09 15:02:10 info configuration.deprecation: mapred.reduce.tasks is deprecated. instead, use mapreduce.job.reduces
14/04/09 15:02:10 info configuration.deprecation: mapred.output.value.class is deprecated. instead, use mapreduce.job.output.value.class
14/04/09 15:02:10 info configuration.deprecation: mapreduce.map.class is deprecated. instead, use mapreduce.job.map.class
14/04/09 15:02:10 info configuration.deprecation: mapred.job.name is deprecated. instead, use mapreduce.job.name
14/04/09 15:02:10 info configuration.deprecation: mapreduce.inputformat.class is deprecated. instead, use mapreduce.job.inputformat.class
14/04/09 15:02:10 info configuration.deprecation: mapred.output.dir is deprecated. instead, use mapreduce.output.fileoutputformat.outputdir
14/04/09 15:02:10 info configuration.deprecation: mapreduce.outputformat.class is deprecated. instead, use mapreduce.job.outputformat.class
14/04/09 15:02:10 info configuration.deprecation: mapred.cache.files.timestamps is deprecated. instead, use mapreduce.job.cache.files.timestamps
14/04/09 15:02:10 info configuration.deprecation: mapred.output.key.class is deprecated. instead, use mapreduce.job.output.key.class
14/04/09 15:02:10 info configuration.deprecation: mapred.working.dir is deprecated. instead, use mapreduce.job.working.dir
14/04/09 15:02:10 info mapreduce.jobsubmitter: submitting tokens for job: job_1396936838233_0003
14/04/09 15:02:10 info impl.yarnclientimpl: submitted application application_1396936838233_0003 to resourcemanager at master.hadoop/192.168.6.77:8032
14/04/09 15:02:10 info mapreduce.job: the url to track the job: http://master.hadoop:8088/proxy/application_1396936838233_0003/
14/04/09 15:02:10 info mapreduce.job: running job: job_1396936838233_0003
14/04/09 15:02:17 info mapreduce.job: job job_1396936838233_0003 running in uber mode : false
14/04/09 15:02:17 info mapreduce.job:  map 0% reduce 0%
14/04/09 15:02:28 info mapreduce.job:  map 25% reduce 0%
14/04/09 15:02:30 info mapreduce.job:  map 50% reduce 0%
14/04/09 15:02:33 info mapreduce.job:  map 100% reduce 0%
14/04/09 15:02:33 info mapreduce.job: job job_1396936838233_0003 completed successfully
14/04/09 15:02:33 info mapreduce.job: counters: 27
    file system counters
        file: number of bytes read=0
        file: number of bytes written=362536
        file: number of read operations=0
        file: number of large read operations=0
        file: number of write operations=0
        hdfs: number of bytes read=516
        hdfs: number of bytes written=56
        hdfs: number of read operations=16
        hdfs: number of large read operations=0
        hdfs: number of write operations=8
    job counters
        launched map tasks=4
        other local map tasks=4
        total time spent by all maps in occupied slots (ms)=44481
        total time spent by all reduces in occupied slots (ms)=0
    map-reduce framework
        map input records=4
        map output records=4
        input split bytes=516
        spilled records=0
        failed shuffles=0
        merged map outputs=0
        gc time elapsed (ms)=429
        cpu time spent (ms)=6650
        physical memory (bytes) snapshot=587669504
        virtual memory (bytes) snapshot=5219356672
        total committed heap usage (bytes)=205848576
    file input format counters
        bytes read=0
    file output format counters
        bytes written=56
14/04/09 15:02:33 info mapreduce.importjobbase: transferred 56 bytes in 25.2746 seconds (2.2157 bytes/sec)
14/04/09 15:02:33 info mapreduce.importjobbase: retrieved 4 records.
Verify the imported data:

$ hdfs dfs -ls /user/micmiu/sqoop/demo_log
Found 5 items
-rw-r--r--   3 hadoop supergroup          0 2014-04-09 15:02 /user/micmiu/sqoop/demo_log/_SUCCESS
-rw-r--r--   3 hadoop supergroup         28 2014-04-09 15:02 /user/micmiu/sqoop/demo_log/part-m-00000
-rw-r--r--   3 hadoop supergroup          0 2014-04-09 15:02 /user/micmiu/sqoop/demo_log/part-m-00001
-rw-r--r--   3 hadoop supergroup          0 2014-04-09 15:02 /user/micmiu/sqoop/demo_log/part-m-00002
-rw-r--r--   3 hadoop supergroup         28 2014-04-09 15:02 /user/micmiu/sqoop/demo_log/part-m-00003
$ hdfs dfs -cat /user/micmiu/sqoop/demo_log/part-m-0000*
michael,edit
michael,delete
micmiu,create
micmiu,update
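As mentioned above, -m 1 is the other way to import a table without a primary key: it runs a single mapper, so no split column is needed and a single part-m-00000 file is produced. A rough sketch (the warehouse directory is just an example so it does not collide with the one already used):

sqoop import --connect jdbc:mysql://192.168.6.77/test --username root --password micmiu --table demo_log --warehouse-dir /user/micmiu/sqoop_m1 -m 1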
[4] Importing Data into Hive

To import the table demo_blog into Hive, add the --hive-import parameter:

sqoop import --connect jdbc:mysql://192.168.6.77/test --username root --password micmiu --table demo_blog --warehouse-dir /user/sqoop --hive-import --create-hive-table
The run looks like this:

$ sqoop import --connect jdbc:mysql://192.168.6.77/test --username root --password micmiu --table demo_blog --warehouse-dir /user/sqoop --hive-import --create-hive-table
warning: /usr/lib/hcatalog does not exist! hcatalog jobs will fail.
please set $hcat_home to the root of your hcatalog installation.
14/04/09 10:44:21 warn tool.basesqooptool: setting your password on the command-line is insecure. consider using -p instead.
14/04/09 10:44:21 info tool.basesqooptool: using hive-specific delimiters for output. you can override
14/04/09 10:44:21 info tool.basesqooptool: delimiters with --fields-terminated-by, etc.
14/04/09 10:44:21 warn tool.basesqooptool: it seems that you've specified at least one of following:
14/04/09 10:44:21 warn tool.basesqooptool: 	--hive-home
14/04/09 10:44:21 warn tool.basesqooptool: 	--hive-overwrite
14/04/09 10:44:21 warn tool.basesqooptool: 	--create-hive-table
14/04/09 10:44:21 warn tool.basesqooptool: 	--hive-table
14/04/09 10:44:21 warn tool.basesqooptool: 	--hive-partition-key
14/04/09 10:44:21 warn tool.basesqooptool: 	--hive-partition-value
14/04/09 10:44:21 warn tool.basesqooptool: 	--map-column-hive
14/04/09 10:44:21 warn tool.basesqooptool: without specifying parameter --hive-import. please note that
14/04/09 10:44:21 warn tool.basesqooptool: those arguments will not be used in this session. either
14/04/09 10:44:21 warn tool.basesqooptool: specify --hive-import to apply them correctly or remove them
14/04/09 10:44:21 warn tool.basesqooptool: from command line to remove this warning.
14/04/09 10:44:21 info tool.basesqooptool: please note that --hive-home, --hive-partition-key,
14/04/09 10:44:21 info tool.basesqooptool: 	 hive-partition-value and --map-column-hive options are
14/04/09 10:44:21 info tool.basesqooptool: 	 are also valid for hcatalog imports and exports
14/04/09 10:44:21 info manager.mysqlmanager: preparing to use a mysql streaming resultset.
14/04/09 10:44:21 info tool.codegentool: beginning code generation
14/04/09 10:44:21 info manager.sqlmanager: executing sql statement: select t.* from `demo_blog` as t limit 1
14/04/09 10:44:21 info manager.sqlmanager: executing sql statement: select t.* from `demo_blog` as t limit 1
14/04/09 10:44:21 info orm.compilationmanager: hadoop_mapred_home is /usr/local/hadoop
note: /tmp/sqoop-hadoop/compile/c071f02ecad006293202fd2c2fad0dce/demo_blog.java uses or overrides a deprecated api.
note: recompile with -xlint:deprecation for details.
14/04/09 10:44:22 info orm.compilationmanager: writing jar file: /tmp/sqoop-hadoop/compile/c071f02ecad006293202fd2c2fad0dce/demo_blog.jar
14/04/09 10:44:22 warn manager.mysqlmanager: it looks like you are importing from mysql.
14/04/09 10:44:22 warn manager.mysqlmanager: this transfer can be faster! use the --direct
14/04/09 10:44:22 warn manager.mysqlmanager: option to exercise a mysql-specific fast path.
14/04/09 10:44:22 info manager.mysqlmanager: setting zero datetime behavior to converttonull (mysql)
14/04/09 10:44:22 info mapreduce.importjobbase: beginning import of demo_blog
slf4j: class path contains multiple slf4j bindings.
slf4j: found binding in [jar:file:/usr/local/hadoop-2.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/staticloggerbinder.class]
slf4j: found binding in [jar:file:/usr/local/hbase-0.98.0-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/staticloggerbinder.class]
slf4j: see http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
slf4j: actual binding is of type [org.slf4j.impl.log4jloggerfactory]
14/04/09 10:44:22 info configuration.deprecation: mapred.jar is deprecated. instead, use mapreduce.job.jar
14/04/09 10:44:23 info configuration.deprecation: mapred.map.tasks is deprecated. instead, use mapreduce.job.maps
14/04/09 10:44:23 info client.rmproxy: connecting to resourcemanager at master.hadoop/192.168.6.77:8032
14/04/09 10:44:25 info db.datadrivendbinputformat: boundingvalsquery: select min(`id`), max(`id`) from `demo_blog`
14/04/09 10:44:25 info mapreduce.jobsubmitter: number of splits:3
14/04/09 10:44:25 info configuration.deprecation: mapred.job.classpath.files is deprecated. instead, use mapreduce.job.classpath.files
14/04/09 10:44:25 info configuration.deprecation: user.name is deprecated. instead, use mapreduce.job.user.name
14/04/09 10:44:25 info configuration.deprecation: mapred.cache.files.filesizes is deprecated. instead, use mapreduce.job.cache.files.filesizes
14/04/09 10:44:25 info configuration.deprecation: mapred.cache.files is deprecated. instead, use mapreduce.job.cache.files
14/04/09 10:44:25 info configuration.deprecation: mapred.reduce.tasks is deprecated. instead, use mapreduce.job.reduces
14/04/09 10:44:25 info configuration.deprecation: mapred.output.value.class is deprecated. instead, use mapreduce.job.output.value.class
14/04/09 10:44:25 info configuration.deprecation: mapreduce.map.class is deprecated. instead, use mapreduce.job.map.class
14/04/09 10:44:25 info configuration.deprecation: mapred.job.name is deprecated. instead, use mapreduce.job.name
14/04/09 10:44:25 info configuration.deprecation: mapreduce.inputformat.class is deprecated. instead, use mapreduce.job.inputformat.class
14/04/09 10:44:25 info configuration.deprecation: mapred.output.dir is deprecated. instead, use mapreduce.output.fileoutputformat.outputdir
14/04/09 10:44:25 info configuration.deprecation: mapreduce.outputformat.class is deprecated. instead, use mapreduce.job.outputformat.class
14/04/09 10:44:25 info configuration.deprecation: mapred.cache.files.timestamps is deprecated. instead, use mapreduce.job.cache.files.timestamps
14/04/09 10:44:25 info configuration.deprecation: mapred.output.key.class is deprecated. instead, use mapreduce.job.output.key.class
14/04/09 10:44:25 info configuration.deprecation: mapred.working.dir is deprecated. instead, use mapreduce.job.working.dir
14/04/09 10:44:25 info mapreduce.jobsubmitter: submitting tokens for job: job_1396936838233_0002
14/04/09 10:44:25 info impl.yarnclientimpl: submitted application application_1396936838233_0002 to resourcemanager at master.hadoop/192.168.6.77:8032
14/04/09 10:44:25 info mapreduce.job: the url to track the job: http://master.hadoop:8088/proxy/application_1396936838233_0002/
14/04/09 10:44:25 info mapreduce.job: running job: job_1396936838233_0002
14/04/09 10:44:33 info mapreduce.job: job job_1396936838233_0002 running in uber mode : false
14/04/09 10:44:33 info mapreduce.job:  map 0% reduce 0%
14/04/09 10:44:46 info mapreduce.job:  map 67% reduce 0%
14/04/09 10:44:48 info mapreduce.job:  map 100% reduce 0%
14/04/09 10:44:49 info mapreduce.job: job job_1396936838233_0002 completed successfully
14/04/09 10:44:49 info mapreduce.job: counters: 27
    file system counters
        file: number of bytes read=0
        file: number of bytes written=271860
        file: number of read operations=0
        file: number of large read operations=0
        file: number of write operations=0
        hdfs: number of bytes read=295
        hdfs: number of bytes written=44
        hdfs: number of read operations=12
        hdfs: number of large read operations=0
        hdfs: number of write operations=6
    job counters
        launched map tasks=3
        other local map tasks=3
        total time spent by all maps in occupied slots (ms)=34047
        total time spent by all reduces in occupied slots (ms)=0
    map-reduce framework
        map input records=3
        map output records=3
        input split bytes=295
        spilled records=0
        failed shuffles=0
        merged map outputs=0
        gc time elapsed (ms)=505
        cpu time spent (ms)=5350
        physical memory (bytes) snapshot=427388928
        virtual memory (bytes) snapshot=3881439232
        total committed heap usage (bytes)=171638784
    file input format counters
        bytes read=0
    file output format counters
        bytes written=44
14/04/09 10:44:49 info mapreduce.importjobbase: transferred 44 bytes in 26.0401 seconds (1.6897 bytes/sec)
14/04/09 10:44:49 info mapreduce.importjobbase: retrieved 3 records.
14/04/09 10:44:49 info manager.sqlmanager: executing sql statement: select t.* from `demo_blog` as t limit 1
14/04/09 10:44:49 info hive.hiveimport: loading uploaded data into hive
14/04/09 10:44:52 info hive.hiveimport: 14/04/09 10:44:52 info configuration.deprecation: mapred.reduce.tasks is deprecated. instead, use mapreduce.job.reduces
14/04/09 10:44:52 info hive.hiveimport: 14/04/09 10:44:52 info configuration.deprecation: mapred.min.split.size is deprecated. instead, use mapreduce.input.fileinputformat.split.minsize
14/04/09 10:44:52 info hive.hiveimport: 14/04/09 10:44:52 info configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. instead, use mapreduce.reduce.speculative
14/04/09 10:44:52 info hive.hiveimport: 14/04/09 10:44:52 info configuration.deprecation: mapred.min.split.size.per.node is deprecated. instead, use mapreduce.input.fileinputformat.split.minsize.per.node
14/04/09 10:44:52 info hive.hiveimport: 14/04/09 10:44:52 info configuration.deprecation: mapred.input.dir.recursive is deprecated. instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/04/09 10:44:52 info hive.hiveimport: 14/04/09 10:44:52 info configuration.deprecation: mapred.min.split.size.per.rack is deprecated. instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
14/04/09 10:44:52 info hive.hiveimport: 14/04/09 10:44:52 info configuration.deprecation: mapred.max.split.size is deprecated. instead, use mapreduce.input.fileinputformat.split.maxsize
14/04/09 10:44:52 info hive.hiveimport: 14/04/09 10:44:52 info configuration.deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. instead, use mapreduce.job.committer.setup.cleanup.needed
14/04/09 10:44:53 info hive.hiveimport: 14/04/09 10:44:53 warn conf.hiveconf: deprecated: hive.metastore.ds.retry.* no longer has any effect.  use hive.hmshandler.retry.* instead
14/04/09 10:44:53 info hive.hiveimport:
14/04/09 10:44:53 info hive.hiveimport: logging initialized using configuration in file:/usr/local/hive-0.13.0-bin/conf/hive-log4j.properties
14/04/09 10:44:53 info hive.hiveimport: slf4j: class path contains multiple slf4j bindings.
14/04/09 10:44:53 info hive.hiveimport: slf4j: found binding in [jar:file:/usr/local/hadoop-2.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/staticloggerbinder.class]
14/04/09 10:44:53 info hive.hiveimport: slf4j: found binding in [jar:file:/usr/local/hbase-0.98.0-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/staticloggerbinder.class]
14/04/09 10:44:53 info hive.hiveimport: slf4j: see http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
14/04/09 10:44:53 info hive.hiveimport: slf4j: actual binding is of type [org.slf4j.impl.log4jloggerfactory]
14/04/09 10:44:57 info hive.hiveimport: ok
14/04/09 10:44:57 info hive.hiveimport: time taken: 0.773 seconds
14/04/09 10:44:57 info hive.hiveimport: loading data to table default.demo_blog
14/04/09 10:44:57 info hive.hiveimport: table default.demo_blog stats: [numfiles=4, numrows=0, totalsize=44, rawdatasize=0]
14/04/09 10:44:57 info hive.hiveimport: ok
14/04/09 10:44:57 info hive.hiveimport: time taken: 0.25 seconds
14/04/09 10:44:57 info hive.hiveimport: hive import complete.
14/04/09 10:44:57 info hive.hiveimport: export directory is empty, removing it
Verify the imported data in the Hive CLI:

hive> show tables;
OK
demo_blog
hbase_table_1
hbase_table_2
hbase_table_3
micmiu_blog
micmiu_hx_master
pokes
xflow_dstip
Time taken: 0.073 seconds, Fetched: 8 row(s)
hive> select * from demo_blog;
OK
1	micmiu.com
2	ctosun.com
3	baby.micmiu.com
Time taken: 0.506 seconds, Fetched: 3 row(s)
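By default the Hive table inherits the source table name (default.demo_blog above). To load into a differently named Hive table, or to reload an existing one, Sqoop's --hive-table and --hive-overwrite options can be added; a rough sketch (the name demo_blog_hive is arbitrary):

sqoop import --connect jdbc:mysql://192.168.6.77/test --username root --password micmiu --table demo_blog --hive-import --hive-table demo_blog_hive --hive-overwrite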
[5] Importing Data into HBase

To import the table demo_blog into HBase, with demo_sqoop2hbase as the HBase table name:

sqoop import --connect jdbc:mysql://192.168.6.77/test --username root --password micmiu --table demo_blog --hbase-table demo_sqoop2hbase --hbase-create-table --hbase-row-key id --column-family url
The run looks like this:

$ sqoop import --connect jdbc:mysql://192.168.6.77/test --username root --password micmiu --table demo_blog --hbase-table demo_sqoop2hbase --hbase-create-table --hbase-row-key id --column-family url
warning: /usr/lib/hcatalog does not exist! hcatalog jobs will fail.
please set $hcat_home to the root of your hcatalog installation.
14/04/09 16:23:38 warn tool.basesqooptool: setting your password on the command-line is insecure. consider using -p instead.
14/04/09 16:23:38 info manager.mysqlmanager: preparing to use a mysql streaming resultset.
14/04/09 16:23:38 info tool.codegentool: beginning code generation
14/04/09 16:23:39 info manager.sqlmanager: executing sql statement: select t.* from `demo_blog` as t limit 1
14/04/09 16:23:39 info manager.sqlmanager: executing sql statement: select t.* from `demo_blog` as t limit 1
14/04/09 16:23:39 info orm.compilationmanager: hadoop_mapred_home is /usr/local/hadoop
note: /tmp/sqoop-hadoop/compile/85408c854ee8fba75bbb2458e5e25093/demo_blog.java uses or overrides a deprecated api.
note: recompile with -xlint:deprecation for details.
14/04/09 16:23:40 info orm.compilationmanager: writing jar file: /tmp/sqoop-hadoop/compile/85408c854ee8fba75bbb2458e5e25093/demo_blog.jar
14/04/09 16:23:40 warn manager.mysqlmanager: it looks like you are importing from mysql.
14/04/09 16:23:40 warn manager.mysqlmanager: this transfer can be faster! use the --direct
14/04/09 16:23:40 warn manager.mysqlmanager: option to exercise a mysql-specific fast path.
14/04/09 16:23:40 info manager.mysqlmanager: setting zero datetime behavior to converttonull (mysql)
14/04/09 16:23:40 info mapreduce.importjobbase: beginning import of demo_blog
slf4j: class path contains multiple slf4j bindings.
slf4j: found binding in [jar:file:/usr/local/hadoop-2.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/staticloggerbinder.class]
slf4j: found binding in [jar:file:/usr/local/hbase-0.98.0-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/staticloggerbinder.class]
slf4j: see http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
slf4j: actual binding is of type [org.slf4j.impl.log4jloggerfactory]
14/04/09 16:23:40 info configuration.deprecation: mapred.jar is deprecated. instead, use mapreduce.job.jar
14/04/09 16:23:40 info configuration.deprecation: mapred.map.tasks is deprecated. instead, use mapreduce.job.maps
14/04/09 16:23:41 info zookeeper.zookeeper: client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 gmt
14/04/09 16:23:41 info zookeeper.zookeeper: client environment:host.name=master.hadoop
14/04/09 16:23:41 info zookeeper.zookeeper: client environment:java.version=1.6.0_20
14/04/09 16:23:41 info zookeeper.zookeeper: client environment:java.vendor=sun microsystems inc.
14/04/09 16:23:41 info zookeeper.zookeeper: client environment:java.home=/java/jdk1.6.0_20/jre
14/04/09 16:23:41 info zookeeper.zookeeper: client environment:java.class.path=/usr/local/hadoop/etc/hadoop: .......
14/04/09 16:23:41 info zookeeper.zookeeper: client environment:java.library.path=/usr/local/hadoop/lib/native
14/04/09 16:23:41 info zookeeper.zookeeper: client environment:java.io.tmpdir=/tmp
14/04/09 16:23:41 info zookeeper.zookeeper: client environment:java.compiler=
14/04/09 16:23:41 info zookeeper.zookeeper: client environment:os.name=linux
14/04/09 16:23:41 info zookeeper.zookeeper: client environment:os.arch=amd64
14/04/09 16:23:41 info zookeeper.zookeeper: client environment:os.version=2.6.32-71.el6.x86_64
14/04/09 16:23:41 info zookeeper.zookeeper: client environment:user.name=hadoop
14/04/09 16:23:41 info zookeeper.zookeeper: client environment:user.home=/home/hadoop
14/04/09 16:23:41 info zookeeper.zookeeper: client environment:user.dir=/home/hadoop
14/04/09 16:23:41 info zookeeper.zookeeper: initiating client connection, connectstring=slave6.hadoop:2181,slave5.hadoop:2181,slave7.hadoop:2181 sessiontimeout=90000 watcher=hconnection-0x57c8b24d, quorum=slave6.hadoop:2181,slave5.hadoop:2181,slave7.hadoop:2181, baseznode=/hbase
14/04/09 16:23:41 info zookeeper.clientcnxn: opening socket connection to server slave5.hadoop/192.168.8.205:2181. will not attempt to authenticate using sasl (unable to locate a login configuration)
14/04/09 16:23:41 info zookeeper.recoverablezookeeper: process identifier=hconnection-0x57c8b24d connecting to zookeeper ensemble=slave6.hadoop:2181,slave5.hadoop:2181,slave7.hadoop:2181
14/04/09 16:23:41 info zookeeper.clientcnxn: socket connection established to slave5.hadoop/192.168.8.205:2181, initiating session
14/04/09 16:23:41 info zookeeper.clientcnxn: session establishment complete on server slave5.hadoop/192.168.8.205:2181, sessionid = 0x453fecb6c50009, negotiated timeout = 90000
14/04/09 16:23:41 info configuration.deprecation: hadoop.native.lib is deprecated. instead, use io.native.lib.available
14/04/09 16:23:41 info zookeeper.zookeeper: initiating client connection, connectstring=slave6.hadoop:2181,slave5.hadoop:2181,slave7.hadoop:2181 sessiontimeout=90000 watcher=catalogtracker-on-hconnection-0x57c8b24d, quorum=slave6.hadoop:2181,slave5.hadoop:2181,slave7.hadoop:2181, baseznode=/hbase
14/04/09 16:23:41 info zookeeper.clientcnxn: opening socket connection to server slave7.hadoop/192.168.8.207:2181. will not attempt to authenticate using sasl (unable to locate a login configuration)
14/04/09 16:23:41 info zookeeper.recoverablezookeeper: process identifier=catalogtracker-on-hconnection-0x57c8b24d connecting to zookeeper ensemble=slave6.hadoop:2181,slave5.hadoop:2181,slave7.hadoop:2181
14/04/09 16:23:41 info zookeeper.clientcnxn: socket connection established to slave7.hadoop/192.168.8.207:2181, initiating session
14/04/09 16:23:41 info zookeeper.clientcnxn: session establishment complete on server slave7.hadoop/192.168.8.207:2181, sessionid = 0x2453fecb6f50008, negotiated timeout = 90000
14/04/09 16:23:41 info zookeeper.zookeeper: session: 0x2453fecb6f50008 closed
14/04/09 16:23:41 info zookeeper.clientcnxn: eventthread shut down
14/04/09 16:23:41 info mapreduce.hbaseimportjob: creating missing hbase table demo_sqoop2hbase
14/04/09 16:23:42 info zookeeper.zookeeper: initiating client connection, connectstring=slave6.hadoop:2181,slave5.hadoop:2181,slave7.hadoop:2181 sessiontimeout=90000 watcher=catalogtracker-on-hconnection-0x57c8b24d, quorum=slave6.hadoop:2181,slave5.hadoop:2181,slave7.hadoop:2181, baseznode=/hbase
14/04/09 16:23:42 info zookeeper.recoverablezookeeper: process identifier=catalogtracker-on-hconnection-0x57c8b24d connecting to zookeeper ensemble=slave6.hadoop:2181,slave5.hadoop:2181,slave7.hadoop:2181
14/04/09 16:23:42 info zookeeper.clientcnxn: opening socket connection to server slave7.hadoop/192.168.8.207:2181. will not attempt to authenticate using sasl (unable to locate a login configuration)
14/04/09 16:23:42 info zookeeper.clientcnxn: socket connection established to slave7.hadoop/192.168.8.207:2181, initiating session
14/04/09 16:23:42 info zookeeper.clientcnxn: session establishment complete on server slave7.hadoop/192.168.8.207:2181, sessionid = 0x2453fecb6f50009, negotiated timeout = 90000
14/04/09 16:23:42 info zookeeper.zookeeper: session: 0x2453fecb6f50009 closed
14/04/09 16:23:42 info zookeeper.clientcnxn: eventthread shut down
14/04/09 16:23:42 info client.rmproxy: connecting to resourcemanager at master.hadoop/192.168.6.77:8032
14/04/09 16:23:47 info db.datadrivendbinputformat: boundingvalsquery: select min(`id`), max(`id`) from `demo_blog`
14/04/09 16:23:47 info mapreduce.jobsubmitter: number of splits:3
14/04/09 16:23:47 info configuration.deprecation: mapred.job.classpath.files is deprecated. instead, use mapreduce.job.classpath.files
14/04/09 16:23:47 info configuration.deprecation: user.name is deprecated. instead, use mapreduce.job.user.name
14/04/09 16:23:47 info configuration.deprecation: mapred.cache.files.filesizes is deprecated. instead, use mapreduce.job.cache.files.filesizes
14/04/09 16:23:47 info configuration.deprecation: mapred.cache.files is deprecated. instead, use mapreduce.job.cache.files
14/04/09 16:23:47 info configuration.deprecation: mapred.reduce.tasks is deprecated. instead, use mapreduce.job.reduces
14/04/09 16:23:47 info configuration.deprecation: mapred.output.value.class is deprecated. instead, use mapreduce.job.output.value.class
14/04/09 16:23:47 info configuration.deprecation: mapreduce.map.class is deprecated. instead, use mapreduce.job.map.class
14/04/09 16:23:47 info configuration.deprecation: mapred.job.name is deprecated. instead, use mapreduce.job.name
14/04/09 16:23:47 info configuration.deprecation: mapreduce.inputformat.class is deprecated. instead, use mapreduce.job.inputformat.class
14/04/09 16:23:47 info configuration.deprecation: mapred.output.dir is deprecated. instead, use mapreduce.output.fileoutputformat.outputdir
14/04/09 16:23:47 info configuration.deprecation: mapreduce.outputformat.class is deprecated. instead, use mapreduce.job.outputformat.class
14/04/09 16:23:47 info configuration.deprecation: mapred.cache.files.timestamps is deprecated. instead, use mapreduce.job.cache.files.timestamps
14/04/09 16:23:47 info configuration.deprecation: mapred.output.key.class is deprecated. instead, use mapreduce.job.output.key.class
14/04/09 16:23:47 info configuration.deprecation: mapred.working.dir is deprecated. instead, use mapreduce.job.working.dir
14/04/09 16:23:47 info mapreduce.jobsubmitter: submitting tokens for job: job_1396936838233_0005
14/04/09 16:23:47 info impl.yarnclientimpl: submitted application application_1396936838233_0005 to resourcemanager at master.hadoop/192.168.6.77:8032
14/04/09 16:23:47 info mapreduce.job: the url to track the job: http://master.hadoop:8088/proxy/application_1396936838233_0005/
14/04/09 16:23:47 info mapreduce.job: running job: job_1396936838233_0005
14/04/09 16:23:55 info mapreduce.job: job job_1396936838233_0005 running in uber mode : false
14/04/09 16:23:55 info mapreduce.job:  map 0% reduce 0%
14/04/09 16:24:05 info mapreduce.job:  map 33% reduce 0%
14/04/09 16:24:12 info mapreduce.job:  map 100% reduce 0%
14/04/09 16:24:12 info mapreduce.job: job job_1396936838233_0005 completed successfully
14/04/09 16:24:12 info mapreduce.job: counters: 27
    file system counters
        file: number of bytes read=0
        file: number of bytes written=354636
        file: number of read operations=0
        file: number of large read operations=0
        file: number of write operations=0
        hdfs: number of bytes read=295
        hdfs: number of bytes written=0
        hdfs: number of read operations=3
        hdfs: number of large read operations=0
        hdfs: number of write operations=0
    job counters
        launched map tasks=3
        other local map tasks=3
        total time spent by all maps in occupied slots (ms)=35297
        total time spent by all reduces in occupied slots (ms)=0
    map-reduce framework
        map input records=3
        map output records=3
        input split bytes=295
        spilled records=0
        failed shuffles=0
        merged map outputs=0
        gc time elapsed (ms)=381
        cpu time spent (ms)=11050
        physical memory (bytes) snapshot=543367168
        virtual memory (bytes) snapshot=3918925824
        total committed heap usage (bytes)=156958720
    file input format counters
        bytes read=0
    file output format counters
        bytes written=0
14/04/09 16:24:12 info mapreduce.importjobbase: transferred 0 bytes in 29.7126 seconds (0 bytes/sec)
14/04/09 16:24:12 info mapreduce.importjobbase: retrieved 3 records.
Verify the imported data in the HBase shell:

hbase(main):009:0> list
TABLE
demo_sqoop2hbase
table_02
table_03
test_table
xyz
5 row(s) in 0.0310 seconds
=> ["demo_sqoop2hbase", "table_02", "table_03", "test_table", "xyz"]
hbase(main):010:0> scan 'demo_sqoop2hbase'
ROW                          COLUMN+CELL
 1                           column=url:blog, timestamp=1397031850700, value=micmiu.com
 2                           column=url:blog, timestamp=1397031844106, value=ctosun.com
 3                           column=url:blog, timestamp=1397031849888, value=baby.micmiu.com
3 row(s) in 0.0730 seconds
hbase(main):011:0> describe 'demo_sqoop2hbase'
DESCRIPTION                                                            ENABLED
 'demo_sqoop2hbase', {NAME => 'url', DATA_BLOCK_ENCODING => 'NONE', BL true
 OOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRE
 SSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELET
 ED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOC
 KCACHE => 'true'}
1 row(s) in 0.0580 seconds
hbase(main):012:0>
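Besides scan, a single row can be spot-checked with the HBase shell's get command, for example:

hbase(main):013:0> get 'demo_sqoop2hbase', '1'

which should return the url:blog cell for row 1 with value micmiu.com.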
The import is confirmed. That concludes the demonstration of the three basic scenarios: migrating MySQL data into HDFS, Hive, and HBase.

Reference: http://sqoop.apache.org/docs/1.4.4/SqoopUserGuide.html

————— EOF @Michael Sun —————