感谢@吴飚飚哥的细心指导,向飚哥精益求精的精神学习sqoop:apache基金会下一个开源产品,hadoop家族的一个产品,关系型数据库与hdfs文件系统之间进行数据交换,
感谢@吴飚 飚哥的细心指导,向飚哥精益求精的精神学习
sqoop:apache基金会下一个开源产品,hadoop家族的一个产品,关系型数据库与hdfs文件系统之间进行数据交换,数据迁移的一个工具。
一、环境描述
mysql版本:mysql-installer-community-5.5.27.1 32位
mysql for windows 7 32位:我把mysql数据库安装在了自己win7的笔记本上,这样的好处就是减少了虚拟机 master slave的开销和使用空间还可以多利用一台机器的资源,如果你的虚拟机资源很紧张的话也可以这样部署。
linux iso:centos-6.0-i386-bin-dvd.iso 32位
jdk version:1.6.0_25-ea for linux
hadoop software version:hadoop-0.20.205.0.tar.gz for linux
mysql version:mysql-installer-community-5.5.27.1 32位 for windows
sqoop version:sqoop-1.2.0-cdh3b4.tar.gz for linux
主机名
ip
节点名
备注
h1
192.168.2.102
master
namenode和jobtracker
h2
192.168.2.103
slave1
datanode和tasktracker
h4
192.168.2.105
slave2
datanode和tasktracker
mysql部署在宿主环境中: 参考飚哥风靡版
二、下载软件安装包
帖子名:hadoop第十周cloudera版sqoop包和hadoop-core-jar包下载
帖子网址:?mod=viewthread&tid=36867&fromuid=303
欢迎大家下载使用
sqoop-1.2.0-cdh3b4.tar.gz 和 hadoop-core-jar包 mysql-connector-java-5.1.22-bin.jar 包 是我们这次用到的
三、把下载好的文件加载到linux并解压
下载
[grid@h1 ~]$ pwd
/home/grid/
-rwxrw-rw-. 1 grid hadoop 67339212 4月 12 2011 hadoop-0.20.2-cdh3b4.tar.gz
-rwxrw-rw-. 1 grid hadoop 832960 11月 19 16:06 mysql-connector-java-5.1.22-bin.jar
-rwxrw-rw-. 1 grid hadoop 1543137 4月 12 2011 sqoop-1.2.0-cdh3b4.tar.gz
解压包
[grid@h1 ~]$ tar -zxvf hadoop-0.20.2-cdh3b4.tar.gz
[grid@h1 ~]$ tar -zxvf sqoop-1.2.0-cdh3b4.tar.gz
[grid@h1 ~]$ pwd
/home/grid/
drwxr-xr-x. 15 grid hadoop 4096 2月 22 2011 hadoop-0.20.2-cdh3b4 解压后目录
-rwxrw-rw-. 1 grid hadoop 67339212 4月 12 2011 hadoop-0.20.2-cdh3b4.tar.gz
-rwxrw-rw-. 1 grid hadoop 832960 11月 19 16:06 mysql-connector-java-5.1.22-bin.jar
drwxr-xr-x. 11 grid hadoop 4096 2月 22 2011 sqoop-1.2.0-cdh3b4 解压后目录
-rwxrw-rw-. 1 grid hadoop 1543137 4月 12 2011 sqoop-1.2.0-cdh3b4.tar.gz
四、拷贝hadoop-core-0.20.2-cdh3b4.jar和mysql-connector-java-5.1.22-bin.jar到/home/grid/sqoop-1.2.0-cdh3b4/lib/目录下
[grid@h1 ~]$ cd hadoop-0.20.2-cdh3b4
[grid@h1 hadoop-0.20.2-cdh3b4]$ cp hadoop-core-0.20.2-cdh3b4.jar /home/grid/sqoop-1.2.0-cdh3b4/lib/
[grid@h1 grid]$ cp mysql-connector-java-5.1.22-bin.jar /home/grid/sqoop-1.2.0-cdh3b4/lib/
五、配置sqoop-1.2.0-cdh3b4/bin/configure-sqoop文件
[grid@h1 conf]$ cd ../bin
[grid@h1 bin]$ pwd
/home/grid/sqoop-1.2.0-cdh3b4/bin
[grid@h1 bin]$ vim configure-sqoop
注释掉hbase和zookeeper检查(除非你准备使用habase等hadoop上的组件)
# check: if we can't find our dependencies, give up here.
if [ ! -d ${hadoop_home} ]; then
echo error: $hadoop_home does not exist!
echo 'please set $hadoop_home to the root of your hadoop installation.'
exit 1
fi 只有红色需要修改
#if [ ! -d ${hbase_home} ]; then
#echo error: $hbase_home does not exist!
#echo 'please set $hbase_home to the root of your hbase installation.'
#exit 1
#fi
#if [ ! -d ${zookeeper_home} ]; then
# echo error: $zookeeper_home does not exist!
# echo 'please set $zookeeper_home to the root of your zookeeper installation.'
# exit 1
#fi
六、配置所需环境变量
在哪里执行sqoop,就在哪台机器上设置一下
[grid@h1 grid]$ vim .bashrc 添加
export java_home=/usr
export jre_home=/usr/java/jdk1.6.0_25/jre
export path=/usr/java/jdk1.6.0_25/bin:/home/grid/hadoop-0.20.2/bin:/home/grid/pig-0.9.2/bin:$path
export classpath=./:/usr/java/jdk1.6.0_25/lib:/usr/java/jdk1.6.0_25/jre/lib
export pig_classpath=/home/grid/hadoop-0.20.2/conf
export hive_home=/home/grid/hive-0.8.1
export hive_conf_dir=$hive_home/conf
export hadoop_home=/home/grid/hadoop-0.20.2
作用:让sqoop程序从环境变量里找到hadoop的位置,从而找到hadoop配置文件,知道集群的部署情况
[grid@h1 grid]$ echo $hadoop_home 检查一下没有问题
/home/grid/hadoop-0.20.2
七、配置启动hadoop集群
h1机器 master
[grid@h1 bin]$ pwd
/home/grid/hadoop-0.20.2/bin
[grid@h1 bin]$ ./start-all.sh
starting namenode, logging to /home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-namenode-h1.out
h2: starting datanode, logging to /home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-datanode-h2.out
h4: starting datanode, logging to /home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-datanode-h4.out
h1: starting secondarynamenode, logging to /home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-secondarynamenode-h1.out
starting jobtracker, logging to /home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-jobtracker-h1.out
h2: starting tasktracker, logging to /home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-tasktracker-h2.out
h4: starting tasktracker, logging to /home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-tasktracker-h4.out
[grid@h1 bin]$ jps
17191 jobtracker
16955 namenode
17442 jps
17121 secondarynamenode
h2机器 slave
[grid@h2 ~]$ jps
32523 jps
17188 tasktracker
13727 hquorumpeer
17077 datanode
h4机器 slave
[grid@h4 ~]$ jps
27829 tasktracker
26875 jps
17119 datanode
31083 jps
11557 hquorumpeer
[grid@h1 bin]$ ./hadoop dfsadmin –report 检查hadoop集群状态
configured capacity: 19865944064 (18.5 gb)
present capacity: 8741523456 (8.14 gb)
dfs remaining: 8726482944 (8.13 gb)
dfs used: 15040512 (14.34 mb)
dfs used%: 0.17%
under replicated blocks: 4
blocks with corrupt replicas: 0
missing blocks: 0
-------------------------------------------------
datanodes available: 2 (2 total, 0 dead) --2个节点存活无shutdown
name: 192.168.2.103:50010 -- slaves h2
decommission status : normal --状态正常
configured capacity: 9932972032 (9.25 gb)
dfs used: 7520256 (7.17 mb)
non dfs used: 5447561216 (5.07 gb)
dfs remaining: 4477890560(4.17 gb)
dfs used%: 0.08%
dfs remaining%: 45.08%
last contact: fri dec 14 18:10:11 cst 2012
name: 192.168.2.105:50010 -- slaves h4
decommission status : normal --状态正常
configured capacity: 9932972032 (9.25 gb)
dfs used: 7520256 (7.17 mb)
non dfs used: 5676859392 (5.29 gb)
dfs remaining: 4248592384(3.96 gb)
dfs used%: 0.08%
dfs remaining%: 42.77%
last contact: fri dec 14 18:10:11 cst 2012
集群正常启动了
八、启动mysql,创建leo用户进行sqoop连接
1. 必须启动服务才能操作数据库
数据库端口:3306
mysqll服务名:mysql55
mysql状态:已经启动
创建leo用户
grant all privileges on *.* to 'leo'@'%' identified by 'leo' with grant option;
select * from mysql.user;
flush privileges;
九、mysql 中建立sqoop库,网站空间,test表,添加数据
[grid@h1 bin]$ ping 192.168.2.110 检查linux for windows 的连接性
ping 192.168.2.110 (192.168.2.110) 56(84) bytes of data.
64 bytes from 192.168.2.110: icmp_seq=1 ttl=64 time=14.5 ms
64 bytes from 192.168.2.110: icmp_seq=2 ttl=64 time=3.43 ms
64 bytes from 192.168.2.110: icmp_seq=3 ttl=64 time=9.68 ms
64 bytes from 192.168.2.110: icmp_seq=4 ttl=64 time=0.549 ms
^c
--- 192.168.2.110 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3630ms
rtt min/avg/max/mdev = 0.549/7.063/14.577/5.453 ms
[grid@h1 grid]$ mysql -h192.168.2.110 -uleo –pleo 使用leo用户登录数据库
命令列表
show databases; 显示当前有哪些数据库
create database sqoop; 创建sqoop数据库