
MySQL-Sqoop-HDFS Data Exchange Experiment

Thanks to @吴飚 (Biao) for his careful guidance; his pursuit of excellence is something to learn from.
Sqoop is an open-source product under the Apache foundation and a member of the Hadoop family: a tool for exchanging and migrating data between relational databases and the HDFS file system.
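To give a feel for what the tool does before we set it up, the general shape of a Sqoop 1.x import command (MySQL table into HDFS) looks roughly like the sketch below. The host, credentials, table, and target directory are placeholder values for this experiment, not commands we have run yet.

```shell
# Sketch only: the general form of a Sqoop 1.x import command.
# Host, credentials, table name, and target directory are placeholders.
SQOOP_IMPORT="sqoop import \
  --connect jdbc:mysql://192.168.2.110:3306/sqoop \
  --username leo --password leo \
  --table test \
  --target-dir /user/grid/test_import"
echo "$SQOOP_IMPORT"
```

The --connect string is a standard JDBC URL, which is why the MySQL JDBC driver jar must be on Sqoop's classpath (step 4 below copies it into sqoop's lib directory).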
1. Environment
MySQL version: mysql-installer-community-5.5.27.1, 32-bit, for Windows 7. I installed the MySQL database on my own Windows 7 laptop; this avoids the overhead and disk usage of the master/slave virtual machines and puts one more machine's resources to work. If your VM resources are tight, you can deploy it this way too.
Linux ISO: centos-6.0-i386-bin-dvd.iso, 32-bit
JDK version: 1.6.0_25-ea, for Linux
Hadoop version: hadoop-0.20.205.0.tar.gz, for Linux
Sqoop version: sqoop-1.2.0-cdh3b4.tar.gz, for Linux
Hostname    IP              Node      Role
h1          192.168.2.102   master    NameNode and JobTracker
h2          192.168.2.103   slave1    DataNode and TaskTracker
h4          192.168.2.105   slave2    DataNode and TaskTracker
MySQL is deployed on the host machine; see Biao's popular guide for details.
2. Download the installation packages
Post title: Hadoop week 10: Cloudera Sqoop package and hadoop-core jar download
Post URL: ?mod=viewthread&tid=36867&fromuid=303
You are welcome to download and use them.
This experiment uses sqoop-1.2.0-cdh3b4.tar.gz, the hadoop-core jar, and mysql-connector-java-5.1.22-bin.jar.
3. Upload the downloaded files to Linux and extract them
Downloaded files:
[grid@h1 ~]$ pwd
/home/grid/
-rwxrw-rw-.  1 grid hadoop 67339212  Apr 12  2011 hadoop-0.20.2-cdh3b4.tar.gz
-rwxrw-rw-.  1 grid hadoop   832960  Nov 19 16:06 mysql-connector-java-5.1.22-bin.jar
-rwxrw-rw-.  1 grid hadoop  1543137  Apr 12  2011 sqoop-1.2.0-cdh3b4.tar.gz
Extract the packages:
[grid@h1 ~]$ tar -zxvf hadoop-0.20.2-cdh3b4.tar.gz
[grid@h1 ~]$ tar -zxvf sqoop-1.2.0-cdh3b4.tar.gz
[grid@h1 ~]$ pwd
/home/grid/
drwxr-xr-x. 15 grid hadoop     4096  Feb 22  2011 hadoop-0.20.2-cdh3b4        extracted directory
-rwxrw-rw-.  1 grid hadoop 67339212  Apr 12  2011 hadoop-0.20.2-cdh3b4.tar.gz
-rwxrw-rw-.  1 grid hadoop   832960  Nov 19 16:06 mysql-connector-java-5.1.22-bin.jar
drwxr-xr-x. 11 grid hadoop     4096  Feb 22  2011 sqoop-1.2.0-cdh3b4          extracted directory
-rwxrw-rw-.  1 grid hadoop  1543137  Apr 12  2011 sqoop-1.2.0-cdh3b4.tar.gz
4. Copy hadoop-core-0.20.2-cdh3b4.jar and mysql-connector-java-5.1.22-bin.jar into the /home/grid/sqoop-1.2.0-cdh3b4/lib/ directory
[grid@h1 ~]$ cd hadoop-0.20.2-cdh3b4
[grid@h1 hadoop-0.20.2-cdh3b4]$ cp hadoop-core-0.20.2-cdh3b4.jar  /home/grid/sqoop-1.2.0-cdh3b4/lib/
[grid@h1 grid]$ cp mysql-connector-java-5.1.22-bin.jar  /home/grid/sqoop-1.2.0-cdh3b4/lib/
5. Edit the sqoop-1.2.0-cdh3b4/bin/configure-sqoop file
[grid@h1 conf]$ cd ../bin
[grid@h1 bin]$ pwd
/home/grid/sqoop-1.2.0-cdh3b4/bin
[grid@h1 bin]$ vim configure-sqoop
Comment out the HBase and ZooKeeper checks (unless you plan to use HBase or other such components on top of Hadoop):
# Check: If we can't find our dependencies, give up here.
if [ ! -d "${HADOOP_HOME}" ]; then
  echo "Error: $HADOOP_HOME does not exist!"
  echo 'Please set $HADOOP_HOME to the root of your Hadoop installation.'
  exit 1
fi      leave this HADOOP_HOME check as-is; only the two blocks below are modified
#if [ ! -d "${HBASE_HOME}" ]; then
#  echo "Error: $HBASE_HOME does not exist!"
#  echo 'Please set $HBASE_HOME to the root of your HBase installation.'
#  exit 1
#fi
#if [ ! -d "${ZOOKEEPER_HOME}" ]; then
#  echo "Error: $ZOOKEEPER_HOME does not exist!"
#  echo 'Please set $ZOOKEEPER_HOME to the root of your ZooKeeper installation.'
#  exit 1
#fi
6. Set the required environment variables
Set them on whichever machine you will run sqoop from.
[grid@h1 grid]$ vim .bashrc    append the following
export JAVA_HOME=/usr
export JRE_HOME=/usr/java/jdk1.6.0_25/jre
export PATH=/usr/java/jdk1.6.0_25/bin:/home/grid/hadoop-0.20.2/bin:/home/grid/pig-0.9.2/bin:$PATH
export CLASSPATH=./:/usr/java/jdk1.6.0_25/lib:/usr/java/jdk1.6.0_25/jre/lib
export PIG_CLASSPATH=/home/grid/hadoop-0.20.2/conf
export HIVE_HOME=/home/grid/hive-0.8.1
export HIVE_CONF_DIR=$HIVE_HOME/conf
export HADOOP_HOME=/home/grid/hadoop-0.20.2
Purpose: this lets the sqoop program find the Hadoop installation through the environment, and from there the Hadoop configuration files that describe how the cluster is deployed.
[grid@h1 grid]$ echo $HADOOP_HOME      verify the variable is set correctly
/home/grid/hadoop-0.20.2
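If you want to sanity-check the variable the same way configure-sqoop does before running sqoop, a small sketch like this (a hypothetical helper, not part of sqoop itself) reproduces that directory test:

```shell
# Minimal sketch mirroring configure-sqoop's guard: verify that a *_HOME
# variable is non-empty and points at an existing directory.
home_dir_ok() {
  [ -n "$1" ] && [ -d "$1" ]
}

if home_dir_ok "$HADOOP_HOME"; then
  echo "HADOOP_HOME looks usable: $HADOOP_HOME"
else
  echo "HADOOP_HOME is unset or does not exist"
fi
```

If this prints the error branch, fix .bashrc (and re-source it) before moving on; sqoop will refuse to start with the same complaint otherwise.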
7. Configure and start the Hadoop cluster
h1 machine (master)
[grid@h1 bin]$ pwd
/home/grid/hadoop-0.20.2/bin
[grid@h1 bin]$ ./start-all.sh
starting namenode, logging to /home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-namenode-h1.out
h2: starting datanode, logging to /home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-datanode-h2.out
h4: starting datanode, logging to /home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-datanode-h4.out
h1: starting secondarynamenode, logging to /home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-secondarynamenode-h1.out
starting jobtracker, logging to /home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-jobtracker-h1.out
h2: starting tasktracker, logging to /home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-tasktracker-h2.out
h4: starting tasktracker, logging to /home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-tasktracker-h4.out
[grid@h1 bin]$ jps
17191 JobTracker
16955 NameNode
17442 Jps
17121 SecondaryNameNode
h2 machine (slave)
[grid@h2 ~]$ jps
32523 Jps
17188 TaskTracker
13727 HQuorumPeer
17077 DataNode
h4 machine (slave)
[grid@h4 ~]$ jps
27829 TaskTracker
26875 Jps
17119 DataNode
31083 Jps
11557 HQuorumPeer
[grid@h1 bin]$ ./hadoop dfsadmin -report          check the Hadoop cluster status
Configured Capacity: 19865944064 (18.5 GB)
Present Capacity: 8741523456 (8.14 GB)
DFS Remaining: 8726482944 (8.13 GB)
DFS Used: 15040512 (14.34 MB)
DFS Used%: 0.17%
Under replicated blocks: 4
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 2 (2 total, 0 dead)             -- both DataNodes alive, none down
Name: 192.168.2.103:50010                            -- slave h2
Decommission Status : Normal                         -- status normal
Configured Capacity: 9932972032 (9.25 GB)
DFS Used: 7520256 (7.17 MB)
Non DFS Used: 5447561216 (5.07 GB)
DFS Remaining: 4477890560(4.17 GB)
DFS Used%: 0.08%
DFS Remaining%: 45.08%
Last contact: Fri Dec 14 18:10:11 CST 2012
Name: 192.168.2.105:50010                            -- slave h4
Decommission Status : Normal                         -- status normal
Configured Capacity: 9932972032 (9.25 GB)
DFS Used: 7520256 (7.17 MB)
Non DFS Used: 5676859392 (5.29 GB)
DFS Remaining: 4248592384(3.96 GB)
DFS Used%: 0.08%
DFS Remaining%: 42.77%
Last contact: Fri Dec 14 18:10:11 CST 2012
The cluster has started normally.
8. Start MySQL and create the user leo for sqoop connections
1. The MySQL service must be running before you can work with the database.
Database port: 3306
MySQL service name: MySQL55
MySQL status: started
Create the leo user:
grant all privileges on *.* to 'leo'@'%' identified by 'leo' with grant option;
select * from mysql.user;
flush privileges;
9. Create the sqoop database and test table in MySQL, and add data
[grid@h1 bin]$ ping 192.168.2.110                          check connectivity from Linux to the Windows host
PING 192.168.2.110 (192.168.2.110) 56(84) bytes of data.
64 bytes from 192.168.2.110: icmp_seq=1 ttl=64 time=14.5 ms
64 bytes from 192.168.2.110: icmp_seq=2 ttl=64 time=3.43 ms
64 bytes from 192.168.2.110: icmp_seq=3 ttl=64 time=9.68 ms
64 bytes from 192.168.2.110: icmp_seq=4 ttl=64 time=0.549 ms
^C
--- 192.168.2.110 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3630ms
rtt min/avg/max/mdev = 0.549/7.063/14.577/5.453 ms
[grid@h1 grid]$ mysql -h192.168.2.110 -uleo -pleo  log in to the database as the leo user
Command list:
show  databases;                         list the current databases
create  database  sqoop;           create the sqoop database
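The post breaks off at this point. To complete the experiment, the remaining steps would be roughly as follows; the test table's columns, the sample rows, and the mapper count are my assumptions and are not taken from the original.

```shell
# Hypothetical continuation: create a test table with a couple of rows, then
# import it into HDFS with sqoop. The SQL is kept in a variable here so its
# shape is visible; in a real session you would pipe it to the mysql client.
SETUP_SQL='use sqoop;
create table test (id int primary key, name varchar(32));
insert into test values (1, "tom"), (2, "jerry");'
echo "$SETUP_SQL"
# In a real session:
#   echo "$SETUP_SQL" | mysql -h192.168.2.110 -uleo -pleo
#   sqoop import --connect jdbc:mysql://192.168.2.110:3306/sqoop \
#       --username leo --password leo --table test -m 1
```

After the import you would inspect the result with hadoop fs -cat on the generated part files under the target directory.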