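External table definition
Readable external tables: no DML operations are allowed.
Writable external tables: INSERT only; SELECT, UPDATE, and DELETE are not allowed.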
Loading
Create the external table:
=# create external web table ext_expenses (name text,
date date, amount float4, category text, description text)
location ( 'http://intranet.company.com/expenses/sales/file.csv',
'http://intranet.company.com/expenses/exec/file.csv',
'http://intranet.company.com/expenses/finance/file.csv',
'http://intranet.company.com/expenses/ops/file.csv',
'http://intranet.company.com/expenses/marketing/file.csv',
'http://intranet.company.com/expenses/eng/file.csv' )
format 'csv' ( header );
Load data from the external table:
=# insert into expenses_travel
select * from ext_expenses where category='travel';
Or, to quickly load all the data into a new database table:
=# create table expenses as select * from ext_expenses;
Test:
[root@mdw ~]# wget http://mirrors.aliyun.com/repo/centos-6.repo
--2014-03-04 13:51:30-- http://mirrors.aliyun.com/repo/centos-6.repo
Resolving mirrors.aliyun.com... 115.28.122.210, 112.124.140.210
Connecting to mirrors.aliyun.com|115.28.122.210|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2086 (2.0K) [application/octet-stream]
Saving to: “centos-6.repo”
100%[==============================================================================================================================>] 2,086 --.-k/s in 0s
2014-03-04 13:51:30 (194 MB/s) - “centos-6.repo” saved [2086/2086]
libo=# create external web table ext_expenses (name text)
libo-# location ('http://mirrors.aliyun.com/repo/centos-6.repo')
libo-# format 'text' ( delimiter '|' null ' ') ;
create external table
libo=# create table expenses as select * from ext_expenses;
notice: table doesn't have 'distributed by' clause -- using column(s) named 'colum' as the greenplum database data distribution key for this table.
hint: the 'distributed by' clause determines the distribution of data. make sure column(s) chosen are the optimal data distribution key to minimize skew.
error: could not translate host name mirrors.aliyun.com, port 80 to address: temporary failure in name resolution (cdbutil.c:754) (seg0 slice1 sdw1:40000 pid=26261) (cdbdisp.c:1489)
libo=#
libo=#
libo=# select * from ext_expenses;
error: could not translate host name mirrors.aliyun.com, port 80 to address: temporary failure in name resolution (cdbutil.c:754) (seg0 slice1 sdw1:40000 pid=26254) (cdbdisp.c:1489)
libo=# drop external web table ext_expenses ;
drop external table
libo=# create external web table ext_expenses (colum text)
libo-# location ('http://115.28.122.210/repo/centos-6.repo')
libo-# format 'text' ( delimiter '|' null ' ') ;
create external table
libo=# select * from ext_expenses;
error: connection with gpfdist failed for http://115.28.122.210/repo/centos-6.repo. effective url: http://115.28.122.210/repo/centos-6.repo. (seg0 slice1 sdw1:40000 pid=26296)
libo=#
[gpadmin@mdw data_tst]$ gpfdist -d /home/gpadmin/data_tst -p 8081 -l /home/gpadmin/log1 &
[1] 10321
[gpadmin@mdw data_tst]$ serving http on port 8081, directory /home/gpadmin/data_tst
[root@mdw ~]# wget http://192.168.100.101:8081/aaa
--2014-03-04 14:14:01-- http://192.168.100.101:8081/aaa
Connecting to 192.168.100.101:8081... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/plain]
Saving to: “aaa”
[ ] 17 --.-k/s in 0s
2014-03-04 14:14:01 (1.61 MB/s) - “aaa” saved [17]
libo=# create external web table ext_expenses (colum text)
libo-# location ('http://192.168.100.101:8081/aaa')
libo-# format 'text' ( delimiter '|' null ' ') ;
create external table
libo=# select * from ext_expenses;
colum
-------
aaaa
aaa
aa
a
(7 rows)
create table t as select * from t_ext distributed by(id);
libo=# create table t as select * from t_ext distributed randomly;
select 10
libo=# create external table t_ext (id int,name text)
libo-# location ('gpfdist://192.168.100.11:8081/aaa.csv')
libo-# format 'csv';
create external table
libo=# select * from t_ext;
error: missing data for column name (seg3 slice1 sdw2:40001 pid=10243)
detail: external table t_ext, line 4000 of gpfdist://192.168.100.11:8081/aaa.csv:
Cause: the CSV file contains blank lines.
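The error above comes from blank lines in the CSV file. One way to deal with it is to strip them before gpfdist serves the file. This is a minimal sketch, assuming no quoted field contains an intentional empty line; the sample rows and the file name aaa.clean.csv are made up for illustration.

```shell
#!/bin/sh
# Hypothetical working directory and file names; the source only mentions aaa.csv.
workdir=$(mktemp -d)
src="$workdir/aaa.csv"
clean="$workdir/aaa.clean.csv"

# Build a small sample with one empty line and one whitespace-only line.
printf '1,alice\n\n2,bob\n   \n3,carol\n' > "$src"

# Show the line numbers of the blank lines so they can be checked by hand.
grep -n '^[[:space:]]*$' "$src"

# Keep only the lines that contain at least one non-whitespace character.
grep -v '^[[:space:]]*$' "$src" > "$clean"

# Number of rows left for the external table to read.
wc -l < "$clean"
```

Pointing the external table's location at the cleaned file, or replacing the original file in the gpfdist directory, should then avoid this particular error.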
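Conclusion: in this test the external web table could only read over HTTP from a URL served by gpfdist, which is a simple web service that ships with GP. Both earlier errors were raised on a segment host (sdw1), so the URL has to be resolvable and reachable from the segment hosts, not only from the master.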
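Load error handling:
When defining a readable external table with the CREATE EXTERNAL TABLE command, add the SEGMENT REJECT LIMIT clause.
The reject limit count can be given as a number of rows (the default) or, with PERCENT, as a percentage of rows.
To keep the rejected rows for later inspection, use the LOG ERRORS INTO clause to name an error log table.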
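The clauses described above could be combined roughly as follows. This is only a sketch: the error table name t_ext_errors and the limit of 10 rows are assumptions, the database name libo is taken from the psql prompt in the session above, and the script only writes the SQL to a temporary file because running it needs a live cluster. Later Greenplum releases are understood to have changed the LOG ERRORS INTO form, so check the documentation for your version.

```shell
#!/bin/sh
# Hypothetical temporary file that holds the SQL.
sqlfile=$(mktemp)

cat > "$sqlfile" <<'SQL'
-- t_ext_errors and the limit of 10 rows are illustrative assumptions.
DROP EXTERNAL TABLE IF EXISTS t_ext;
CREATE EXTERNAL TABLE t_ext (id int, name text)
LOCATION ('gpfdist://192.168.100.11:8081/aaa.csv')
FORMAT 'CSV'
LOG ERRORS INTO t_ext_errors
SEGMENT REJECT LIMIT 10 ROWS;

-- Rows that parse are returned; rejected rows go to the error table.
SELECT count(*) FROM t_ext;
SELECT linenum, errmsg, rawdata FROM t_ext_errors ORDER BY linenum;
SQL

# To run it against a live cluster:
#   psql -d libo -f "$sqlfile"
cat "$sqlfile"
```

With a reject limit in place, malformed rows such as the blank CSV lines seen earlier are skipped and recorded as long as each segment stays under the limit; once a segment exceeds it, the whole statement still fails.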
Loading with gpload
Unloading data
Disabling EXECUTE in web table definitions:
libo=# show gp_external_enable_exec
libo-# ;
gp_external_enable_exec
-------------------------
on
(1 row)
Data formats
When loading or unloading data with the various GP commands, you must specify how the data is formatted.
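Row separator: GPDB expects each row to end with an LF character (line feed, 0x0A), a CR (carriage return, 0x0D),
or a CR followed by an LF (CR+LF, 0x0D 0x0A). LF is the standard newline on Unix and
Unix-like operating systems. Other operating systems may use CR (Mac OS 9)
or CR+LF (Windows). GPDB accepts all of these as row separators.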
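To see which row separator a file actually uses before loading it, one option is to count its CR and LF bytes. This is a minimal sketch; the file names, the sample contents and the helper names count_cr and count_lf are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample files: one with LF endings, one with CR+LF endings.
workdir=$(mktemp -d)
printf 'a|1\nb|2\n' > "$workdir/unix.txt"
printf 'a|1\r\nb|2\r\n' > "$workdir/windows.txt"

# Count carriage-return bytes (0x0D) in a file.
count_cr() { tr -cd '\r' < "$1" | wc -c | tr -d ' '; }

# Count line-feed bytes (0x0A) in a file.
count_lf() { tr -cd '\n' < "$1" | wc -c | tr -d ' '; }

for f in "$workdir/unix.txt" "$workdir/windows.txt"; do
    echo "$f: CR=$(count_cr "$f") LF=$(count_lf "$f")"
done
```

Equal CR and LF counts suggest CR+LF endings, LF only suggests Unix endings, and CR only suggests classic Mac endings.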
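Column delimiter: for text files the default column delimiter is the tab character (0x09); for CSV files
the default is the comma (0x2C). However, when using COPY or CREATE EXTERNAL TABLE,
or when defining the data format for gpload, the DELIMITER clause can specify a different
single-character delimiter.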
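Before loading a file with a non-default delimiter, it can help to check that every row has the expected number of fields. This is a rough sketch, assuming a '|' delimiter as in the earlier format clauses and no quoted fields that contain the delimiter; the sample data and the helper name bad_lines are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample file: the second row is missing its last field.
workdir=$(mktemp -d)
data="$workdir/expenses.txt"
printf 'alice|2014-03-04|12.5\nbob|2014-03-05\ncarol|2014-03-06|7\n' > "$data"

# Print the line number and field count of every row whose number of
# '|'-separated fields differs from the expected count given as $2.
bad_lines() {
    awk -F'|' -v want="$2" 'NF != want { print NR ": " NF " fields" }' "$1"
}

bad_lines "$data" 3
```

Rows reported by this check are the ones likely to raise a "missing data for column" error during the load.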