您好,欢迎访问一九零五行业门户网

初识全文索引

通常来说,全文索引大多用在olap环境当中,全文索引擅长于词汇的快速搜索。 一、全文索引和普通b_tree索引对比 sql create tablet1 (id int,name varchar(10)); table created. sql create indext1_ind on t1(name); index created. sql create tablet2 as s
通常来说,全文索引大多用在olap环境当中,全文索引擅长于词汇的快速搜索。
一、全文索引和普通b_tree索引对比
sql> create tablet1 (id int,name varchar(10));
table created.
sql> create indext1_ind on t1(name);
index created.
sql> create tablet2 as select * from t1;
table created.
sql> create indext2_ind on t2(name) indextype is ctxsys.context;
index created.
sql> select *from t1 where name like '%tom%';
id name
--------------------
1 tom
2 tom tom
2 tom tom
execution plan
----------------------------------------------------------
plan hash value:3589342044
--------------------------------------------------------------------------------------
| id | operation | name | rows | bytes | cost (%cpu)| time |
--------------------------------------------------------------------------------------
| 0 | select statement | | 1 | 10 | 2 (0)| 00:00:01 |
| 1 | table access by index rowid| t1 | 1 | 10 | 2 (0)| 00:00:01 |
|* 2 | index full scan | t1_ind| 1 | | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------
predicateinformation (identified by operation id):
---------------------------------------------------
2 - filter(name like '%tom%' andname is not null)
statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
676 bytes sent via sql*net to client
519 bytes received via sql*net from client
2 sql*net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
3 rows processed
sql> select *from t2 where contains(name,'tom')>0;
id name
--------------------
1 tom
2 tom tom
2 tom tom
execution plan
----------------------------------------------------------
plan hash value:785228215
--------------------------------------------------------------------------------------
| id | operation | name | rows | bytes | cost (%cpu)| time |
--------------------------------------------------------------------------------------
| 0 | select statement | | 3 | 30 | 7 (0)| 00:00:01 |
| 1 | table access by index rowid| t2 | 3 | 30 | 7 (0)| 00:00:01 |
|* 2 | domain index | t2_ind| | | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------
predicateinformation (identified by operation id):
---------------------------------------------------
2 -access(ctxsys.contains(name,'tom')>0)
statistics
----------------------------------------------------------
10 recursive calls
0 db block gets
14 consistent gets
0 physical reads
0 redo size
676 bytes sent via sql*net to client
519 bytes received via sql*net from client
2 sql*net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
3 rows processed
sql> selectobject_name,object_type from user_objects order by 1;
object_name object_type
-------------------------------------------------
--dr开头的四张表为全文索引的基表
dr$t2_ind$x index
drc$t2_ind$r index
sys_il0000236119c00006$$ index
sys_il0000236124c00002$$ index
sys_iot_top_236122 index
sys_iot_top_236128 index
sys_lob0000236119c00006$$ lob
object_name object_type
-------------------------------------------------
sys_lob0000236124c00002$$ lob
t1 table
t1_ind index
t2 table
t2_ind index
二、dml操作对全文索引的影响
以context方式创建的全文索引并不是基于事务的,默认情况下,即使一个dml操作提交,信息也不会更新到全文索引中。
1、insert 操作
sql> create tablet(name varchar2(30));
table created.
sql> create indext_ind on t(name) indextype is ctxsys.context;
index created.
sql> insert intot values('i am an oracle dba');
1 row created.
sql> commit;
insert数据已提交,我们看看全文索引是否已更新
sql> setautotrace on
sql> select *from t where name like '%dba%';
name
------------------------------
i am an oracle dba 
execution plan
----------------------------------------------------------
plan hash value:1601196873
--------------------------------------------------------------------------
| id | operation | name | rows | bytes | cost (%cpu)| time |
--------------------------------------------------------------------------
| 0 | select statement | | 1 | 17 | 2 (0)| 00:00:01 |
|* 1 | table access full| t | 1 | 17 | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------
predicateinformation (identified by operation id):
---------------------------------------------------
1 - filter(name is not null andname like '%dba%')
note
-----
- dynamic sampling used for this statement(level=2)
statistics
----------------------------------------------------------
5 recursive calls
0 db block gets
15 consistent gets
0 physical reads
0 redo size
538 bytes sent via sql*net to client
520 bytes received via sql*net from client
2 sql*net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
sql> set line 200
sql> select *from t where contains(name,'dba') >0;
no rows selected
execution plan
----------------------------------------------------------
plan hash value:315187259
-------------------------------------------------------------------------------------
| id | operation | name | rows | bytes | cost (%cpu)| time |
-------------------------------------------------------------------------------------
| 0 | select statement | | 1 | 29 | 4 (0)| 00:00:01 |
| 1 | table access by index rowid| t | 1 | 29 | 4 (0)| 00:00:01 |
|* 2 | domain index | t_ind| | | 4 (0)| 00:00:01 |
-------------------------------------------------------------------------------------
predicateinformation (identified by operation id):
---------------------------------------------------
2 -access(ctxsys.contains(name,'dba')>0)
note
-----
- dynamic sampling used for this statement(level=2)
statistics
----------------------------------------------------------
1829 recursive calls
0 db block gets
2696 consistent gets
30 physical reads
0 redo size
332 bytes sent via sql*net to client
509 bytes received via sql*net from client
1 sql*net roundtrips to/from client
164 sorts (memory)
0 sorts (disk)
0 rows processed
以上发现,全文索引并没有自动更新,而是把记录存放在线面的ctxsys.dr$pending表中。
sql> setautotrace off
sql> select *from ctxsys.dr$pending;
pnd_cid pnd_pid pnd_rowid pnd_timestamp p
-------------------- ------------------ ------------------ -
1084 0 aabgmvaaeaaaadmaaa 03-apr-14 n
sql> insert intot values('he is an oracle dba');
1 row created.
sql> commit;
commit complete.
sql> select *from ctxsys.dr$pending;
pnd_cid pnd_pid pnd_rowid pnd_timestamp p
-------------------- ------------------ ------------------ -
1084 0 aabgmvaaeaaaadmaaa 03-apr-14 n
1084 0 aabgmvaaeaaaadmaab 03-apr-14 n
sql> select *from t where contains(name,'dba') >0;
no rows selected
为了把信息同步到全文索引中,我们需要手工同步:
sql> alter indext_ind rebuild parameters ('sync');
index altered.
sql> select *from t where contains(name,'dba') >0; 
name
------------------------------
i am an oracle dba
he is an oracle dba
sql> select *from ctxsys.dr$pending;
no rows selected
2、delete操作
sql> select *from t;
name
------------------------------
i am an oracle dba
he is an oracle dba
sql> delete fromt where name='he is an oracle dba';
1 row deleted.
sql> select *from t where contains(name,'dba') >0;
name
------------------------------
i am an oracle dba
sql> select *from ctxsys.dr$pending;
no rows selected
sql> select *from ctxsys.dr$delete;
del_idx_iddel_ixp_id del_docid
-------------------- ----------
1084 0 2
这里我们看到全文索引立即生效了,至于ctxsys.dr$delete里面的数据是delete操作的中间状态,用来维护一个事物,无论事物提交或者回滚。
sql> rollback;
rollback complete.
sql> select *from t where contains(name,'dba') >0;
name
------------------------------
i am an oracle dba
he is an oracle dba
sql> select *from ctxsys.dr$delete;
no rows selected
3、update操作
update操作相当于delete+insert操作,所以默认情况下需要手动刷新全文索引。
sql> update t setname='oracle dba' where name='i am an oracle dba';
1 row updated.
sql> select *from ctxsys.dr$delete;
del_idx_iddel_ixp_id del_docid
-------------------- ----------
1084 0 1
sql> select *from ctxsys.dr$pending;
pnd_cid pnd_pid pnd_rowid pnd_timestamp p
-------------------- ------------------ ------------------ -
1084 0 aabgmvaaeaaaadmaaa 03-apr-14 n
sql> select *from t where contains(name,'dba') > 0;
name
------------------------------
he is an oracle dba
sql> alter indext_ind rebuild parameters ('sync');
index altered.
sql> select *from t where contains(name,'dba') > 0;
name
------------------------------
he is an oracle dba
oracle dba
由于全文索引创建对象大多是海量数据的表,dml操作如果实时更新会影响到系统性能。创建全文索引同步的三个选项:
manual:默认选项
every:在一个时间段后更新索引
on commitdml:在事务提交后更新索引
语法如下:
create index t_indon t(name) indextype is ctxsys.context parameters('sync (on commit)');
查看全文索引信息和性能的工具包ctx_report
其它类似信息

推荐信息