Oracle Text (Full-Text Search)

To view the related database (NLS) settings:
select * from nls_database_parameters;
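For text indexing, the settings that matter most are the database language, the character set, and the length semantics. These affect which lexer suits the data and whether multi-byte strings fit in VARCHAR2(n) columns. A minimal query that shows only those parameters:

```sql
-- Show only the NLS settings most relevant to text indexing
select parameter, value
  from nls_database_parameters
 where parameter in ('NLS_LANGUAGE', 'NLS_CHARACTERSET', 'NLS_LENGTH_SEMANTICS');
```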
1. Simple usage
    1.1 To use full-text search, the current Oracle user must have the CTXAPP role
--Create a user
--create user textsearch identified by textsearch;
/**
Grant the user three roles, one of which is CTXAPP,
so that the user can call the full-text search procedures
*/
grant connect,resource,ctxapp to textsearch;
Log in as the new user:
SQL> conn textsearch
Enter password: **********
Connected.
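You can confirm that the grant took effect by listing the new session's roles, where CTXAPP should appear. The sketch below uses the standard USER_ROLE_PRIVS and SESSION_ROLES dictionary views:

```sql
-- Roles granted directly to the current user
select granted_role, default_role
  from user_role_privs
 order by granted_role;

-- Roles actually enabled in this session
select role from session_roles;
```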
    1.2 Create the table to be searched and prepare the data
    --drop table textdemo;
    create table textdemo(
        id number not null primary key,
        book_author varchar2(20),--author
        publish_time date,--publication date
        title varchar2(400),--title
        book_abstract varchar2(2000),--abstract
        path varchar2(200)--file path
    );
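    insert into textdemo values(1,'Hayao Miyazaki',to_date('2008-10-07','yyyy-mm-dd'),'Moving Castle','The story takes place in Europe at the end of the 19th century. Kind and lovely Sophie is cursed by a wicked witch and turned from an 18-year-old girl into a 90-year-old woman. Alone and helpless, she wanders into the moving castle outside town, whose master Howl is said to take pleasure in stealing the souls of girls. But things are not as frightening as the rumors claim: the eccentric Howl takes Sophie in, and the two begin a strange life together in the four-legged moving castle, while a love story woven from love and pain, joy and sorrow, quietly unfolds amid the flames of war.','e:textsearchmoveingcastle.doc');
    insert into textdemo values(2,'Timur Bekmambetov',to_date('2008-10-07','yyyy-mm-dd'),'Wanted','Directed by Russian director Timur Bekmambetov, the film has taken more than 300 million US dollars at the box office worldwide since opening in North America at the end of June. After its release in Asia it also topped the box office in Japan, South Korea and elsewhere. Although many fans had already seen the film through various channels, its stunning audiovisual effects on the big screen are still expected to draw large audiences to the cinema.','e:textsearch.pdf');
    insert into textdemo values(3,'Yuan Quan',to_date('2008-10-07','yyyy-mm-dd'),'Stars Daniel Wu and Yuan Quan appear on set','The film "Like a Dream" is shooting at Tongle Fang in Shanghai, where its stars Daniel Wu and Yuan Quan appeared. Because the shoot was late at night, few fans noticed, which gave the crew a quiet place to work. Standing in the street with her head lowered, Yuan Quan really did look a little like a female ghost in the cold night, a genuinely eerie sight.','e:textsearchdream.txt');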
    commit;
1.3 Create an index on the abstract column
/*
* Create the index with the default parameters
*/
    --drop index demo_abstract;
    create index demo_abstract on textdemo(book_abstract)
    indextype is ctxsys.context
    --parameters('datastore ctxsys.default_datastore filter ctxsys.auto_filter ')
    ;
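Before querying, check that indexing finished without errors. The sketch below reads the CTX_USER_INDEXES and CTX_USER_INDEX_ERRORS views, then runs a ranked query with SCORE. The search word 'castle' is only an example, so substitute a word that actually occurs in your book_abstract data.

```sql
-- Index status as recorded by Oracle Text
select idx_name, idx_status
  from ctx_user_indexes
 where idx_name = 'DEMO_ABSTRACT';

-- Rows that failed to index (no rows returned when everything succeeded)
select err_timestamp, err_textkey, err_text
  from ctx_user_index_errors
 where err_index_name = 'DEMO_ABSTRACT';

-- Ranked query: the third CONTAINS argument is the label referenced by SCORE(1)
select id, title, score(1) as relevance
  from textdemo
 where contains(book_abstract, 'castle', 1) > 0
 order by score(1) desc;
```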
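The general workflow is:
(1) Create the table and load the text.
(2) Create the index. To customize the index, configure it before creating it, for example by changing the lexer. The following SQL lists the Oracle Text preferences that are available:
select * from ctx_preferences;
(3) Query with SQL.
(4) Maintain the index: synchronization and optimization.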
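The sketch below shows one way to configure the index before building it: create a lexer preference and pass it in the PARAMETERS clause. CHINESE_VGRAM_LEXER is one of Oracle Text's built-in lexer types and is intended for Chinese-language text. It needs a database character set that can store Chinese, and the preference name my_chinese_lexer is hypothetical. For English text, the default BASIC_LEXER is usually adequate, so the right choice depends on your data.

```sql
-- Hypothetical preference name; CHINESE_VGRAM_LEXER is a built-in lexer type
begin
  ctx_ddl.create_preference('my_chinese_lexer', 'CHINESE_VGRAM_LEXER');
end;
/

-- Re-create the demo index so that it uses the new lexer
drop index demo_abstract;
create index demo_abstract on textdemo(book_abstract)
  indextype is ctxsys.context
  parameters ('lexer my_chinese_lexer');
```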
Granting privileges
The user who runs full-text operations must have the CTXAPP role (or be the CTXSYS user) and the EXECUTE privilege on the CTX_DDL package.
(1) As SYS, grant the CTXAPP role to SCOTT:
grant ctxapp to scott;
(2) As CTXSYS, grant SCOTT the EXECUTE privilege on the CTX_DDL package:
grant execute on ctx_ddl to scott;
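Creating the table, inserting rows, and building the index
All of the following SQL statements and the job run as SCOTT. First, run the SQL below to create the table docs and insert two rows. Then commit and create the index doc_index.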
drop table docs;
create table docs (id number primary key, text varchar2(80));
insert into docs values (1, 'the first doc');
insert into docs values (2, 'the second doc');
commit;
create index doc_index on docs(text) indextype is ctxsys.context;
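Then run the query. The C# code is:
// Requires: using System.Data; using System.Data.OracleClient;
// ASP.NET page code: Response is the page's HttpResponse
string connstr = "data source=ora9; uid=scott; pwd=tiger; unicode=true";
string sqlstr = "select id from docs where contains(text, '%first%') > 0";
OracleDataAdapter da = new OracleDataAdapter(sqlstr, connstr);
DataTable dt = new DataTable();
da.Fill(dt);  // opens and closes the connection itself
if (dt.Rows.Count > 0)
    Response.Write(dt.Rows[0][0].ToString());  // id of the first matching row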
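You can try the same query in SQL*Plus before wiring it into application code. The sketch below passes the search expression as a bind variable so the SQL text stays constant. The variable name q is illustrative.

```sql
variable q varchar2(100)
exec :q := '%first%'

-- Same query as the C# code, with the search expression bound
select id from docs where contains(text, :q) > 0;
```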
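Synchronization and optimization
When the docs table changes (rows are inserted or deleted), the index must reflect those changes, which requires synchronizing and optimizing it. Oracle provides the CTX server for this, or you can use the job below.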
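By default, a CONTEXT index is not updated by DML, so a newly inserted row stays invisible to CONTAINS until the index is synchronized. The sketch below demonstrates this against the docs table. It uses CTX_USER_PENDING, which lists rows waiting to be indexed, and CTX_DDL.SYNC_INDEX. The row with id 3 is illustrative. If a scheduled sync job is already running, it may index the row before you look.

```sql
insert into docs values (3, 'the third doc');
commit;

-- The new row is queued here, not yet in the index
select pnd_index_name, pnd_rowid from ctx_user_pending;

-- Returns no rows until the index is synchronized
select id from docs where contains(text, 'third') > 0;

-- Synchronize; afterwards the same query finds id 3
exec ctx_ddl.sync_index('DOC_INDEX');
select id from docs where contains(text, 'third') > 0;
```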
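Synchronize (sync)
Synchronization saves new terms to the $I table. The procedure below synchronizes doc_index and then fully optimizes it.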
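create or replace procedure sync is
begin
  -- Synchronize: index the rows inserted or updated since the last sync
  execute immediate 'alter index doc_index rebuild online' ||
                    ' parameters ( ''sync'' )';
  -- Optimize: compact the index and purge entries for deleted rows
  execute immediate 'alter index doc_index rebuild online' ||
                    ' parameters ( ''optimize full maxtime unlimited'' )';
end sync;
/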
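Newer releases can do the same two steps with CTX_DDL.SYNC_INDEX and CTX_DDL.OPTIMIZE_INDEX instead of ALTER INDEX ... REBUILD. The sketch below is an equivalent procedure under the hypothetical name sync_doc_index. Roles are not enabled inside a definer's-rights stored procedure, so this procedure relies on the direct EXECUTE grant on CTX_DDL rather than on the CTXAPP role.

```sql
-- Hypothetical procedure name; same effect as the sync procedure
create or replace procedure sync_doc_index is
begin
  -- Index the rows waiting in CTX_USER_PENDING
  ctx_ddl.sync_index(idx_name => 'DOC_INDEX');
  -- FULL optimization also removes data belonging to deleted rows
  ctx_ddl.optimize_index(idx_name => 'DOC_INDEX', optlevel => 'FULL');
end sync_doc_index;
/
```

Because OPTIMIZE_INDEX is given no maxtime here, it runs without a time limit, like "maxtime unlimited" in the ALTER INDEX form.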
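Optimize
Optimization cleans up garbage in the $I table by removing terms that belong to deleted documents. The following job runs the sync procedure, which performs both synchronization and optimization, immediately and then every two minutes.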
declare
  v_job number;
begin
  dbms_job.submit(
    job       => v_job,
    what      => 'sync;',
    next_date => sysdate,            /* default */
    interval  => 'sysdate + 1/720'   /* 1/720 day = 1440 min / 720 = 2 min */
  );
  commit;  -- the job queue only picks up committed jobs
  dbms_job.run(v_job);
end;
/
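DBMS_JOB still works in 10g and 11g, but DBMS_SCHEDULER is the newer interface for the same task. The sketch below schedules the sync procedure to run every two minutes. The job name DOC_INDEX_SYNC_JOB is hypothetical, and the user needs the CREATE JOB privilege.

```sql
begin
  dbms_scheduler.create_job(
    job_name        => 'DOC_INDEX_SYNC_JOB',        -- hypothetical name
    job_type        => 'STORED_PROCEDURE',
    job_action      => 'SYNC',                      -- the sync procedure
    start_date      => systimestamp,
    repeat_interval => 'FREQ=MINUTELY;INTERVAL=2',  -- every two minutes
    enabled         => true);
end;
/
```

Use either this job or the DBMS_JOB version, not both, or the index will be synchronized twice as often.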
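Here the $I table is dr$doc_index$i. When a user creates the index, Oracle automatically creates four tables: dr$doc_index$i, dr$doc_index$k, dr$doc_index$n and dr$doc_index$r. You can inspect their contents with a SELECT statement.
Notes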
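For example, you can examine the $I (token) table directly to see which words were indexed. The sketch below uses the TOKEN_TEXT and TOKEN_COUNT columns of that table, where TOKEN_COUNT is the number of documents covered by each row. Under the default lexer, tokens are stored in uppercase.

```sql
-- The index tables Oracle Text created for doc_index
select table_name
  from user_tables
 where table_name like 'DR$DOC_INDEX$%';

-- Indexed tokens and the number of documents containing each
select token_text, sum(token_count) as doc_count
  from dr$doc_index$i
 group by token_text
 order by token_text;
```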
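(1) This article implements Oracle full-text search end to end under Oracle 9i and 10g, including creating the table and index and synchronizing and optimizing the index;
(2) The full-text query is select id from docs where contains(text,'%first%')>0;
(3) The >0 is required for valid Oracle SQL: CONTAINS returns a numeric relevance score, and Oracle SQL does not support Boolean return values from functions;
(4) The behavior of contains(text,'%first%')>0 differs between Oracle 9i/10g and 11g;
(5) On a recent project that moved from Oracle 10g to 11g, full-text queries written for 10g returned no results under 11g;
(6) My initial conclusion is that in 9i and 10g a search term without "%" gives an exact match and one with "%" gives a fuzzy (wildcard) match, whereas in 11g "%" is not needed at all;
(7) In addition, 9i and 10g accept a query such as contains(text,'%first% and %second%')>0, but 11g does not, so the conditions have to be written separately, for example:
contains(text,'%first%')>0 and contains(text,'%second%')>0;
(8) Overall, full-text search in 11g feels better.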
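Point (7) can be illustrated with two forms of the same query against the docs table. The first uses the split form from the note, with one CONTAINS call per term. The second combines plain terms with Oracle Text's AND operator inside a single CONTAINS call, which avoids wildcards altogether. As described in the notes, results can vary by version and index settings, so test both on your own release.

```sql
-- Split form from note (7): one CONTAINS call per term
select id
  from docs
 where contains(text, '%first%') > 0
   and contains(text, '%doc%') > 0;

-- Single CONTAINS call using the AND operator on plain terms
select id
  from docs
 where contains(text, 'first and doc') > 0;
```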
