Setting up coreseek (sphinx+mmseg3): detailed installation and configuration, installing the sphinx extension for PHP, and a PHP usage example

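This single document covers installation, incremental indexing, the PHP extension, and an API usage example, which saves you from hunting through many separate articles.
Installing coreseek (sphinx+mmseg3)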
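[Step 1] Install mmseg3 first
cd /var/install
wget http://www.coreseek.cn/uploads/csft/4.0/coreseek-4.1-beta.tar.gz
tar zxvf coreseek-4.1-beta.tar.gz
cd coreseek-4.1-beta
cd mmseg-3.2.14
./bootstrap
./configure --prefix=/usr/local/mmseg3
make && make install
Problem encountered: error: cannot find input file: src/Makefile.in, or some other similar error.
Solution: run the commands below in order. When I ran 'aclocal' I hit a further error, which installing libtool (the first command) resolved.
yum -y install libtool
aclocal
libtoolize --force
automake --add-missing
autoconf
autoheader
make clean
Once 'libtool' is installed, run the commands above again starting from 'aclocal'. When they have finished, repeat the original installation steps (./configure, then make && make install).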
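The regeneration sequence above can be wrapped in a small shell function that stops at the first failing step. This is only a sketch: the function name regen_autotools and the DRY_RUN switch are hypothetical names introduced here for illustration and are not part of coreseek or mmseg.

```shell
#!/bin/sh
# Hypothetical helper: rerun the autotools steps from this section in order,
# stopping at the first command that fails.
# Set DRY_RUN=1 to print the commands instead of running them.
regen_autotools() {
    for cmd in "aclocal" "libtoolize --force" "automake --add-missing" "autoconf" "autoheader"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$cmd"
        else
            # $cmd is intentionally unquoted so that the flags become separate arguments
            $cmd || { echo "failed: $cmd" >&2; return 1; }
        fi
    done
}
```

Call it from the source directory (for example mmseg-3.2.14), then run make clean and repeat ./configure and make && make install as described above.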
[Step 2] Install coreseek
## Install coreseek
$ cd csft-4.1    # or csft-3.2.14 / csft-4.0.1, whichever your release contains
$ sh buildconf.sh    # warnings in the output can be ignored; any error must be fixed
$ ./configure --prefix=/usr/local/coreseek --without-unixodbc --with-mmseg --with-mmseg-includes=/usr/local/mmseg3/include/mmseg/ --with-mmseg-libs=/usr/local/mmseg3/lib/ --with-mysql
## If configure reports a MySQL problem, see the MySQL data source installation notes: http://www.coreseek.cn/product_install/install_on_bsd_linux/#mysql
$ make && make install
$ cd ..
## Command-line test of mmseg word segmentation and coreseek search (set the locale to zh_CN.UTF-8 beforehand so that Chinese is displayed correctly)
$ cd testpack
$ cat var/test/test.xml    # Chinese text should be displayed correctly here
$ /usr/local/mmseg3/bin/mmseg -d /usr/local/mmseg3/etc var/test/test.xml
$ /usr/local/coreseek/bin/indexer -c etc/csft.conf --all
$ /usr/local/coreseek/bin/search -c etc/csft.conf 网络搜索
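If the indexer stops with the error "xmlpipe2 support not compiled in. to use xmlpipe2, install missing xml libra",
run the following command:
yum -y install expat expat-devel
Once the packages are installed, rebuild coreseek and generate the index again; the test then passes.
The result looks like this: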
coreseek fulltext 4.1 [ sphinx 2.0.2-dev (r2922)]
copyright (c) 2007-2011,
beijing choice software technologies inc (http://www.coreseek.com)

 using config file 'etc/csft.conf'...
index 'xml': query '网络搜索 ': returned 1 matches of 1 total in 0.000 sec

displaying matches:
1. document=1, weight=1590, published=thu apr 1 07:20:07 2010, author_id=1

words:
1. '网络': 1 documents, 1 hits
2. '搜索': 2 documents, 5 hits
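The command-line test only shows Chinese correctly when the shell uses a UTF-8 locale. A quick way to check this before running the test is sketched below; the function name is_utf8_locale is a hypothetical helper introduced here, and it only inspects the locale name, it does not verify that the locale is actually installed.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given locale name (default: the
# effective LC_ALL / LC_CTYPE / LANG setting, in that order of precedence)
# names a UTF-8 character set.
is_utf8_locale() {
    loc="${1:-${LC_ALL:-${LC_CTYPE:-${LANG:-}}}}"
    case "$loc" in
        *[Uu][Tt][Ff]-8*|*[Uu][Tt][Ff]8*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, is_utf8_locale || export LANG=zh_CN.UTF-8 switches the current shell to the locale recommended above when it is not already UTF-8.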
Next, configure sphinx to work with MySQL.
Create the sphinx counter table; run this in the coreseek_test database.
create table sph_counter(
    counter_id integer primary key not null,
    max_doc_id integer not null
);
Create the configuration file that connects sphinx to MySQL:
# vi /usr/local/coreseek/etc/csft_mysql.conf
# MySQL data source configuration; for details see: http://www.coreseek.cn/products-install/mysql/
# First import var/test/documents.sql into the database, then set the MySQL user, password and database below
# Source definition
source main    # name of the source
{
    type = mysql
    sql_host = localhost
    sql_user = root
    sql_pass = 123456
    sql_db = coreseek_test
    sql_port = 3306
    sql_query_pre = set names utf8
    sql_query_pre = replace into sph_counter select 1,max(id) from hr_spider_company;    # update sph_counter
    sql_query = select * from hr_spider_company where id<=( select max_doc_id from sph_counter where counter_id=1 )    # read the rows up to the id recorded in sph_counter
    sql_query_post_index = replace into sph_counter select 1,max(id) from hr_spider_company    # update sph_counter
}
# Index definition
index main    # keep this name consistent with the names used elsewhere
{
    source = main    # name of the corresponding source
    path = /usr/local/coreseek/var/data/mysql    # change to the absolute path you actually use, e.g. /usr/local/coreseek/var/...
    docinfo = extern
    mlock = 0
    morphology = none
    min_word_len = 1
    html_strip = 0
    # Chinese word segmentation settings; for details see: http://www.coreseek.cn/products-install/coreseek_mmseg/
    charset_dictpath = /usr/local/mmseg3/etc/    # setting for BSD and Linux; must end with /
    charset_type = zh_cn.utf-8
}
index delta : main    # keep this name consistent with the names used elsewhere
{
    source = delta
    path = /usr/local/coreseek/var/data/delta
}
# Global indexer settings
indexer
{
    mem_limit = 128M
}
# searchd service settings
searchd
{
    listen = 9312
    read_timeout = 5
    max_children = 30
    max_matches = 1000
    seamless_rotate = 0
    preopen_indexes = 0
    unlink_old = 1
    pid_file = /usr/local/coreseek/var/log/searchd_mysql.pid    # change to the absolute path you actually use, e.g. /usr/local/coreseek/var/...
    log = /usr/local/coreseek/var/log/searchd_mysql.log    # change to the absolute path you actually use, e.g. /usr/local/coreseek/var/...
    query_log = /usr/local/coreseek/var/log/query_mysql.log    # change to the absolute path you actually use, e.g. /usr/local/coreseek/var/...
    binlog_path =    # disables the binlog
}
My test table is called hr_spider_company; simply replace it with your own table name as needed.
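The index delta block in the configuration refers to source = delta, but no such source is defined in the file as shown. In the usual sphinx main+delta scheme the delta source inherits from main and selects only the rows added after the id stored in sph_counter. The sketch below prints such a block; the function name print_delta_source is hypothetical, and the exact query is an assumption modeled on the main source, not something taken from the original configuration.

```shell
#!/bin/sh
# Hypothetical helper: print a "delta" source that inherits from "main" and
# selects only rows with an id greater than the one recorded in sph_counter.
# $1 is the table name (hr_spider_company in this article).
print_delta_source() {
    table="$1"
    cat <<EOF
source delta : main
{
    sql_query_pre = set names utf8
    sql_query = select * from $table where id>( select max_doc_id from sph_counter where counter_id=1 )
}
EOF
}
```

Running print_delta_source hr_spider_company >> /usr/local/coreseek/etc/csft_mysql.conf would append the block to the configuration file. Redefining sql_query_pre in delta replaces the inherited entries, so the counter is not updated before the delta query runs. Everything else, including sql_query_post_index, is still inherited from main; whether that suits you depends on how often you merge.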
Command reference:
Start the background service (required)
# /usr/local/coreseek/bin/searchd -c /usr/local/coreseek/etc/csft_mysql.conf
Build the full index (must be run once before any query or test)
/usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf --all --rotate
Build the delta (incremental) index
/usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf delta --rotate
Merge the indexes
/usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf --merge main delta --rotate --merge-dst-range deleted 0 0
(--merge-dst-range deleted 0 0 filters the merge so that only documents whose deleted attribute is 0 end up in main; I add it to keep several keywords from pointing at the same document.)
Test a search
# /usr/local/coreseek/bin/search -c /usr/local/coreseek/etc/csft_mysql.conf aaa
Stop the background service
# /usr/local/coreseek/bin/searchd -c /usr/local/coreseek/etc/csft_mysql.conf --stop
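Since every command above repeats the same two paths, it can help to keep them in one place. The following is a minimal sketch that prints the command line for each task. The names csft_cmd, CSFT_HOME and CSFT_CONF are hypothetical and introduced only for this example; the command lines themselves are the ones listed above.

```shell
#!/bin/sh
# Hypothetical wrapper around the commands listed above.
# CSFT_HOME and CSFT_CONF default to the paths used in this article.
CSFT_HOME="${CSFT_HOME:-/usr/local/coreseek}"
CSFT_CONF="${CSFT_CONF:-$CSFT_HOME/etc/csft_mysql.conf}"

# Print the command line for the given task without running it.
csft_cmd() {
    case "$1" in
        start) echo "$CSFT_HOME/bin/searchd -c $CSFT_CONF" ;;
        stop)  echo "$CSFT_HOME/bin/searchd -c $CSFT_CONF --stop" ;;
        full)  echo "$CSFT_HOME/bin/indexer -c $CSFT_CONF --all --rotate" ;;
        delta) echo "$CSFT_HOME/bin/indexer -c $CSFT_CONF delta --rotate" ;;
        merge) echo "$CSFT_HOME/bin/indexer -c $CSFT_CONF --merge main delta --rotate --merge-dst-range deleted 0 0" ;;
        *)     echo "usage: csft_cmd start|stop|full|delta|merge" >&2; return 1 ;;
    esac
}
```

To actually run a task, pass the printed line to the shell, for example sh -c "$(csft_cmd delta)". Printing first makes it easy to check what will be executed.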
Automation:
crontab -e
*/1 * * * * /usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf delta --rotate
*/5 * * * * /usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf --merge main delta --rotate --merge-dst-range deleted 0 0
30 1 * * * /usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf --all --rotate
The entries above mean: build the delta index every minute, merge the indexes every five minutes, and rebuild the full index every day at 1:30. indexer is a compiled program, so cron calls it directly rather than through /bin/sh.
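With a delta run every minute and a merge every five minutes, two indexer runs may occasionally overlap. One common way to avoid that is a simple lock around each run. The sketch below uses mkdir as the lock, since creating a directory is atomic on a local filesystem. The function name with_lock and the lock path are hypothetical and chosen only for this example.

```shell
#!/bin/sh
# Hypothetical helper: run a command only if no other run holds the lock.
# $1 is the lock directory; the remaining arguments are the command to run.
with_lock() {
    lockdir="$1"; shift
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "another run holds $lockdir, skipping" >&2
        return 75    # arbitrary non-zero status meaning "try again later"
    fi
    "$@"
    rc=$?
    rmdir "$lockdir"
    return "$rc"
}
```

A cron script could then call, for example, with_lock /tmp/csft_indexer.lock /usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf delta --rotate. If a run is killed before it can remove the lock directory, the directory has to be deleted by hand.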
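Installing the sphinx extension
The official coreseek tutorial suggests that PHP should simply include a PHP file to talk to the server. In fact PHP has a standalone sphinx module that can work with coreseek directly (coreseek is sphinx!). It is documented in the official PHP manual and is noticeably more efficient. The PHP module does, however, depend on the libsphinxclient package.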
[Step 1] Install the libsphinxclient dependency
# cd /var/install/coreseek-4.1-beta/csft-4.1/api/libsphinxclient/
# ./configure --prefix=/usr/local/sphinxclient
configure: creating ./config.status
config.status: creating Makefile
config.status: error: cannot find input file: Makefile.in    # configure fails with this error
// Handling the configure error
The build stopped with config.status: error: cannot find input file: src/Makefile.in. After running the following commands, the build went through:
# aclocal
# libtoolize --force
# automake --add-missing
# autoconf
# autoheader
# make clean
// Run configure again and build (keep the same --prefix, because step 2 points at /usr/local/sphinxclient)
# ./configure --prefix=/usr/local/sphinxclient
# make && make install
[Step 2] Install the sphinx PHP extension
http://pecl.php.net/package/sphinx
# wget http://pecl.php.net/get/sphinx-1.3.0.tgz
# tar zxvf sphinx-1.3.0.tgz
# cd sphinx-1.3.0
# phpize
# ./configure --with-php-config=/usr/bin/php-config --with-sphinx=/usr/local/sphinxclient
# make && make install
# cd /etc/php.d/
# cp gd.ini sphinx.ini
# vi sphinx.ini
extension=sphinx.so
# service php-fpm restart
Open phpinfo and check that the sphinx module is now listed.
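The same check can be done from the command line, because php -m prints the loaded modules one per line. The sketch below reads such a list on standard input; the function name has_module is a hypothetical helper introduced here. It assumes that the command-line php reads the same /etc/php.d directory as php-fpm, which may not hold on every system.

```shell
#!/bin/sh
# Hypothetical helper: read a module list (one name per line, as printed by
# "php -m") on stdin and succeed if the named module is present.
# -x matches whole lines only, -i ignores case, -q suppresses output.
has_module() {
    grep -qix "$1"
}
```

For example: php -m | has_module sphinx && echo "sphinx extension loaded".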
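PHP example calling sphinx:
<?php
$s = new SphinxClient;
$s->setServer("127.0.0.1", 9312);
$s->setMatchMode(SPH_MATCH_PHRASE);
$s->setMaxQueryTime(30);
$res = $s->query("宝马", 'main');    # "宝马" is the keyword, 'main' is the index to search
$err = $s->getLastError();
if ($res === false) {
    die("query failed: " . $err);    # stop here if searchd could not be reached or the query failed
}
var_dump(array_keys($res['matches']));
echo "<br>" . "Use the returned IDs to read the corresponding rows from the database." . "<br>";
echo '<pre>';
var_dump($res);
var_dump($err);
echo '</pre>';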
Output:
array(20) {
  [0]=> int(1513)
  [1]=> int(42020)
  [2]=> int(57512)
  [3]=> int(59852)
  [4]=> int(59855)
  [5]=> int(60805)
  [6]=> int(94444)
  [7]=> int(94448)
  [8]=> int(99229)
  [9]=> int(107524)
  [10]=> int(111918)
  [11]=> int(148)
  [12]=> int(178)
  [13]=> int(595)
  [14]=> int(775)
  [15]=> int(860)
  [16]=> int(938)
  [17]=> int(1048)
  [18]=> int(1395)
  [19]=> int(1657)
}
Use the returned IDs to read the corresponding rows from the database.
array(10) {
  ["error"]=> string(0) ""
  ["warning"]=> string(0) ""
  ["status"]=> int(0)
  ["fields"]=> array(17) {
    [0]=> string(3) "cid"
    [1]=> string(8) "link_url"
    [2]=> string(12) "company_name"
    [3]=> string(9) "type_name"
    [4]=> string(10) "trade_name"
    [5]=> string(5) "scale"
    [6]=> string(8) "homepage"
    [7]=> string(7) "address"
    [8]=> string(9) "city_name"
    [9]=> string(8) "postcode"
    [10]=> string(7) "contact"
    [11]=> string(9) "telephone"
    [12]=> string(6) "mobile"
    [13]=> string(3) "fax"
    [14]=> string(5) "email"
    [15]=> string(11) "description"
    [16]=> string(11) "update_time"
  }
  ["attrs"]=> array(3) { ["from_id"]=> string(1) "1" ["link_id"]=> string(1) "1" ["add_time"]=> string(1) "1" }
  ["matches"]=> array(20) {
    [1513]=> array(2) { ["weight"]=> int(2) ["attrs"]=> array(3) { ["from_id"]=> string(1) "2" ["link_id"]=> string(7) "3171471" ["add_time"]=> string(10) "1394853454" } }
    [42020]=> array(2) { ["weight"]=> int(2) ["attrs"]=> array(3) { ["from_id"]=> string(1) "2" ["link_id"]=> string(7) "2248093" ["add_time"]=> string(10) "1394913884" } }
    [57512]=> array(2) { ["weight"]=> int(2) ["attrs"]=> array(3) { ["from_id"]=> string(1) "2" ["link_id"]=> string(7) "2684470" ["add_time"]=> string(10) "1394970833" } }
    [59852]=> array(2) { ["weight"]=> int(2) ["attrs"]=> array(3) { ["from_id"]=> string(1) "3" ["link_id"]=> string(1) "0" ["add_time"]=> string(10) "1394977527" } }
    [59855]=> array(2) { ["weight"]=> int(2) ["attrs"]=> array(3) { ["from_id"]=> string(1) "3" ["link_id"]=> string(1) "0" ["add_time"]=> string(10) "1394977535" } }
    [60805]=> array(2) { ["weight"]=> int(2) ["attrs"]=> array(3) { ["from_id"]=> string(1) "3" ["link_id"]=> string(1) "0" ["add_time"]=> string(10) "1394980072" } }
    [94444]=> array(2) { ["weight"]=> int(2) ["attrs"]=> array(3) { ["from_id"]=> string(1) "3" ["link_id"]=> string(1) "0" ["add_time"]=> string(10) "1395084115" } }
    [94448]=> array(2) { ["weight"]=> int(2) ["attrs"]=> array(3) { ["from_id"]=> string(1) "3" ["link_id"]=> string(1) "0" ["add_time"]=> string(10) "1395084124" } }
    [99229]=> array(2) { ["weight"]=> int(2) ["attrs"]=> array(3) { ["from_id"]=> string(1) "2" ["link_id"]=> string(7) "1297992" ["add_time"]=> string(10) "1395100520" } }
    [107524]=> array(2) { ["weight"]=> int(2) ["attrs"]=> array(3) { ["from_id"]=> string(1) "5" ["link_id"]=> string(10) "4294967295" ["add_time"]=> string(10) "1395122053" } }
    [111918]=> array(2) { ["weight"]=> int(2) ["attrs"]=> array(3) { ["from_id"]=> string(1) "5" ["link_id"]=> string(10) "4294967295" ["add_time"]=> string(10) "1395127953" } }
    [148]=> array(2) { ["weight"]=> int(1) ["attrs"]=> array(3) { ["from_id"]=> string(1) "2" ["link_id"]=> string(7) "2770294" ["add_time"]=> string(10) "1394852562" } }
    [178]=> array(2) { ["weight"]=> int(1) ["attrs"]=> array(3) { ["from_id"]=> string(1) "2" ["link_id"]=> string(7) "2474558" ["add_time"]=> string(10) "1394852579" } }
    [595]=> array(2) { ["weight"]=> int(1) ["attrs"]=> array(3) { ["from_id"]=> string(1) "2" ["link_id"]=> string(6) "534804" ["add_time"]=> string(10) "1394852862" } }
    [775]=> array(2) { ["weight"]=> int(1) ["attrs"]=> array(3) { ["from_id"]=> string(1) "2" ["link_id"]=> string(7) "3230353" ["add_time"]=> string(10) "1394852980" } }
    [860]=> array(2) { ["weight"]=> int(1) ["attrs"]=> array(3) { ["from_id"]=> string(1) "2" ["link_id"]=> string(7) "2549233" ["add_time"]=> string(10) "1394853048" } }
    [938]=> array(2) { ["weight"]=> int(1) ["attrs"]=> array(3) { ["from_id"]=> string(1) "2" ["link_id"]=> string(7) "3191382" ["add_time"]=> string(10) "1394853114" } }
    [1048]=> array(2) { ["weight"]=> int(1) ["attrs"]=> array(3) { ["from_id"]=> string(1) "2" ["link_id"]=> string(7) "3234645" ["add_time"]=> string(10) "1394853174" } }
    [1395]=> array(2) { ["weight"]=> int(1) ["attrs"]=> array(3) { ["from_id"]=> string(1) "2" ["link_id"]=> string(7) "2661219" ["add_time"]=> string(10) "1394853375" } }
    [1657]=> array(2) { ["weight"]=> int(1) ["attrs"]=> array(3) { ["from_id"]=> string(1) "2" ["link_id"]=> string(7) "2670624" ["add_time"]=> string(10) "1394853540" } }
  }
  ["total"]=> int(543)
  ["total_found"]=> int(543)
  ["time"]=> float(0.109)
  ["words"]=> array(1) { ["宝马"]=> array(2) { ["docs"]=> int(543) ["hits"]=> int(741) } }
}
string(0) ""
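The result holds only document IDs, weights and attributes, not the text of the rows, so the last step is to fetch the rows from MySQL by ID. The sketch below builds such a query from a list of IDs. The function name ids_to_sql is hypothetical, and select * is used only because the source configuration does the same. ORDER BY FIELD(...) is a MySQL function that keeps the rows in the order sphinx returned them.

```shell
#!/bin/sh
# Hypothetical helper: build a MySQL query that fetches the rows for the
# given document ids and keeps them in the order they were passed in.
# $1 is the table name; the remaining arguments are numeric ids.
ids_to_sql() {
    table="$1"; shift
    [ "$#" -gt 0 ] || { echo "no ids given" >&2; return 1; }
    list=""
    for id in "$@"; do
        case "$id" in
            # reject anything that is not a plain non-negative integer
            ''|*[!0-9]*) echo "not a numeric id: $id" >&2; return 1 ;;
        esac
        list="${list:+$list,}$id"
    done
    echo "SELECT * FROM $table WHERE id IN ($list) ORDER BY FIELD(id,$list);"
}
```

For example, ids_to_sql hr_spider_company 1513 42020 57512 | mysql -u root -p coreseek_test would print the first three matching rows from the test table used in this article.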