SphinxSearch是一套非常棒的搜尋引擎,以下簡稱為sphinx,可支援 mysql 及中文語系,目前最新版已到3.7.1(2024.4.14)
在前陣子曾寫過一篇 [Rocky9] 安裝SPHINX Search 3 支援中文 @新精讚 ,這個是v3.6.1且直接裝在os中,有需要可以參考這篇。
原本預計更換伺服器時,想說裝原生的服務(service)並且都已經建置完成。但後來又覺得如果只有搜尋引擎放在docker外,其它放在docker內的話,感覺不是那麼「模組」,同時以後更新的話就很麻煩,所以只好又花了八個小時的研究才完成這篇文章。
使用docker安裝上簡單,但是設定、索引、php的程序比起服務有較多的差別,花了很多時間才攻克。
OS
Rocky Linux release 9.4 (Blue Onyx)
一、Docker 設置
Docker Image
docker image 使用 macbre/sphinxsearch [1]
Docker compose
sphinx:
image: macbre/sphinxsearch:3.7.1
container_name: sphinx
restart: always
volumes:
- ./sphinx/data:/opt/sphinx/index # directory where sphinx will store index data
- ./sphinx/sphinx.conf:/opt/sphinx/conf/sphinx.conf # SphinxSE configuration file
# mem_limit: 512m # match indexer.value from sphinx.conf
# command:
# indexer --all --config /opt/sphinx/conf/sphinx.conf
ports:
- 9306:9306
networks:
- mynet
networks:
mynet:
driver: bridge
目錄結構
└─ sphinx
├── data <== 目前是空的,你可以交給docker自行建立
└── sphinx.conf
其中 sphinx.conf 需先建立完畢,寫在下方。
command 這行很關鍵,得在第一次啟動時使用,然後註解,否則會出現類似以下的錯誤 [2]:
WARNING: index 'test1': prealloc: failed to open /opt/sphinx/index/test1.sph: No such file or directory; NOT SERVING error.
這並不是權限上的錯誤,而是沒有進行初始化先建立索引。事實上,我發現只有採用 plain indexes的話才需要設個步驟,如果你要採用 rt indexs的話,可以直接啟動。
ports 指定9306為了方便查錯,建立完畢後可以直接mysql來查詢。
二、設定索引檔
修改 sphinx.conf 並加入中文的支援,以下範例是將資料表的每個欄位都做索引,僅供參考
source base
{
type = mysql
sql_host = 192.168.1.51 <== 設定你的ip
sql_user = note
sql_pass = yourpassword
sql_db = note
sql_port = 3306
sql_query_pre = SET NAMES utf8
}
source content:base
{
sql_query = \
SELECT nID, nID as theid, sn, title, keyword, remark, c.text, 'content' as module, c.is_closed, c.o_uid, c.catID, a.text as catname, u.site_name, u.site_prefix \
FROM content c left join catalog a using( catID) left join user_basic u on c.o_uid= u.uid
}
common
{
datadir = /opt/sphinx/index/
}
index content
{
field_string = theid
field_string = title
field_string = keyword
field_string = remark
field_string = text
attr_uint = o_uid
attr_uint = catID
attr_uint = is_closed
attr_string = sn
attr_string = module
attr_string = site_name
attr_string = catname
attr_string = site_prefix
type = plain
source = content
# path = /opt/sphinx/index/content <== 這行請移除或註解
html_strip = 1
min_word_len = 2
charset_table = 0..9, A..Z->a..z, a..z, U+0149, U+017F, U+0138, U+00DF, U+00C0..U+00D6->U+00E0..U+00F6, U+00E0..U+00F6, U+00D8..U+00DE->U+00F8..U+00FE, U+00F8..U+00FE, U+0100->U+0101, U+0101, U+0102->U+0103, U+0103, U+0104->U+0105, U+0105, U+0106->U+0107, U+0107, U+0108->U+0109, U+0109, U+010A->U+010B, U+010B, U+010C->U+010D, U+010D, U+010E->U+010F, U+010F, U+0110->U+0111, U+0111, U+0112->U+0113, U+0113, U+0114->U+0115, U+0115, U+0116->U+0117, U+0117, U+0118->U+0119, U+0119, U+011A->U+011B, U+011B, U+011C->U+011D, U+011D, U+011E->U+011F, U+011F, U+0130->U+0131, U+0131, U+0132->U+0133, U+0133, U+0134->U+0135, U+0135, U+0136->U+0137, U+0137, U+0139->U+013A, U+013A, U+013B->U+013C, U+013C, U+013D->U+013E, U+013E, U+013F->U+0140, U+0140, U+0141->U+0142, U+0142, U+0143->U+0144, U+0144, U+0145->U+0146, U+0146, U+0147->U+0148, U+0148, U+014A->U+014B, U+014B, U+014C->U+014D, U+014D, U+014E->U+014F, U+014F, U+0150->U+0151, U+0151, U+0152->U+0153, U+0153, U+0154->U+0155, U+0155, U+0156->U+0157, U+0157, U+0158->U+0159, U+0159, U+015A->U+015B, U+015B, U+015C->U+015D, U+015D, U+015E->U+015F, U+015F, U+0160->U+0161, U+0161, U+0162->U+0163, U+0163, U+0164->U+0165, U+0165, U+0166->U+0167, U+0167, U+0168->U+0169, U+0169, U+016A->U+016B, U+016B, U+016C->U+016D, U+016D, U+016E->U+016F, U+016F, U+0170->U+0171, U+0171, U+0172->U+0173, U+0173, U+0174->U+0175, U+0175, U+0176->U+0177, U+0177, U+0178->U+00FF, U+00FF, U+0179->U+017A, U+017A, U+017B->U+017C, U+017C, U+017D->U+017E, U+017E, U+0410..U+042F->U+0430..U+044F, U+0430..U+044F, U+05D0..U+05EA, U+0531..U+0556->U+0561..U+0586, U+0561..U+0587, U+0621..U+063A, U+01B9, U+01BF, U+0640..U+064A, U+0660..U+0669, U+066E, U+066F, U+0671..U+06D3, U+06F0..U+06FF, U+0904..U+0939, U+0958..U+095F, U+0960..U+0963, U+0966..U+096F, U+097B..U+097F, U+0985..U+09B9, U+09CE, U+09DC..U+09E3, U+09E6..U+09EF, U+0A05..U+0A39, U+0A59..U+0A5E, U+0A66..U+0A6F, U+0A85..U+0AB9, U+0AE0..U+0AE3, U+0AE6..U+0AEF, U+0B05..U+0B39, U+0B5C..U+0B61, U+0B66..U+0B6F, U+0B71, U+0B85..U+0BB9, U+0BE6..U+0BF2, U+0C05..U+0C39, U+0C66..U+0C6F, U+0C85..U+0CB9, U+0CDE..U+0CE3, U+0CE6..U+0CEF, U+0D05..U+0D39, U+0D60, U+0D61, U+0D66..U+0D6F, U+0D85..U+0DC6, U+1900..U+1938, U+1946..U+194F, U+A800..U+A805, U+A807..U+A822, U+0386->U+03B1, U+03AC->U+03B1, U+0388->U+03B5, U+03AD->U+03B5, U+0389->U+03B7, U+03AE->U+03B7, U+038A->U+03B9, U+0390->U+03B9, U+03AA->U+03B9, U+03AF->U+03B9, U+03CA->U+03B9, U+038C->U+03BF, U+03CC->U+03BF, U+038E->U+03C5, U+03AB->U+03C5, U+03B0->U+03C5, U+03CB->U+03C5, U+03CD->U+03C5, U+038F->U+03C9, U+03CE->U+03C9, U+03C2->U+03C3, U+0391..U+03A1->U+03B1..U+03C1, U+03A3..U+03A9->U+03C3..U+03C9, U+03B1..U+03C1, U+03C3..U+03C9, U+0E01..U+0E2E, U+0E30..U+0E3A, U+0E40..U+0E45, U+0E47, U+0E50..U+0E59
ngram_len = 1
ngram_chars = U+4E00..U+9FBF, U+3400..U+4DBF, U+20000..U+2A6DF, U+F900..U+FAFF, U+2F800..U+2FA1F, U+2E80..U+2EFF, U+2F00..U+2FDF, U+3100..U+312F, U+31A0..U+31BF, U+3040..U+309F, U+30A0..U+30FF, U+31F0..U+31FF, U+AC00..U+D7AF, U+1100..U+11FF, U+3130..U+318F, U+A000..U+A48F, U+A490..U+A4CF
}
searchd
{
listen = 9312
listen = 9306:mysql41
log = /opt/sphinx/index/searchd.log
query_log = /opt/sphinx/index/query.log
read_timeout = 5
max_children = 30
pid_file = /opt/sphinx/index/searchd.pid
seamless_rotate = 1
preopen_indexes = 1
unlink_old = 1
workers = threads # for RT to work
binlog_path = /opt/sphinx/index
}
上面路徑要設定正確,資料庫請先確定可以連結。
common 這個區塊指定索引建立的目錄即可
三、初次啟動
初次啟動時 sphinx.conf 必需設置正確,否則docker無法啟動。
請將上面 docker-compose.yml 的中 command 欄位註解取消,像這樣:
indexer --all --config /opt/sphinx/conf/sphinx.conf
ports:
- 9306:9306
啟動 docker container
# docker compose up -d sphinx
如果成功會像這樣:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
053e2c358921 macbre/sphinxsearch:3.7.1 "/bin/sh -c 'sh /opt…" 5 seconds ago Up 3 seconds 0.0.0.0:9306->9306/tcp, :::9306->9306/tcp, 36307/tcp
然後請勿留戀請立刻移除container
# docker compose down sphinx
成功啟動後再把 command 再註解起來,才不會一直建索引,否則服務會一直重啟導致失敗。
目錄 data 中會出現很多檔案,這樣就是正確的:
├── data
│ ├── binlogs
│ │ ├── content.binlog.0000
│ │ ├── content.binlog.lock
│ │ ├── content.binlog.meta
│ │ ├── media.binlog.0000
│ │ ├── media.binlog.lock
│ │ └── media.binlog.meta
│ ├── extra
│ ├── indexes
│ │ └─ content
│ │ ├── content.spa
│ │ ├── content.spb
│ │ ├── content.spc
│ │ ├── content.spd
│ │ ├── content.spe
│ │ ├── content.sph
│ │ ├── content.spi
│ │ ├── content.spj
│ │ ├── content.spk
│ │ ├── content.spl
│ │ └── content.spp
│ ├── logs
│ │ ├── query.log
│ │ └── searchd.log
│ ├── plugins
│ └── searchd.pid
└── sphinx.conf
以上綠色的部分是自動產生的,一切ok。
查看log發現
Sphinx 3.7.1 (commit da9f8a4e7)
Copyright (c) 2001-2024, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
[Sun Jun 9 06:56:09.100 2024] [1:1] using config file '/opt/sphinx/conf/sphinx.conf'...
[Sun Jun 9 06:56:09.101 2024] [1:1] WARNING: datadir: invalid 'log' value; must be empty or 'no'; log stays enabled
[Sun Jun 9 06:56:09.101 2024] [1:1] WARNING: datadir: invalid 'query_log' value; must be empty or 'no'; log stays enabled
[Sun Jun 9 06:56:09.101 2024] [1:1] WARNING: datadir: 'binlog_path' value ignored in datadir mode
[Sun Jun 9 06:56:09.101 2024] [1:1] WARNING: datadir: 'pid_file' value ignored in datadir mode
listening on all interfaces, port=9312
listening on all interfaces, port=9306
loading 2 indexes...
accepting connections
再修改 sphinx.conf 把上面的請個值註解(或刪除):
{
listen = 9312
listen = 9306:mysql41
# query_log = /opt/sphinx/index/query.log
read_timeout = 5
max_children = 30
seamless_rotate = 1
preopen_indexes = 1
unlink_old = 1
workers = threads # for RT to work
}
再重新啟動就ok了
查看版本
Sphinx 3.7.1 (commit da9f8a4e7)
Copyright (c) 2001-2024, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Built on: #1-Alpine SMP Tue, 19 Jul 2022 15:30:18 +0000
Built with: GNU 10.3.1
Build date: Mar 28 2024
Build type: release
Configure flags: cmake
Compiled DB drivers: mysql-dynamic pgsql-dynamic odbc-dynamic
Compiled features: libexpat libstemmer re2 jemalloc
Versions: binlog_format v.11, index_format v.63, udf_api v.23
四、建立索引
手動建立索引
# docker exec -t sphinx indexer --rotate --all --config /opt/sphinx/conf/sphinx.conf
Sphinx 3.7.1 (commit da9f8a4e7)
Copyright (c) 2001-2024, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
using config file '/opt/sphinx/conf/sphinx.conf'...
indexing index 'content'...
collected 3350 docs, 13.6 MB
sorted 3.7 Mhits, 100.0% done
total 3350 docs, 13.59 Mb
total 2.3 sec, 5.947 Mb/sec, 1465 docs/sec
rotating indices: successfully sent SIGHUP to searchd (pid=1).
自動執行建立排程
10 2,11,16,21 * * * /usr/bin/docker exec -t sphinx indexer --rotate --all --config /opt/sphinx/conf/sphinx.conf > /tmp/sphinxsearch-indexer.log 2>&1
五、手動測試查詢
使用Mysql去查詢sphinx的搜尋結果,例如在test1索引中找具有'test'的字串
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 3.7.1 (commit da9f8a4e7)
Copyright (c) 2000, 2024, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
mysql> show tables;
+---------+-------+
| Index | Type |
+---------+-------+
| content | local |
+---------+-------+
1 rows in set (0.00 sec)
+-------------+--------------+------------+------+
| Field | Type | Properties | Key |
+-------------+--------------+------------+------+
| id | bigint | | |
| theid | field_string | indexed | |
| title | field_string | indexed | |
| keyword | field_string | indexed | |
| remark | field_string | indexed | |
| text | field_string | indexed | |
| o_uid | uint | | |
| catid | uint | | |
| is_closed | uint | | |
| sn | string | | |
| module | string | | |
| site_name | string | | |
| catname | string | | |
| site_prefix | string | | |
+-------------+--------------+------------+------+
14 rows in set (0.00 sec)
用關鍵字查詢
+--------------+-------+-----------------------------------------------+
| sn | id | title |
+--------------+-------+-----------------------------------------------+
| 240119030359 | 16331 | [Rocky9] 安裝SPHINX Search 3 支援中文 |
| 161023012236 | 10058 | 安裝SPHINX支援中文 |
| 161023015030 | 10059 | PHP for sphinx 函式庫安裝 |
| 170512134862 | 11054 | [Centos7] 安裝sphinx+php7 |
| 240609132314 | 16422 | [Rocky9] Sphinx 3支援中文 + Docker + PHP7 |
| 170718172131 | 11465 | [Mysql/Mariadb] 全文檢索 fulltext index |
| 240118021319 | 16325 | 下一代精讚開發規畫 |
+--------------+-------+-----------------------------------------------+
7 rows in set (0.00 sec)
php 中可以使用 [3] 提供的 api。
簡單結論一下,在docker上的設置似乎更適合未來的需求。
參考資料
[1]https://hub.docker.com/r/macbre/sphinxsearch
[2] https://lukaszherok.com/post/view/9/Running%20SphinxSearch%20in%20Podman%20container
[3] https://github.com/sphinxsearch/sphinx/tree/master/api
[4] 官方說明文件 https://sphinxsearch.com/docs/sphinx3.html#indexes
sphinx.conf (3版) 可用的欄位 http://sphinxsearch.com/docs/current/confgroup-source.html
[5] sphinxQL 說明文件 http://sphinxsearch.com/docs/current.html#sphinxql-reference
[6] 每個資料型態的使用屬性 http://sphinxsearch.com/wiki/doku.php?id=fields_and_attributes