[Rocky9] Sphinx 3支援中文 + Docker + PHP7

URL Link //n.sfs.tw/16422

2024-06-09 13:23:10 By 張○○

SphinxSearch是一套非常棒的搜尋引擎,以下簡稱為sphinx,可支援 mysql 及中文語系,目前最新版已到3.7.1(2024.4.14)

在前陣子曾寫過一篇 [Rocky9] 安裝SPHINX Search 3 支援中文 @新精讚  ,這個是v3.6.1且直接裝在os中,有需要可以參考這篇。

原本預計更換伺服器時,想說裝原生的服務(service)並且都已經建置完成。但後來又覺得如果只有搜尋引擎放在docker外,其它放在docker內的話,感覺不是那麼「模組」,同時以後更新的話就很麻煩,所以只好又花了八個小時的研究才完成這篇文章。

使用docker安裝上簡單,但是設定、索引、php的程序比起服務有較多的差別,花了很多時間才攻克。

OS

Rocky Linux release 9.4 (Blue Onyx)
 

一、Docker 設置

Docker Image

docker image 使用 macbre/sphinxsearch [1]

 

Docker compose

services:
  sphinx:
    image: macbre/sphinxsearch:3.7.1
    container_name: sphinx
    restart: always
    volumes:
      - ./sphinx/data:/opt/sphinx/index  # directory where sphinx will store index data
      - ./sphinx/sphinx.conf:/opt/sphinx/conf/sphinx.conf  # SphinxSE configuration file
#    mem_limit: 512m # match indexer.value from sphinx.conf
#    command:
#       indexer --all --config /opt/sphinx/conf/sphinx.conf

    ports:
      - 9306:9306
    networks:
      - mynet

networks:
  mynet:
    driver: bridge

目錄結構

docker-compose.yml
└─ sphinx
      ├── data  <== 目前是空的,你可以交給docker自行建立
      └── sphinx.conf

其中 sphinx.conf 需先建立完畢,寫在下方。

command 這行很關鍵,得在第一次啟動時使用,然後註解,否則會出現類似以下的錯誤 [2]

WARNING: index 'test1': prealloc: failed to open /opt/sphinx/index/test1.sph: No such file or directory; NOT SERVING error.

這並不是權限上的錯誤,而是沒有進行初始化先建立索引。事實上,我發現只有採用 plain indexes的話才需要設個步驟,如果你要採用 rt indexs的話,可以直接啟動。

ports 指定9306為了方便查錯,建立完畢後可以直接mysql來查詢。

 

二、設定索引檔

 

修改 sphinx.conf 並加入中文的支援,以下範例是將資料表的每個欄位都做索引,僅供參考

 

source base
{
        type                    = mysql
        sql_host                = 192.168.1.51  <== 設定你的ip
        sql_user                = note
        sql_pass                = yourpassword 
        sql_db                  = note
        sql_port                = 3306
        sql_query_pre           = SET NAMES utf8
}
source content:base
{
    sql_query               = \
      SELECT nID, nID as theid, sn, title, keyword, remark, c.text, 'content' as module, c.is_closed, c.o_uid, c.catID, a.text as catname, u.site_name, u.site_prefix \
      FROM content c left join catalog a using( catID) left join user_basic u on  c.o_uid= u.uid

}

common
{
    datadir = /opt/sphinx/index/
}

index content
{
    field_string = theid
    field_string = title
    field_string = keyword
    field_string = remark
    field_string = text
    attr_uint =  o_uid
    attr_uint = catID
    attr_uint = is_closed
    attr_string = sn
    attr_string = module
    attr_string = site_name
    attr_string = catname
    attr_string = site_prefix
    type         = plain
    source       = content
#    path         = /opt/sphinx/index/content  <== 這行請移除或註解
    html_strip   = 1
    min_word_len = 2
    charset_table = 0..9, A..Z->a..z, a..z, U+0149, U+017F, U+0138, U+00DF, U+00C0..U+00D6->U+00E0..U+00F6, U+00E0..U+00F6, U+00D8..U+00DE->U+00F8..U+00FE, U+00F8..U+00FE, U+0100->U+0101, U+0101, U+0102->U+0103, U+0103, U+0104->U+0105, U+0105, U+0106->U+0107, U+0107, U+0108->U+0109, U+0109, U+010A->U+010B, U+010B, U+010C->U+010D, U+010D, U+010E->U+010F, U+010F, U+0110->U+0111, U+0111, U+0112->U+0113, U+0113, U+0114->U+0115, U+0115, U+0116->U+0117, U+0117, U+0118->U+0119, U+0119, U+011A->U+011B, U+011B, U+011C->U+011D, U+011D, U+011E->U+011F, U+011F, U+0130->U+0131, U+0131, U+0132->U+0133, U+0133, U+0134->U+0135, U+0135, U+0136->U+0137, U+0137, U+0139->U+013A, U+013A, U+013B->U+013C, U+013C, U+013D->U+013E, U+013E, U+013F->U+0140, U+0140, U+0141->U+0142, U+0142, U+0143->U+0144, U+0144, U+0145->U+0146, U+0146, U+0147->U+0148, U+0148, U+014A->U+014B, U+014B, U+014C->U+014D, U+014D, U+014E->U+014F, U+014F, U+0150->U+0151, U+0151, U+0152->U+0153, U+0153, U+0154->U+0155, U+0155, U+0156->U+0157, U+0157, U+0158->U+0159, U+0159, U+015A->U+015B, U+015B, U+015C->U+015D, U+015D, U+015E->U+015F, U+015F, U+0160->U+0161, U+0161, U+0162->U+0163, U+0163, U+0164->U+0165, U+0165, U+0166->U+0167, U+0167, U+0168->U+0169, U+0169, U+016A->U+016B, U+016B, U+016C->U+016D, U+016D, U+016E->U+016F, U+016F, U+0170->U+0171, U+0171, U+0172->U+0173, U+0173, U+0174->U+0175, U+0175, U+0176->U+0177, U+0177, U+0178->U+00FF, U+00FF, U+0179->U+017A, U+017A, U+017B->U+017C, U+017C, U+017D->U+017E, U+017E, U+0410..U+042F->U+0430..U+044F, U+0430..U+044F, U+05D0..U+05EA, U+0531..U+0556->U+0561..U+0586, U+0561..U+0587, U+0621..U+063A, U+01B9, U+01BF, U+0640..U+064A, U+0660..U+0669, U+066E, U+066F, U+0671..U+06D3, U+06F0..U+06FF, U+0904..U+0939, U+0958..U+095F, U+0960..U+0963, U+0966..U+096F, U+097B..U+097F, U+0985..U+09B9, U+09CE, U+09DC..U+09E3, U+09E6..U+09EF, U+0A05..U+0A39, U+0A59..U+0A5E, U+0A66..U+0A6F, U+0A85..U+0AB9, U+0AE0..U+0AE3, U+0AE6..U+0AEF, U+0B05..U+0B39, U+0B5C..U+0B61, U+0B66..U+0B6F, U+0B71, U+0B85..U+0BB9, U+0BE6..U+0BF2, U+0C05..U+0C39, U+0C66..U+0C6F, U+0C85..U+0CB9, U+0CDE..U+0CE3, U+0CE6..U+0CEF, U+0D05..U+0D39, U+0D60, U+0D61, U+0D66..U+0D6F, U+0D85..U+0DC6, U+1900..U+1938, U+1946..U+194F, U+A800..U+A805, U+A807..U+A822, U+0386->U+03B1, U+03AC->U+03B1, U+0388->U+03B5, U+03AD->U+03B5, U+0389->U+03B7, U+03AE->U+03B7, U+038A->U+03B9, U+0390->U+03B9, U+03AA->U+03B9, U+03AF->U+03B9, U+03CA->U+03B9, U+038C->U+03BF, U+03CC->U+03BF, U+038E->U+03C5, U+03AB->U+03C5, U+03B0->U+03C5, U+03CB->U+03C5, U+03CD->U+03C5, U+038F->U+03C9, U+03CE->U+03C9, U+03C2->U+03C3, U+0391..U+03A1->U+03B1..U+03C1, U+03A3..U+03A9->U+03C3..U+03C9, U+03B1..U+03C1, U+03C3..U+03C9, U+0E01..U+0E2E, U+0E30..U+0E3A, U+0E40..U+0E45, U+0E47, U+0E50..U+0E59
    ngram_len = 1
    ngram_chars = U+4E00..U+9FBF, U+3400..U+4DBF, U+20000..U+2A6DF, U+F900..U+FAFF, U+2F800..U+2FA1F, U+2E80..U+2EFF, U+2F00..U+2FDF, U+3100..U+312F, U+31A0..U+31BF, U+3040..U+309F, U+30A0..U+30FF, U+31F0..U+31FF, U+AC00..U+D7AF, U+1100..U+11FF, U+3130..U+318F, U+A000..U+A48F, U+A490..U+A4CF
}

searchd
{
    listen          = 9312
    listen          = 9306:mysql41
    log             = /opt/sphinx/index/searchd.log
    query_log       = /opt/sphinx/index/query.log
    read_timeout    = 5
    max_children    = 30
    pid_file        = /opt/sphinx/index/searchd.pid
    seamless_rotate = 1
    preopen_indexes = 1
    unlink_old      = 1
    workers         = threads # for RT to work
    binlog_path     = /opt/sphinx/index
}

上面路徑要設定正確,資料庫請先確定可以連結。

common 這個區塊指定索引建立的目錄即可

 

三、初次啟動

初次啟動時 sphinx.conf 必需設置正確,否則docker無法啟動。

請將上面 docker-compose.yml 的中 command 欄位註解取消,像這樣:

    command:
       indexer --all --config /opt/sphinx/conf/sphinx.conf

    ports:
      - 9306:9306

啟動 docker container

# docker compose up -d sphinx

如果成功會像這樣:

# docker ps -a
CONTAINER ID   IMAGE                       COMMAND                  CREATED          STATUS          PORTS                                                                      NAMES
053e2c358921   macbre/sphinxsearch:3.7.1   "/bin/sh -c 'sh /opt…"   5 seconds ago       Up 3 seconds       0.0.0.0:9306->9306/tcp, :::9306->9306/tcp, 36307/tcp

然後請勿留戀請立刻移除container

# docker compose down sphinx

成功啟動後再把 command 再註解起來,才不會一直建索引,否則服務會一直重啟導致失敗。

目錄 data 中會出現很多檔案,這樣就是正確的:

sphinx
├── data
│   ├── binlogs
│   │   ├── content.binlog.0000
│   │   ├── content.binlog.lock
│   │   ├── content.binlog.meta
│   │   ├── media.binlog.0000
│   │   ├── media.binlog.lock
│   │   └── media.binlog.meta
│   ├── extra
│   ├── indexes
│   │   └─ content
│   │      ├── content.spa
│   │      ├── content.spb
│   │      ├── content.spc
│   │      ├── content.spd
│   │      ├── content.spe
│   │      ├── content.sph
│   │      ├── content.spi
│   │      ├── content.spj
│   │      ├── content.spk
│   │      ├── content.spl
│   │      └── content.spp
│   ├── logs
│   │   ├── query.log
│   │   └── searchd.log
│   ├── plugins
│   └── searchd.pid

└── sphinx.conf

以上綠色的部分是自動產生的,一切ok。

查看log發現

# docker logs sphinx
Sphinx 3.7.1 (commit da9f8a4e7)
Copyright (c) 2001-2024, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
[Sun Jun  9 06:56:09.100 2024] [1:1] using config file '/opt/sphinx/conf/sphinx.conf'...
[Sun Jun  9 06:56:09.101 2024] [1:1] WARNING: datadir: invalid 'log' value; must be empty or 'no'; log stays enabled
[Sun Jun  9 06:56:09.101 2024] [1:1] WARNING: datadir: invalid 'query_log' value; must be empty or 'no'; log stays enabled
[Sun Jun  9 06:56:09.101 2024] [1:1] WARNING: datadir: 'binlog_path' value ignored in datadir mode
[Sun Jun  9 06:56:09.101 2024] [1:1] WARNING: datadir: 'pid_file' value ignored in datadir mode

listening on all interfaces, port=9312
listening on all interfaces, port=9306
loading 2 indexes...
accepting connections

再修改 sphinx.conf  把上面的請個值註解(或刪除):

searchd
{
    listen          = 9312
    listen          = 9306:mysql41
#   log             = /opt/sphinx/index/searchd.log
#   query_log       = /opt/sphinx/index/query.log

    read_timeout    = 5
    max_children    = 30
#   pid_file        = /opt/sphinx/index/searchd.pid
    seamless_rotate = 1
    preopen_indexes = 1
    unlink_old      = 1
    workers         = threads # for RT to work
#   binlog_path     = /opt/sphinx/index
}

再重新啟動就ok了

查看版本

# docker exec -t sphinx searchd --version
Sphinx 3.7.1 (commit da9f8a4e7)
Copyright (c) 2001-2024, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)

Built on: #1-Alpine SMP Tue, 19 Jul 2022 15:30:18 +0000
Built with: GNU 10.3.1
Build date: Mar 28 2024
Build type: release
Configure flags: cmake
Compiled DB drivers: mysql-dynamic pgsql-dynamic odbc-dynamic
Compiled features: libexpat libstemmer re2 jemalloc
Versions: binlog_format v.11, index_format v.63, udf_api v.23

 

四、建立索引

手動建立索引

#  docker exec -t sphinx indexer --rotate --all --config /opt/sphinx/conf/sphinx.conf

Sphinx 3.7.1 (commit da9f8a4e7)
Copyright (c) 2001-2024, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)

using config file '/opt/sphinx/conf/sphinx.conf'...
indexing index 'content'...
collected 3350 docs, 13.6 MB
sorted 3.7 Mhits, 100.0% done
total 3350 docs, 13.59 Mb
total 2.3 sec, 5.947 Mb/sec, 1465 docs/sec

rotating indices: successfully sent SIGHUP to searchd (pid=1).

自動執行建立排程

# 每天四次做索引
10 2,11,16,21 * * * /usr/bin/docker exec -t sphinx indexer --rotate --all --config /opt/sphinx/conf/sphinx.conf > /tmp/sphinxsearch-indexer.log 2>&1

 

 

五、手動測試查詢

使用Mysql去查詢sphinx的搜尋結果,例如在test1索引中找具有'test'的字串

# mysql -h 127.0.0.1 -P9306
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 3.7.1 (commit da9f8a4e7)
Copyright (c) 2000, 2024, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
mysql> show tables;
+---------+-------+
| Index   | Type  |
+---------+-------+
| content | local |
+---------+-------+
1 rows in set (0.00 sec)
mysql> desc content;
+-------------+--------------+------------+------+
| Field       | Type         | Properties | Key  |
+-------------+--------------+------------+------+
| id          | bigint       |            |      |
| theid       | field_string | indexed    |      |
| title       | field_string | indexed    |      |
| keyword     | field_string | indexed    |      |
| remark      | field_string | indexed    |      |
| text        | field_string | indexed    |      |
| o_uid       | uint         |            |      |
| catid       | uint         |            |      |
| is_closed   | uint         |            |      |
| sn          | string       |            |      |
| module      | string       |            |      |
| site_name   | string       |            |      |
| catname     | string       |            |      |
| site_prefix | string       |            |      |
+-------------+--------------+------------+------+
14 rows in set (0.00 sec)

用關鍵字查詢

mysql> SELECT sn, id, title FROM content WHERE MATCH('sphinx');
+--------------+-------+-----------------------------------------------+
| sn           | id    | title                                         |
+--------------+-------+-----------------------------------------------+
| 240119030359 | 16331 | [Rocky9] 安裝SPHINX Search 3 支援中文         |
| 161023012236 | 10058 | 安裝SPHINX支援中文                            |
| 161023015030 | 10059 | PHP for sphinx 函式庫安裝                     |
| 170512134862 | 11054 | [Centos7] 安裝sphinx+php7                     |
| 240609132314 | 16422 | [Rocky9] Sphinx 3支援中文 + Docker + PHP7     |
| 170718172131 | 11465 | [Mysql/Mariadb] 全文檢索 fulltext index       |
| 240118021319 | 16325 | 下一代精讚開發規畫                            |
+--------------+-------+-----------------------------------------------+
7 rows in set (0.00 sec)

 

php 中可以使用 [3] 提供的 api。

簡單結論一下,在docker上的設置似乎更適合未來的需求。

 

參考資料

[1]https://hub.docker.com/r/macbre/sphinxsearch

[2] https://lukaszherok.com/post/view/9/Running%20SphinxSearch%20in%20Podman%20container

[3] https://github.com/sphinxsearch/sphinx/tree/master/api

[4] 官方說明文件 https://sphinxsearch.com/docs/sphinx3.html#indexes
     sphinx.conf (3版) 可用的欄位 http://sphinxsearch.com/docs/current/confgroup-source.html

[5] sphinxQL 說明文件 http://sphinxsearch.com/docs/current.html#sphinxql-reference

[6] 每個資料型態的使用屬性 http://sphinxsearch.com/wiki/doku.php?id=fields_and_attributes