Creating a full-text search index with SQLAlchemy on PostgreSQL

Question

I need to create a PostgreSQL full-text search index in Python with SQLAlchemy. This is the SQL I want:

CREATE TABLE person ( id INTEGER PRIMARY KEY, name TEXT );
CREATE INDEX person_idx ON person USING GIN (to_tsvector('simple', name));

How do I do the second part with SQLAlchemy when using the ORM:

class Person(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String)

Answer 1

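You can create the index with Index in __table_args__. I also use a function to build the ts_vector, which keeps things tidy and reusable when more than one field is needed. Something like this: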

from sqlalchemy import Index, cast, func
from sqlalchemy.dialects import postgresql

def create_tsvector(*args):
    exp = args[0]
    for e in args[1:]:
        exp += ' ' + e
    return func.to_tsvector('english', exp)

class Person(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String)

    __ts_vector__ = create_tsvector(
        cast(func.coalesce(name, ''), postgresql.TEXT)
    )

    # __table_args__ must be a tuple, so the trailing comma after Index(...) is required
    __table_args__ = (
        Index(
            'idx_person_fts',
            __ts_vector__,
            postgresql_using='gin'
        ),
    )
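The answer mentions the multi-field case without showing it. A minimal sketch of what the helper produces for two fields, compiled to SQL without a database connection; the columns `first_name` and `last_name` are hypothetical and only stand in for real model attributes:

```python
from sqlalchemy import String, cast, column, func
from sqlalchemy.dialects import postgresql


def create_tsvector(*args):
    # Same helper as above: join the expressions with a single space.
    exp = args[0]
    for e in args[1:]:
        exp += ' ' + e
    return func.to_tsvector('english', exp)


# Hypothetical columns, used only to show the multi-field case.
first_name = column('first_name', String)
last_name = column('last_name', String)

ts_vector = create_tsvector(
    cast(func.coalesce(first_name, ''), postgresql.TEXT),
    cast(func.coalesce(last_name, ''), postgresql.TEXT),
)

# Compile for PostgreSQL to inspect the SQL; nothing is executed.
sql = str(ts_vector.compile(dialect=postgresql.dialect()))
print(sql)
```

The printed expression should be a single to_tsvector(...) call whose second argument concatenates both columns with `||`.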

Update: a sample query that uses the index (corrected per the comments):

people = Person.query.filter(Person.__ts_vector__.match(expressions, postgresql_regconfig='english')).all()
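PostgreSQL only considers an expression index when the query uses the same expression, including the same text search configuration. A sketch that compiles such a filter to SQL so it can be compared with the index definition; `column("name")` is a stand-in for `Person.name`, and the search string is made up:

```python
from sqlalchemy import String, column, func
from sqlalchemy.dialects import postgresql

# Stand-in for Person.name; no table or connection is needed to compile.
name = column("name", String)

# Same shape as the indexed expression: to_tsvector('english', coalesce(name, '')).
ts_vector = func.to_tsvector("english", func.coalesce(name, ""))

# On PostgreSQL, .match() renders the @@ full-text operator.
criterion = ts_vector.match("sqlalchemy", postgresql_regconfig="english")

sql = str(criterion.compile(dialect=postgresql.dialect()))
print(sql)
```

The exact tsquery function on the right-hand side of `@@` depends on the SQLAlchemy version in use.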

Answer 2

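The answer by @sharez is really useful (especially if you need to concatenate columns in your index). For anyone who wants a tsvector GIN index on a single column, the original approach can be simplified to something like: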

from sqlalchemy import Column, Index, Integer, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.sql import func


Base = declarative_base()

class Example(Base):
    __tablename__ = 'examples'

    id = Column(Integer, primary_key=True)
    textsearch = Column(String)

    __table_args__ = (
        Index(
            'ix_examples_tsv',
            func.to_tsvector('english', textsearch),
            postgresql_using='gin'
            ),
        )

Note that the comma following Index(...) in __table_args__ is not a style choice; the value of __table_args__ must be a tuple, a dictionary, or None.
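The Python rule behind this can be seen in isolation. A small sketch, using a plain string in place of the Index(...) object:

```python
# Parentheses alone do not create a tuple; the trailing comma does.
not_a_tuple = ("ix_examples_tsv")   # just the string itself
one_tuple = ("ix_examples_tsv",)    # a one-element tuple

print(type(not_a_tuple).__name__)
print(type(one_tuple).__name__)
```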

If you do need a tsvector GIN index on multiple columns, here is another way to get there using text().

from sqlalchemy import Column, Index, Integer, String, text
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.sql import func


Base = declarative_base()

def to_tsvector_ix(*columns):
    s = " || ' ' || ".join(columns)
    return func.to_tsvector('english', text(s))

class Example(Base):
    __tablename__ = 'examples'

    id = Column(Integer, primary_key=True)
    atext = Column(String)
    btext = Column(String)

    __table_args__ = (
        Index(
            'ix_examples_tsv',
            to_tsvector_ix('atext', 'btext'),
            postgresql_using='gin'
            ),
        )
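One caveat with concatenating columns this way: in PostgreSQL, `||` yields NULL as soon as one operand is NULL, so a row with a NULL in either column ends up with a NULL tsvector. A possible variant wraps each column in coalesce(); the helper name `to_tsvector_ix_safe` is hypothetical and not part of the original answer:

```python
from sqlalchemy import text
from sqlalchemy.dialects import postgresql
from sqlalchemy.sql import func


def to_tsvector_ix_safe(*columns):
    # Wrap every column name in coalesce(col, '') so that a single NULL
    # value does not turn the whole concatenation into NULL.
    parts = ["coalesce({}, '')".format(c) for c in columns]
    s = " || ' ' || ".join(parts)
    return func.to_tsvector('english', text(s))


expr = to_tsvector_ix_safe('atext', 'btext')

# Compile against the PostgreSQL dialect to inspect the generated SQL.
sql = str(expr.compile(dialect=postgresql.dialect()))
print(sql)
```

It would be passed to Index(...) in the same place as `to_tsvector_ix('atext', 'btext')` above.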

Answer 3

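Thanks for this question and the answers.

I'd like to add a bit more in case people are using alembic to manage versions with autogenerate, which does not seem to detect the creation of this index.

We may end up writing our own migration script, which looks something like this.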

"""add fts idx

Revision ID: e3ce1ce23d7a
Revises: 079c4455d54d
Create Date: 

"""

# revision identifiers, used by Alembic.
revision = 'e3ce1ce23d7a'
down_revision = '079c4455d54d'

from alembic import op
import sqlalchemy as sa


def upgrade():
    op.create_index('idx_content_fts', 'table_name',
            [sa.text("to_tsvector('english', content)")],
            postgresql_using='gin')


def downgrade():
    op.drop_index('idx_content_fts')

Answer 4

This has already been answered by @sharez and @benvc. I needed to make it work with weights, though. This is how I did it, based on their answers:

from sqlalchemy import Column, func, Index, Integer, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.sql.operators import op

CONFIG = 'english'

Base = declarative_base()

def create_tsvector(*args):
    # Each argument is a (column, weight) pair; weight is one of 'A' to 'D'.
    field, weight = args[0]
    exp = func.setweight(func.to_tsvector(CONFIG, field), weight)
    for field, weight in args[1:]:
        # Concatenate the weighted tsvectors with the || operator.
        exp = op(exp, '||', func.setweight(func.to_tsvector(CONFIG, field), weight))
    return exp

class Example(Base):
    __tablename__ = 'example'

    # A mapped class needs a primary key.
    id = Column(Integer, primary_key=True)
    foo = Column(String)
    bar = Column(String)

    __ts_vector__ = create_tsvector(
        (foo, 'A'),
        (bar, 'B')
    )

    __table_args__ = (
        Index('my_index', __ts_vector__, postgresql_using='gin'),
    )
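Weights only matter once results are ranked. A sketch of filtering and ranking against the same weighted expression, compiled to SQL without a database; the lightweight columns stand in for `Example.foo` and `Example.bar`, and the search text is made up:

```python
from sqlalchemy import String, column, func
from sqlalchemy.dialects import postgresql

CONFIG = 'english'

# Stand-ins for Example.foo and Example.bar.
foo = column('foo', String)
bar = column('bar', String)

# Same weighted expression that create_tsvector() builds above.
ts_vector = func.setweight(func.to_tsvector(CONFIG, foo), 'A').op('||')(
    func.setweight(func.to_tsvector(CONFIG, bar), 'B')
)

# Hypothetical search text; plainto_tsquery parses it as plain words.
ts_query = func.plainto_tsquery(CONFIG, 'search words')

# Filter with @@ and rank the matches; ts_rank takes the A/B weights into account.
criterion = ts_vector.op('@@')(ts_query)
rank = func.ts_rank(ts_vector, ts_query)

dialect = postgresql.dialect()
criterion_sql = str(criterion.compile(dialect=dialect))
rank_sql = str(rank.compile(dialect=dialect))
print(criterion_sql)
print(rank_sql)
```

In an ORM query, `criterion` would go into the filter and `rank.desc()` into the ordering.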

Answer 5

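The previous answers here were helpful for pointing in the right direction. Below is a distilled and simplified approach using the ORM and the TSVectorType helper from sqlalchemy-utils (which is quite basic and can simply be copied and pasted to avoid an external dependency if needed: https://sqlalchemy-utils.readthedocs.io/en/latest/_modules/sqlalchemy_utils/types/ts_vector.html):

Defining a `TSVECTOR` column (`TSVectorType`) in the ORM model (declarative), populated automatically from the source text field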

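import sqlalchemy as sa
from sqlalchemy.orm import declarative_base  # assumes SQLAlchemy 1.4 or later
from sqlalchemy_utils.types.ts_vector import TSVectorType
# ^-- https://sqlalchemy-utils.readthedocs.io/en/latest/_modules/sqlalchemy_utils/types/ts_vector.html

Base = declarative_base()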


class MyModel(Base):
    __tablename__ = 'mymodel'
    id = sa.Column(sa.Integer, primary_key=True)
    content = sa.Column(sa.String, nullable=False)

    content_tsv = sa.Column(
        TSVectorType("content", regconfig="english"),
        sa.Computed("to_tsvector('english', \"content\")", persisted=True))
    #      ^-- equivalent for SQL:
    #   COLUMN content_tsv TSVECTOR GENERATED ALWAYS AS (to_tsvector('english', "content")) STORED;

    __table_args__ = (
        # Indexing the TSVector column
        sa.Index("idx_mymodel_content_tsv", content_tsv, postgresql_using="gin"), 
    )

For more details on querying with the ORM, see https://stackoverflow.com/a/73999486/11750716 (there are important differences between SQLAlchemy 1.4 and SQLAlchemy 2.0).
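As a rough illustration of querying the generated column, the sketch below spells out the `@@` operator and the tsquery function explicitly, so the generated SQL does not depend on how `.match()` is rendered by a given SQLAlchemy version. It rests on several assumptions: it uses SQLAlchemy's own `TSVECTOR` type instead of `TSVectorType`, a hypothetical Core table mirroring `MyModel` instead of the ORM class, and a made-up search string.

```python
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql
from sqlalchemy.dialects.postgresql import TSVECTOR

metadata = sa.MetaData()

# Hypothetical Core mirror of MyModel, using the built-in TSVECTOR type.
mymodel = sa.Table(
    "mymodel",
    metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("content", sa.String, nullable=False),
    sa.Column("content_tsv", TSVECTOR),
)

# content_tsv @@ plainto_tsquery('english', ...); the search text is made up.
ts_query = sa.func.plainto_tsquery("english", "full text search")
stmt = sa.select(mymodel.c.id).where(mymodel.c.content_tsv.op("@@")(ts_query))

# Compile for PostgreSQL to inspect the statement; nothing is executed.
sql = str(stmt.compile(dialect=postgresql.dialect()))
print(sql)
```

Because the filter targets the stored `content_tsv` column directly, it lines up with the GIN index defined on that column.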

5 solutions

Solution 1
32 accepted 2017-02-22 11:34:54

Solution 2
17 2018-11-08 23:11:35

Solution 3
11 2020-08-16 09:04:18

Solution 4
10 2020-01-02 18:26:06

Solution 5
0 2022-10-08 19:16:24