
Registering a new FTS tokenizer in SQLite3 w/ Python

I'm building an application which requires a custom tokenizer in its FTS database. I have found a tokenizer which does what I want (this one), but I can't find any directions for registering and using custom tokenizers with SQLite in Python.

Anyone have an idea on how to proceed?

If you are willing to write your tokenizer in Python, you could use sqlitefts.

Here's an example tokenizer from the README:

import re

import sqlitefts as fts

class SimpleTokenizer(fts.Tokenizer):
    _p = re.compile(r'\w+', re.UNICODE)

    def tokenize(self, text):
        for m in self._p.finditer(text):
            s, e = m.span()
            t = text[s:e]
            l = len(t.encode('utf-8'))
            p = len(text[:s].encode('utf-8'))
            yield t, p, p + l

tk = fts.make_tokenizer_module(SimpleTokenizer())
fts.register_tokenizer(conn, 'simple_tokenizer', tk)
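Once the tokenizer module is registered on a connection, you refer to it by name in the FTS table definition. Here is a minimal usage sketch continuing from the snippet above; the table name, column name, and sample text are my own, and it assumes a sqlite3 build that permits registering FTS3/FTS4 tokenizers:

import sqlite3

import sqlitefts as fts

# SimpleTokenizer is the class defined in the snippet above.
conn = sqlite3.connect(':memory:')
tk = fts.make_tokenizer_module(SimpleTokenizer())
fts.register_tokenizer(conn, 'simple_tokenizer', tk)

# Create an FTS4 table that uses the custom tokenizer by its registered name.
conn.execute("CREATE VIRTUAL TABLE docs USING fts4(body, tokenize=simple_tokenizer)")
conn.execute("INSERT INTO docs(body) VALUES (?)", ('hello tokenizer world',))

# Full-text queries now go through SimpleTokenizer.tokenize().
for (body,) in conn.execute("SELECT body FROM docs WHERE docs MATCH ?", ('tokenizer',)):
    print(body)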
