Python Whoosh not accepting single character

I am trying to parse a query that contains both text and a number.

Example: Apple iphone 6 results in:

  Results for And([Term('title', u'apple'), Term('title', u'iphone')])

while Apple iphone 62 results in:

  Results for And([Term('title', u'apple'), Term('title', u'iphone'), Term('title', u'62')])

Why isn't the single-digit number accepted?

By default, Whoosh treats every single-character token as a stop word and drops it. This means single letters and single digits are ignored, both when indexing and when parsing a query.

Stop words are words that are filtered out before or after processing of natural-language text.

You can verify this in the StopFilter signature: in addition to the predefined stop list, it takes a minsize argument that defaults to 2, so any token shorter than two characters is removed.

class whoosh.analysis.StopFilter(
        stoplist=frozenset(['and', 'is', 'it', 'an', 'as', 'at', 'have', 'in', 'yet', 'if', 'from', 'for', 'when', 'by', 'to', 'you', 'be', 'we', 'that', 'may', 'not', 'with', 'tbd', 'a', 'on', 'your', 'this', 'of', 'us', 'will', 'can', 'the', 'or', 'are']),
        minsize=2,
        maxsize=None,
        renumber=True,
        lang=None
        )

You can resolve this by redefining your schema so that the analyzer either omits the StopFilter or uses it with minsize=1:

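from whoosh.analysis import StandardAnalyzer
from whoosh.fields import Schema, TEXT
# stoplist=None builds the analyzer without a StopFilter, so "6" is kept
schema = Schema(content=TEXT(analyzer=StandardAnalyzer(stoplist=None)))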
