Exact phrase match in Whoosh

Question

I want to find a phrase in my documents, and I used this code from the quickstart.

>>> from whoosh.index import create_in
>>> from whoosh.fields import *
>>> schema = Schema(title=TEXT(stored=True), path=ID(stored=True), content=TEXT)
>>> ix = create_in("indexdir", schema)
>>> writer = ix.writer()
>>> writer.add_document(title=u"First document", path=u"/a", content=u"This is the first document we've added!")
>>> writer.add_document(title=u"Second document", path=u"/b",  content=u"The second one is even more interesting!")
>>> writer.commit()
>>> from whoosh.qparser import QueryParser
>>> with ix.searcher() as searcher:
        query = QueryParser("content", ix.schema).parse("first")
        results = searcher.search(query)
        results[0]

    result: {"title": u"First document", "path": u"/a"}

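But later I found that it splits the query into separate words and then searches the documents for them. What should I do if I want to search for a phrase such as "the first one in the document"?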

The documentation says to use

"it is a phrase"

if I want to search for:

it is a phrase.

This confuses me.

Also, there is a class that looks like it could help me, but I don't know how to use it.

class whoosh.query.Phrase(fieldname, words, slop=1, boost=1.0, char_ranges=None)
 Matches documents containing a given phrase.

Update: I used it this way, but nothing matched.

from whoosh.index import create_in
from whoosh.fields import *
schema = Schema(title=TEXT(stored=True), path=ID(stored=True),   content=TEXT)
ix = create_in("indexdir", schema)
writer = ix.writer()
writer.add_document(title=u"First document", path=u"/a",
                 content=u"This is the first document we've added!")
writer.add_document(title=u"Second document", path=u"/b",
               content=u"The second one is even more interesting!")
writer.commit()
from whoosh.query import Phrase

a = Phrase("content", u"the first")

results = ix.searcher().search(a)
print(results)

Result:

<Top 0 Results for Phrase('content', u'the first', slop=1, boost=1.000000) runtime=0.0>

Another update, following the answer:

with ix.searcher() as searcher:
    query = QueryParser("content", ix.schema).parse('"first x document"')
    # Search and read the hit while the searcher is still open
    results = searcher.search(query)
    print(results[0])

Result: <Hit {'content': u"This is the first document we've added!", 'path': u'/a', 'title': u'First document'}>

I think there should be no match, because "first x document" does not appear in any document. Otherwise, it is not an exact match.

Answer 1

You should give Phrase a list of words rather than a string as the second argument, and also remove "the", because it is a stop word:

a = Phrase("content", [u"first",u"document"])

instead of

a = Phrase("content", u"the first")

From the documentation:

 class whoosh.query.Phrase(fieldname, words, slop=1, boost=1.0, char_ranges=None)
 Matches documents containing a given phrase.

Parameters:

fieldname - The field to search.

words - A list of words (unicode strings) in the phrase.

Phrase search comes naturally in Whoosh: wrap the phrase in double quotes " " when using QueryParser:

>>> with ix.searcher() as searcher:
        query = QueryParser("content", ix.schema).parse('"first document"')
        results = searcher.search(query)
        results[0]

Update: "first x document" matches because x, like every single-character word, is treated as a stop word and filtered out before the search.

Answer 2

To find a phrase in the content, use phrase=True when defining the Schema, like this:

schema = Schema(title=TEXT(stored=True), content=TEXT(phrase=True))

Then simply put the double-quoted phrase inside a single-quoted string, like this:

query = QueryParser("content", schema=ix.schema).parse('"exact phrase"')
