ArangoSearch filter records before searching
Performance problem with ArangoSearch
I have a collection of documents like this:
{
"passage": "Some long text",
"meta": {
"language": "en",
"Region":"Asia Pacific"
},
"document_name": "my document.pdf"
}
To enable full-text search, I created a view with a link configuration like this:
"links": {
"my_coll": {
"analyzers": [
"myAnalyzer"
],
"fields": {
"passage": {"analyzers": [
"myAnalyzer"
]}
},
"includeAllFields": false,
"storeValues": "none",
"trackListPositions": false
}
}
Now I want to search the passages, but only for a specific language and region.
My query looks like this:
LET token = tokens("My text to be search", "myAnalyzer")
for docs in my_vw
search analyzer(token any == docs.passage, "myAnalyzer")
filter docs.meta.language=="en"
filter docs.meta.Region=="Global"
sort BM25(docs) desc
limit 50
return {passage: docs.passage, score: BM25(docs)}
This query takes about 4 seconds to return. The collection contains 3,227,261 documents.
Execution plan:
Id NodeType Est. Comment
1 SingletonNode 1 * ROOT
3 EnumerateViewNode 3227261 - FOR docs IN my_vw SEARCH ANALYZER(([ "my", "token" ] any == docs.`passage`), "myAnalyzer") LET #10 = BM25(docs) /* view query */
4 CalculationNode 3227261 - LET #2 = ((docs.`meta`.`language` == "en") && (docs.`meta`.`Region` == "myAnalyzer")) /* simple expression */
5 FilterNode 3227261 - FILTER #2
9 SortNode 3227261 - SORT #10 DESC /* sorting strategy: constrained heap */
10 LimitNode 50 - LIMIT 0, 50
11 CalculationNode 50 - LET #8 = { "passage" : docs.`passage`, "score" : #10 } /* simple expression */
12 ReturnNode 50 - RETURN #8
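It first selects all documents and then applies the filters. Is there a way to apply the filters first and then search?
Can you help improve the performance of this query?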
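I suggest you avoid post-filtering. As the execution plan shows, the FILTER conditions run in a separate FilterNode after the view has already produced its matches. It is better to index the meta.language and meta.Region fields in the view, using an adjusted link definition: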
"links": {
"my_coll": {
"analyzers": [
"myAnalyzer"
],
"fields": {
"passage": {"analyzers": [ "myAnalyzer" ]},
"meta": { "analyzers": [ "identity" ], "fields": { "language": {}, "Region": {} } }
},
"includeAllFields": false,
"storeValues": "none",
"trackListPositions": false
}
}
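As a quick sanity check on the nesting, the following Python sketch walks a link definition and reports whether a dotted attribute path is covered. It is only an illustration: `is_indexed` is a hypothetical helper, not part of ArangoDB or any driver, and it models just the nested `fields` and `includeAllFields` keys, assuming sub-fields inherit `includeAllFields` from their parent.

```python
def is_indexed(link, path):
    """Return True if the dotted attribute path is covered by the link definition.

    Simplified model (assumption): a path is covered when every segment is
    listed under nested "fields", or when the closest level that sets
    "includeAllFields" sets it to True.
    """
    include_all = False
    node = link
    for segment in path.split("."):
        include_all = node.get("includeAllFields", include_all)
        fields = node.get("fields", {})
        if segment not in fields:
            return include_all
        node = fields[segment]
    return True


# A link definition shaped like the one above (for collection my_coll).
link = {
    "analyzers": ["myAnalyzer"],
    "fields": {
        "passage": {"analyzers": ["myAnalyzer"]},
        "meta": {
            "analyzers": ["identity"],
            "fields": {"language": {}, "Region": {}},
        },
    },
    "includeAllFields": False,
    "storeValues": "none",
    "trackListPositions": False,
}

# An extra "fields" wrapper would index an attribute literally named "fields".
misnested = {
    "fields": {
        "passage": {},
        "fields": {"meta": {"fields": {"language": {}, "Region": {}}}},
    },
    "includeAllFields": False,
}

for path in ("passage", "meta.language", "meta.Region", "document_name"):
    print(path, is_indexed(link, path))
print("misnested meta.language", is_indexed(misnested, "meta.language"))
```

Only the paths listed under `fields` are reported as covered, so an accidental extra level of nesting shows up immediately.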
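Then you can change the query so that the conditions are part of SEARCH. They sit outside the ANALYZER() call, so they are matched using the default identity analyzer: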
LET token = tokens("My text to be search", "myAnalyzer")
for docs in my_vw
search analyzer(token any == docs.passage, "myAnalyzer")
AND docs.meta.language=="en"
AND docs.meta.Region=="Global"
sort BM25(docs) desc
limit 50
return {passage: docs.passage, score: BM25(docs)}
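To illustrate why moving the conditions into SEARCH helps, here is a toy inverted index in Python. It is a simplified model, not how ArangoSearch is implemented: the sample documents, the whitespace tokenizer, and the function names are all made up for the example. The point is only that a post-filter has to look at every text match, while conditions inside the index lookup are resolved by intersecting posting lists before any document is loaded.

```python
from collections import defaultdict

# Hypothetical sample data, flattened for brevity.
docs = [
    {"passage": "my text about search", "language": "en", "region": "Global"},
    {"passage": "another text", "language": "en", "region": "Asia Pacific"},
    {"passage": "mein text", "language": "de", "region": "Global"},
    {"passage": "unrelated passage", "language": "en", "region": "Global"},
]


def build_index(docs):
    """Map (field, value) to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        for term in doc["passage"].lower().split():
            index[("passage", term)].add(doc_id)
        index[("language", doc["language"])].add(doc_id)
        index[("region", doc["region"])].add(doc_id)
    return index


def text_matches(index, terms):
    """Ids of documents containing any of the terms (like ANY ==)."""
    return set().union(*(index.get(("passage", t), set()) for t in terms))


def search_then_filter(docs, index, terms, language, region):
    """Post-filtering: load every text match, then test its attributes."""
    candidates = text_matches(index, terms)
    hits = [
        i for i in candidates
        if docs[i]["language"] == language and docs[i]["region"] == region
    ]
    return sorted(hits), len(candidates)  # second value: documents loaded


def search_with_conditions(index, terms, language, region):
    """Conditions inside the search: intersect posting lists first."""
    hits = (
        text_matches(index, terms)
        & index.get(("language", language), set())
        & index.get(("region", region), set())
    )
    return sorted(hits), len(hits)  # only final hits need to be loaded


index = build_index(docs)
print(search_then_filter(docs, index, ["my", "text"], "en", "Global"))
print(search_with_conditions(index, ["my", "text"], "en", "Global"))
```

Both functions return the same hits, but the first one touches every document that matched the text, which on a collection of millions of documents is where the time goes.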