
ArangoSearch: filter records before searching

A performance problem with ArangoSearch

I have a collection of documents like this:

{
  "passage": "Some long text",
  "meta": {
    "language": "en",
    "Region":"Asia Pacific"
  },
  "document_name": "my document.pdf"
}

To enable full-text search, I created a view with a link configuration like this:

"links": {
    "my_coll": {
      "analyzers": [
        "myAnalyzer"
      ],
      "fields": {
        "passage": {"analyzers": [ "myAnalyzer" ]}
      },
      "includeAllFields": false,
      "storeValues": "none",
      "trackListPositions": false
    }
  }

Now I want to search within the passage, but only for a specific language and region.

My query is as follows:

LET token = tokens("My text to be search", "myAnalyzer")
for docs in my_vw
    search analyzer(token any == docs.passage, "myAnalyzer")
    filter docs.meta.language=="en"
    filter docs.meta.Region=="Global"
    sort BM25(docs) desc
    limit 50
return {passage: docs.passage, score: BM25(docs)}

This query takes about 4 seconds to answer. The collection contains 3,227,261 documents.

Execution plan:

 Id   NodeType               Est.   Comment
  1   SingletonNode             1   * ROOT
  3   EnumerateViewNode   3227261     - FOR docs IN my_vw SEARCH ANALYZER(([ "my", "token" ] any == docs.`passage`), "myAnalyzer") LET #10 = BM25(docs)   /* view query */
  4   CalculationNode     3227261       - LET #2 = ((docs.`meta`.`language` == "en") && (docs.`meta`.`Region` == "myAnalyzer"))   /* simple expression */
  5   FilterNode          3227261       - FILTER #2
  9   SortNode            3227261       - SORT #10 DESC   /* sorting strategy: constrained heap */
 10   LimitNode                50       - LIMIT 0, 50
 11   CalculationNode          50       - LET #8 = { "passage" : docs.`passage`, "score" : #10 }   /* simple expression */
 12   ReturnNode               50       - RETURN #8

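It first selects all documents and only then applies the filters. Is there a way to apply the filters first and then search?

Can you help me improve the performance of this query?

I suggest you avoid post-filtering. You are better off indexing the meta.language and meta.Region fields in the view by adjusting the link definition: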

"links": {
    "my_coll": {
      "analyzers": [
        "myAnalyzer"
      ],
      "fields": {
        "passage": {"analyzers": [ "myAnalyzer" ]},
        "meta": { "fields": { "language": {"analyzers": [ "identity" ]}, "Region": {"analyzers": [ "identity" ]} } }
      },
      "includeAllFields": false,
      "storeValues": "none",
      "trackListPositions": false
    }
  }
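
In a link definition, a nested attribute such as meta.language is described with one "fields" object per path segment. The sketch below shows that nesting being generated from dotted paths. build_link is a hypothetical helper written for this illustration, not part of any ArangoDB driver. It also assigns the built-in identity analyzer to the exact-match attributes, on the assumption that they would otherwise inherit the link-level myAnalyzer and a plain == comparison would not match them.

```python
import json


def build_link(text_field, text_analyzer, exact_paths):
    """Build an ArangoSearch link definition for one collection (hypothetical helper).

    text_field    -- attribute analyzed for full-text search
    text_analyzer -- name of the analyzer applied to text_field
    exact_paths   -- dotted attribute paths to be matched with ==
    """
    fields = {text_field: {"analyzers": [text_analyzer]}}
    for path in exact_paths:
        parts = path.split(".")
        level = fields
        # Walk down the path, creating one nested "fields" object per segment.
        for part in parts[:-1]:
            level = level.setdefault(part, {}).setdefault("fields", {})
        # Assumption: exact-match attributes are indexed with "identity".
        level.setdefault(parts[-1], {})["analyzers"] = ["identity"]
    return {
        "analyzers": [text_analyzer],
        "fields": fields,
        "includeAllFields": False,
        "storeValues": "none",
        "trackListPositions": False,
    }


links = {
    "my_coll": build_link("passage", "myAnalyzer", ["meta.language", "meta.Region"])
}
print(json.dumps({"links": links}, indent=2))
```

The printed JSON can be used as the view properties. Note that "meta" sits directly under the top-level "fields" object, next to "passage".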

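You can then move both conditions into the SEARCH expression, so that the view evaluates them against the index instead of filtering afterwards. Conditions outside the ANALYZER() call are matched with the default identity analyzer: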

LET token = tokens("My text to be search", "myAnalyzer")
for docs in my_vw
    search analyzer(token any == docs.passage, "myAnalyzer")
           AND docs.meta.language=="en"
           AND docs.meta.Region=="Global"
    sort BM25(docs) desc
    limit 50
return {passage: docs.passage, score: BM25(docs)}
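
To see why this helps, here is a deliberately simplified Python model of the two strategies. It is a toy inverted index, not how ArangoSearch is actually implemented, and the sample documents and function names are made up for the illustration. With post-filtering, every text hit has to be loaded to read its meta attributes. With indexed conditions, the candidate set is narrowed inside the index before any document is touched.

```python
from collections import defaultdict

# Made-up sample data in the shape of the documents from the question.
docs = {
    1: {"passage": "my text about search", "meta": {"language": "en", "Region": "Global"}},
    2: {"passage": "another text", "meta": {"language": "de", "Region": "Global"}},
    3: {"passage": "search engines", "meta": {"language": "en", "Region": "Asia Pacific"}},
    4: {"passage": "my notes", "meta": {"language": "en", "Region": "Global"}},
    5: {"passage": "unrelated words", "meta": {"language": "en", "Region": "Global"}},
}

# Toy inverted index: (field, value) -> set of document ids.
index = defaultdict(set)
for doc_id, doc in docs.items():
    for word in doc["passage"].split():
        index[("passage", word)].add(doc_id)
    index[("meta.language", doc["meta"]["language"])].add(doc_id)
    index[("meta.Region", doc["meta"]["Region"])].add(doc_id)


def text_matches(tokens):
    """Ids of documents whose passage contains ANY of the tokens."""
    hits = set()
    for token in tokens:
        hits |= index[("passage", token)]
    return hits


def post_filter(tokens, language, region):
    """Search on text only, then filter: every text hit is loaded."""
    loaded = 0
    result = []
    for doc_id in sorted(text_matches(tokens)):
        meta = docs[doc_id]["meta"]  # simulated document load
        loaded += 1
        if meta["language"] == language and meta["Region"] == region:
            result.append(doc_id)
    return result, loaded


def indexed_filter(tokens, language, region):
    """All three conditions are intersected inside the index."""
    ids = (
        text_matches(tokens)
        & index[("meta.language", language)]
        & index[("meta.Region", region)]
    )
    return sorted(ids), len(ids)  # only the final matches need loading


tokens = ["my", "text", "search"]
print(post_filter(tokens, "en", "Global"))     # ([1, 4], 4)
print(indexed_filter(tokens, "en", "Global"))  # ([1, 4], 2)
```

Both strategies return the same documents, but the post-filter variant loads every text match first. On a collection with millions of documents that difference is what shows up as the large estimates on the FilterNode and SortNode in the execution plan.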
