How do I get the searched string from the list in es.search results?

Question

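The situation: I loaded a DataFrame (df) into Elasticsearch. The df has two columns, 'url' and 'text'.

I query 'text' with the values from a list named "forbidden_words".

I want res to present two columns as well: the 'url', and the "forbidden_words" entry that was found in the text.

But with the code below, '_source' comes back empty...

Any help would be greatly appreciated!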

for i in forbidden_words:
    dsll = {
       'query': {
           'match': {
               'text': i
               }
           },
       "_source": {
           "includes": forbidden_words,
           # "excludes": []
           }
       }
res = es.search(index='test', body=dsll)

The result of res:

{'took': 25,
 'timed_out': False,
 '_shards': {'total': 5, 'successful': 5, 'skipped': 0, 'failed': 0},
 'hits': {'total': 26,
  'max_score': 3.211111,
  'hits': [{'_index': 'test',
    '_type': 'test',
    '_id': 'ml5utHcBcazm5fCndKUY',
    '_score': 3.211111,
    '_source': {}}, ....
   {'_index': 'test',
    '_type': 'test',
    '_id': 'oV5utHcBcazm5fCndKUY',
    '_score': 1.2800283,
    '_source': {}}]}}

Answer 1

_source expects field names, not field values. So the most you can ask for is:

{
  "query": {
    "match": {
      "text": "xyz"
    }
  },
  "_source": {
    "includes": ["text", "url"]    <--
  }
}
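
In Python, the same request body could be built once per forbidden word. This is only a minimal sketch: `build_query` is a hypothetical helper name, and it assumes the index has the `url` and `text` fields described in the question.

```python
def build_query(word, fields=("url", "text")):
    """Build a search body that matches `word` in 'text' and returns only `fields`.

    `build_query` is a hypothetical helper, not part of the Elasticsearch client.
    """
    return {
        "query": {"match": {"text": word}},
        # _source filtering takes field names, never the searched values
        "_source": {"includes": list(fields)},
    }


body = build_query("xyz")
# With a client as in the question: res = es.search(index='test', body=body)
```

Keeping the body in a function makes it easy to run the search inside the loop, so that every word in forbidden_words gets its own request.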

If you only want to return the actual words that matched the query, take a look at highlighting:

{
  "query": {
    "match": {
      "text": "xyz"
    }
  },
  "_source": {
    "includes": ["text", "url"]
  },
  "highlight": {
    "fields": {
      "text": {}
    }
  }
}

Note that the highlighted values are no longer inside _source; they are inside highlight.
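
To get the two columns the question asks for, the url can be read from `_source` and the matched words from `highlight`. The sketch below assumes the default highlighter tags (`<em>` and `</em>`). The helper names `matched_terms` and `url_and_terms` and the sample response are hypothetical, made up for illustration.

```python
import re

# Elasticsearch wraps highlighted terms in <em>...</em> by default
EM_TAG = re.compile(r"<em>(.*?)</em>")


def matched_terms(hit, field="text"):
    """Return the distinct highlighted terms of one hit, in order of appearance."""
    terms = []
    for fragment in hit.get("highlight", {}).get(field, []):
        for term in EM_TAG.findall(fragment):
            if term not in terms:
                terms.append(term)
    return terms


def url_and_terms(res):
    """Turn a search response into (url, matched terms) rows."""
    return [
        (hit["_source"]["url"], matched_terms(hit))
        for hit in res["hits"]["hits"]
    ]


# Hypothetical response, trimmed to the keys used above
sample = {
    "hits": {
        "hits": [
            {
                "_source": {"url": "http://example.com/a", "text": "an xyz here"},
                "highlight": {"text": ["an <em>xyz</em> here"]},
            }
        ]
    }
}
rows = url_and_terms(sample)
```

If custom `pre_tags` and `post_tags` are set in the highlight request, the regular expression would need to change to match them.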

Answer 2

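result = []
for i in forbidden_words:
    # match_phrase requires the whole phrase to appear in 'text'
    dsl = {
       'query': {
           'match_phrase': {
               'text': i
               }
           }
       }
    # sizee (the maximum number of hits per query) is assumed to be defined earlier
    res = es.search(index='cn_web_crawler_test_linux', body=dsl, size=sizee)
    for j in res["hits"]["hits"]:
        # keep the searched word next to the url and text of each hit
        result.append((i, j['_source']['url'], j['_source']['text']))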
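
The tuples collected in result repeat a url once for every forbidden word found there. If one row per url is preferred, they could be grouped afterwards. This is a sketch using only the standard library; `words_by_url` and the sample rows are hypothetical.

```python
from collections import defaultdict


def words_by_url(result):
    """Group (word, url, text) tuples into {url: [words found at that url]}."""
    grouped = defaultdict(list)
    for word, url, _text in result:
        if word not in grouped[url]:
            grouped[url].append(word)
    return dict(grouped)


# Hypothetical rows in the shape built by the loop above
sample_result = [
    ("foo", "http://example.com/a", "foo and bar"),
    ("bar", "http://example.com/a", "foo and bar"),
    ("foo", "http://example.com/b", "only foo"),
]
table = words_by_url(sample_result)
```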

Answer 1
0 2021-02-19 11:53:44

Answer 2
0 2021-02-20 08:43:13
