
How can I get the searched string from a list in my es.search result?

I loaded a DataFrame (df) into Elasticsearch. The df has two columns: 'url' and 'text'.

I query the 'text' field with each value from a list named 'forbidden_words'.

I want the result to have two columns as well: 'url', and 'forbidden_words' holding the words that were found in the text.

But with the code below, '_source' comes back empty.

Any help would be much appreciated!

for i in forbidden_words:
    dsll = {
       'query': {
           'match': {
               'text': i
               }
           },
       "_source": {
           "includes": forbidden_words,
           # "excludes": []
           }
       }
res = es.search(index='test', body=dsll) 

The result of res:

{'took': 25,
 'timed_out': False,
 '_shards': {'total': 5, 'successful': 5, 'skipped': 0, 'failed': 0},
 'hits': {'total': 26,
  'max_score': 3.211111,
  'hits': [{'_index': 'test',
    '_type': 'test',
    '_id': 'ml5utHcBcazm5fCndKUY',
    '_score': 3.211111,
    '_source': {}}, ....
   {'_index': 'test',
    '_type': 'test',
    '_id': 'oV5utHcBcazm5fCndKUY',
    '_score': 1.2800283,
    '_source': {}}]}}

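_source expects field names, not field values. Passing forbidden_words there matches no field in your documents, which is why every hit comes back with an empty _source. You can only list fields that exist in the index: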

{
  "query": {
    "match": {
      "text": "xyz"
    }
  },
  "_source": {
    "includes": ["text", "url"]    <--
  }
}
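For the Python client used in the question, the same body can be built per word. This is only a minimal sketch: build_query is a hypothetical helper name, the words in forbidden_words are placeholders, and the es.search call is left as a comment because it needs a running cluster.

```python
def build_query(word, fields=("url", "text")):
    """Build a match query on 'text' that returns only the named fields."""
    return {
        "query": {"match": {"text": word}},
        # _source lists field names from the index, never the searched values
        "_source": {"includes": list(fields)},
    }


forbidden_words = ["foo", "bar"]  # placeholder values, not from the question
bodies = {word: build_query(word) for word in forbidden_words}

# With a live cluster, each body would be sent from inside the loop, e.g.:
# for word, body in bodies.items():
#     res = es.search(index='test', body=body)
```

Note that the search call belongs inside the loop; otherwise only the query for the last word is ever sent.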

If you only want to return the actual words that matched your query, take a look at highlighting:

{
  "query": {
    "match": {
      "text": "xyz"
    }
  },
  "_source": {
    "includes": ["text", "url"]
  },
  "highlight": {
    "fields": {
      "text": {}
    }
  }
}

Note that the highlighted values are not returned inside _source; each hit carries them in a separate highlight key.
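To turn those highlights into a "matched words" column, you can pull the terms out of the fragments. The sketch below assumes the default highlighter tags (<em> and </em>) and uses a hand-written sample_hit instead of a real response; matched_terms is a hypothetical helper name.

```python
import re

# Default highlighter tags are assumed; adjust if pre_tags/post_tags are set.
EM_TAG = re.compile(r"<em>(.*?)</em>")


def matched_terms(hit, field="text"):
    """Return the distinct highlighted terms of one hit, in order of appearance."""
    fragments = hit.get("highlight", {}).get(field, [])
    terms = []
    for fragment in fragments:
        for term in EM_TAG.findall(fragment):
            if term not in terms:
                terms.append(term)
    return terms


# Hand-written stand-in for one element of res["hits"]["hits"]
sample_hit = {
    "_source": {"url": "http://example.com/a", "text": "some xyz text and xyz again"},
    "highlight": {"text": ["some <em>xyz</em> text and <em>xyz</em> again"]},
}

row = (sample_hit["_source"]["url"], matched_terms(sample_hit))
```

A hit without a highlight key simply yields an empty list, so the helper is safe to call on every hit.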

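result = []
for i in forbidden_words:
    # match_phrase requires the whole phrase to appear in 'text'
    dsl = {
        'query': {
            'match_phrase': {
                'text': i
            }
        }
    }
    # sizee (the maximum number of hits to return) must be defined beforehand
    res = es.search(index='cn_web_crawler_test_linux', body=dsl, size=sizee)
    for j in res["hits"]["hits"]:
        # record the searched word together with the url and text of each hit
        result.append((i, j['_source']['url'], j['_source']['text']))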
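If the goal is one row per url with all the forbidden words found in it, the (word, url, text) tuples collected in result can be grouped afterwards. A small sketch, with words_by_url as a hypothetical helper and sample_result as made-up data standing in for the real result list:

```python
from collections import defaultdict


def words_by_url(result):
    """Group (word, url, text) tuples into {url: [words found in that url]}."""
    grouped = defaultdict(list)
    for word, url, _text in result:
        if word not in grouped[url]:
            grouped[url].append(word)
    return dict(grouped)


# Made-up tuples in the same shape as the entries appended to result
sample_result = [
    ("foo", "http://example.com/a", "foo bar text"),
    ("bar", "http://example.com/a", "foo bar text"),
    ("foo", "http://example.com/b", "only foo here"),
]

table = words_by_url(sample_result)
```

The resulting dict maps each url to its list of matched words, which can then be loaded into a two-column DataFrame if needed.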


 