I load a DataFrame (df) into Elasticsearch; the df has two columns: 'url' and 'text'.
I query the 'text' field with each value from a list named 'forbidden_words'.
I want the result to have two columns as well: "url", and "forbidden_words" holding the words that were found in the text.
But with the code below, '_source' comes back empty.
Any help would be much appreciated!
for i in forbidden_words:
    dsll = {
        'query': {
            'match': {
                'text': i
            }
        },
        "_source": {
            "includes": forbidden_words,
            # "excludes": []
        }
    }
    res = es.search(index='test', body=dsll)
The result of res:
{'took': 25,
'timed_out': False,
'_shards': {'total': 5, 'successful': 5, 'skipped': 0, 'failed': 0},
'hits': {'total': 26,
'max_score': 3.211111,
'hits': [{'_index': 'test',
'_type': 'test',
'_id': 'ml5utHcBcazm5fCndKUY',
'_score': 3.211111,
'_source': {}}, ....
{'_index': 'test',
'_type': 'test',
'_id': 'oV5utHcBcazm5fCndKUY',
'_score': 1.2800283,
'_source': {}}]}}
_source expects field names, not field values. So you could only say:
{
  "query": {
    "match": {
      "text": "xyz"
    }
  },
  "_source": {
    "includes": ["text", "url"]    <--
  }
}
If you only want to return the actual words that matched your query, take a look at highlighting:
{
  "query": {
    "match": {
      "text": "xyz"
    }
  },
  "_source": {
    "includes": ["text", "url"]
  },
  "highlight": {
    "fields": {
      "text": {}
    }
  }
}
Note that the highlighted values will not be inside _source anymore but inside highlight.
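To make the highlighting approach concrete, here is a minimal Python sketch of how the matched words could be pulled out of the highlight section of each hit. It relies on the default highlighter tags (`<em>` and `</em>`); the helper names `build_query` and `extract_matches`, and the hand-written `sample_response`, are assumptions made for illustration and are not part of the original posts. In real use the response would come from `es.search(...)` instead.

```python
import re

# Elasticsearch wraps highlighted terms in <em>...</em> by default.
EM_TAG = re.compile(r"<em>(.*?)</em>")


def build_query(word):
    """Build a search body that matches `word` in 'text' and highlights it.

    Hypothetical helper: only the 'url' field is requested from _source,
    because the matched words are read from the highlight section instead.
    """
    return {
        "query": {"match": {"text": word}},
        "_source": {"includes": ["url"]},
        "highlight": {"fields": {"text": {}}},
    }


def extract_matches(response):
    """Return a list of (url, matched_words) pairs from a search response."""
    rows = []
    for hit in response["hits"]["hits"]:
        # 'highlight' maps each field name to a list of text fragments.
        fragments = hit.get("highlight", {}).get("text", [])
        words = []
        for fragment in fragments:
            for term in EM_TAG.findall(fragment):
                if term not in words:  # keep each matched word once
                    words.append(term)
        rows.append((hit["_source"]["url"], words))
    return rows


# Hypothetical response, trimmed to the keys used above.
sample_response = {
    "hits": {
        "hits": [
            {
                "_source": {"url": "http://example.com/a"},
                "highlight": {
                    "text": ["some <em>xyz</em> text and <em>xyz</em> again"]
                },
            }
        ]
    }
}

print(extract_matches(sample_response))
```

Each returned pair holds the document's url and the words that were highlighted in its text, which is close to the two-column result asked for in the question.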
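result = []
for word in forbidden_words:
    # match_phrase requires the whole phrase to appear in 'text'
    dsl = {
        'query': {
            'match_phrase': {
                'text': word
            }
        }
    }
    # sizee (defined elsewhere) caps the number of hits returned per word
    res = es.search(index='cn_web_crawler_test_linux', body=dsl, size=sizee)
    for hit in res["hits"]["hits"]:
        # record which word matched, along with the document's url and text
        result.append((word, hit['_source']['url'], hit['_source']['text']))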
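The loop above yields one row per (word, document) pair, so a url that contains several forbidden words appears several times. A possible follow-up step, sketched here under the assumption that `result` holds `(word, url, text)` tuples as built above, is to group the words by url. The function name `group_by_url` and the sample rows are hypothetical.

```python
from collections import defaultdict


def group_by_url(result):
    """Collapse (word, url, text) tuples into {url: [word, ...]}."""
    grouped = defaultdict(list)
    for word, url, _text in result:
        if word not in grouped[url]:  # avoid listing a word twice per url
            grouped[url].append(word)
    return dict(grouped)


# Hypothetical rows in the same shape as `result` above.
sample_result = [
    ("foo", "http://example.com/a", "foo and bar"),
    ("bar", "http://example.com/a", "foo and bar"),
    ("foo", "http://example.com/b", "only foo"),
]

table = group_by_url(sample_result)
for url, words in table.items():
    print(url, words)
```

If a DataFrame is wanted again, the items of `table` could be passed to `pandas.DataFrame` with `columns=['url', 'forbidden_words']`.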