
Indexing synonyms in Elasticsearch with Python

Problem description

For example, I want to run a query string like this:

{"query": {
    "query_string" : {
        "fields" : ["description"],
        "query" : "illegal~"
        }
     }
 }

I have a separate synonyms.txt file that contains the synonyms:

illegal, banned, criminal, illegitimate, illicit, irregular, outlawed, prohibited
otherWord, synonym1, synonym2...

I want to find all documents that contain any one of these synonyms.

What I tried

First, I want to index those synonyms in my Elasticsearch database.

I tried to run this request with curl:

curl -X PUT "https://instanceAdress.europe-west1.gcp.cloud.es.io:9243/app/kibana#/dev_tools/console/sources" -H 'Content-Type: application/json' -d' {
"settings": {
    "index" : {
        "analysis" : {
            "analyzer" : {
                "synonym" : {
                    "tokenizer" : "whitespace",
                    "filter" : ["synonym"]
                }
            },
            "filter" : {
                "synonym" : {
                    "type" : "synonym",
                    "synonyms_path" : "synonyms.txt"
                }
            }
        }
    }
}
}
'

but it fails with: {"statusCode":404,"error":"Not Found"}

I then need to change my query so that it takes the synonyms into account, but I have no idea how.

So my questions are:

  • How can I index my synonyms?
  • How can I change my query so that it searches across all synonyms?
  • Is there any way to index them in Python?

Here is an example of a GET query using the Python Elasticsearch client:

es = Elasticsearch(
    ['fullAdress.europe-west1.gcp.cloud.es.io'],
    http_auth=('login', 'password'),
    scheme="https",
    port=9243,
)
es.get(index="sources", doc_type='rcp', id="301495")

First, note that your curl request returns a 404 because the URL points at the Kibana console page (.../app/kibana#/dev_tools/console/...) rather than the Elasticsearch REST endpoint; index-creation requests must go to the Elasticsearch endpoint itself, e.g. https://instanceAdress.europe-west1.gcp.cloud.es.io:9243/<index-name>.

You can index with synonyms in Python using the elasticsearch-dsl library. First, create a token filter:

from elasticsearch_dsl import token_filter

synonyms_token_filter = token_filter(
  'synonyms_token_filter',     # any name for the filter
  'synonym',                   # synonym filter type
  synonyms=your_synonyms       # the synonyms mapping will be inlined
)

And then create an analyzer:

from elasticsearch_dsl import analyzer

custom_analyzer = analyzer(
  'custom_analyzer',
  tokenizer='standard',
  filter=[
    'lowercase',
    synonyms_token_filter
  ]
)
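To answer the remaining questions, here is a minimal sketch of the same analysis configuration as plain dicts, usable with the low-level `es` client from the question. The index name "sources", analyzer name "synonym_analyzer", and field "description" are assumptions; the mapping is shown in Elasticsearch 7+ format (on 6.x it must be nested under the doc type). The `es.indices.create` / `es.search` calls need a live cluster, so they are shown commented out:

```python
# Index settings with the synonym rules inlined, so no synonyms.txt
# needs to exist on the Elasticsearch nodes.
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "synonym": {
                    "type": "synonym",
                    "synonyms": [
                        "illegal, banned, criminal, illegitimate,"
                        " illicit, irregular, outlawed, prohibited",
                    ],
                }
            },
            "analyzer": {
                "synonym_analyzer": {
                    "tokenizer": "whitespace",
                    "filter": ["lowercase", "synonym"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "description": {"type": "text", "analyzer": "synonym_analyzer"}
        }
    },
}
# es.indices.create(index="sources", body=index_body)  # needs a live cluster

# Because the synonyms are expanded at index time, the original
# query_string query works unchanged: a document whose description
# contains "banned" is also indexed under "illegal", so it matches.
search_body = {
    "query": {
        "query_string": {
            "fields": ["description"],
            "query": "illegal~",
        }
    }
}
# es.search(index="sources", body=search_body)
```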

There's also a package for this: https://github.com/agora-team/elasticsearch-synonyms
