
What should I do if I need a special analyzer in Elasticsearch?

In my textual data, I have structures like this:

ст.ст.40, 131, 132, 176-178, 183, ч. 2 ст. 187, 188, 184, 189, 194 KK

Here KK is the name of a codex, ст. ст. or ст. means article, and ч. means part. I want Elasticsearch to find strings like this using a regular expression and run a script that processes them, so that I get tokens like these:

40 KK, 131 KK, ..... 194 KK.

How can I get it in Elasticsearch?

Here is a script I wrote; it can probably be improved. You would run it in an ingest pipeline at indexing time to produce the formatted data.

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "script": {
          "description": "Sample handle text",
          "lang": "painless",
          "source": """
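            // Split the comma-separated list; only the last item carries the codex name.
            String[] envSplit = ctx['env'].splitOnToken(',');
            ArrayList tags = new ArrayList();
            for (int i = 0; i < envSplit.length; i++) {
              // Trim so tokens do not keep the space that follows each comma.
              String value = envSplit[i].trim();
              if (!value.contains('KK')) {
                // Strip the part and article markers (hard-coded for this sample), then append the codex name.
                tags.add(value.replace('ч. 2', '')
                  .replace('ст.', '')
                  .trim()
                  + ' KK');
              } else {
                tags.add(value);
              }
            }
            ctx['tags'] = tags;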
          """
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "env": "ст. ст. 40, 131, 132, 176-178, 183, ч. 2 ст. 187, 188, 184, 189, 194 KK"
      }
    }
  ]
}
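
Before porting more rules into Painless, it may help to pin down the expected tokens outside Elasticsearch. The following is only a sketch in Python, not part of the pipeline above: the function name article_tokens and the expand_ranges flag are hypothetical, and it assumes the codex name is always the last word of the string and that any "ч. N" marker should be dropped.

```python
import re

# Part markers such as "ч. 2"; their numbers must not be read as article numbers.
PART = re.compile(r"ч\.\s*\d+")
# A single article number, or a range such as "176-178".
NUMBER_OR_RANGE = re.compile(r"\d+(?:\s*-\s*\d+)?")


def article_tokens(text, expand_ranges=False):
    """Return one '<article> <codex>' token per article number in text.

    Assumption: the codex name is the last whitespace-separated word.
    """
    body, _, codex = text.strip().rpartition(" ")
    # Remove part markers before looking for article numbers.
    body = PART.sub("", body)
    tokens = []
    for match in NUMBER_OR_RANGE.finditer(body):
        item = re.sub(r"\s+", "", match.group())
        if expand_ranges and "-" in item:
            # Optional: turn "176-178" into 176, 177, 178.
            start, end = (int(n) for n in item.split("-"))
            tokens.extend(f"{n} {codex}" for n in range(start, end + 1))
        else:
            tokens.append(f"{item} {codex}")
    return tokens


sample = "ст. ст. 40, 131, 132, 176-178, 183, ч. 2 ст. 187, 188, 184, 189, 194 KK"
print(article_tokens(sample))
```

Whether a range like 176-178 should stay as a single token or be expanded into separate articles is not stated in the question, so the sketch leaves it as an option.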

