Elasticsearch: aggregate by similar substrings

Question

I have an index of documents with only one property each. The records look like

Products Sport
Products Health
Products Home
Questions CSS
Questions HTML
Questions JS

There are a lot of documents and a lot of duplicates. Can I somehow group them by "similarity" (in any sense) and add the "common part" to each document, so that I end up with something like

Products Sport         Products
Products Health        Products
Products Home          Products
Questions CSS          Questions
Questions HTML         Questions
Questions JS           Questions

It's just for analysis purposes, so it can be very inaccurate, but should be quick enough.

Answer 1

What you are looking for is _update_by_query. Run something like this once per category to add a field named category and set its value with a script:

POST index/_update_by_query?conflicts=proceed
{
  "script": {
    "source": "ctx._source['category']='Products'",
    "lang": "painless"
  },
  "query": {
    "exists": {
      "field": "Products"
    }
  }
}
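
Note that an exists query only matches documents that contain a field called Products. If, as in the question, the category is the first word of the value of a single field, one pass over the whole index may be enough. The following is only a sketch under assumptions: the field name `name` is hypothetical (the question does not give it), its value is assumed to be a string, and the category is assumed to be everything before the first space.

```
# Assumption: each document stores its text in a string field called "name" (hypothetical).
# The script copies the part before the first space into "category";
# values without a space are copied unchanged.
POST index/_update_by_query?conflicts=proceed
{
  "script": {
    "source": "def v = ctx._source['name']; if (v != null) { int i = v.indexOf(' '); ctx._source['category'] = i > 0 ? v.substring(0, i) : v; }",
    "lang": "painless"
  }
}
```

With no query in the body, the request visits every document in the index, so it is worth trying on a small test index first.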

Alternative: if you only need to group the results, you can use an exists query clause to select the documents of a certain type and then run aggregations on them, without updating the documents.
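
A rough sketch of the read-only variant follows. It swaps the exists clause for prefix queries inside a filters aggregation, because the sample records share one field and differ only in their leading word. The field name `name.keyword` is an assumption (a keyword sub-field of a hypothetical `name` field), and the two buckets are just the categories shown in the question.

```
# Assumption: "name.keyword" is a keyword field holding the full value (hypothetical name).
# Each named filter becomes one bucket with its own doc_count; nothing is updated.
POST index/_search
{
  "size": 0,
  "aggs": {
    "by_category": {
      "filters": {
        "filters": {
          "Products": { "prefix": { "name.keyword": "Products" } },
          "Questions": { "prefix": { "name.keyword": "Questions" } }
        }
      }
    }
  }
}
```

This requires listing the categories up front, which fits a quick analysis where only a few prefixes are of interest.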
