
Elasticsearch: Query or aggregation for determining if claim is eligible or ineligible

I'm trying to set up a system where I can determine if a claim is eligible or ineligible based on similar past claims. You can think of it as expensing a purchase.

Say I have the existing documents:

PUT claim/_bulk
{ "create": { } }
{ "company_id": "Google","category":"office_equipment", "description":"stand up desk", "status": "approved"}
{ "create": { } }
{ "company_id": "Google","category":"office_equipment", "description":"computer chair", "status": "approved"}
{ "create": { } }
{ "company_id": "Apple","category":"office_equipment", "description":"keyboard", "status": "approved"}
{ "create": { } }
{ "company_id": "Samsung","category":"office_equipment", "description":"ps4", "status": "rejected"}

If someone tries to file a new claim with these attributes:

description: "wooden desk"
category: "office_equipment"

What kind of query or aggregation would I need to determine whether that claim is eligible (aka status == "approved") or ineligible (aka status == "rejected")? Would there be a query that would return a confidence score?

I'm looking for something like this as the output:

status: "approved"
confidence: 0.8

And for a claim that has no significant relevance to any existing claim, the confidence would be too low, so the output would just be

status: approved (or whatever)
confidence: 0

in which case I would just manually process it.

TL;DR

I feel like a simple boolean (bool) query should do the trick.

Solution

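I assumed category should be an exact match, so I put it in a filter clause, which selects documents but does not affect the score.

The description goes in a should clause: it is matched as full text, it does not have to match for a document to be returned, and how well it matches determines the score.

I also took the liberty of setting min_score to drop documents whose score is too low. Because the bool query has a filter, the should clause is optional, so documents that only pass the filter get a score of 0 and are dropped by min_score as well.

I used the following query (my test index is named 75292303; replace it with your own index, e.g. claim):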

GET 75292303/_search
{
  "min_score": 0.8,
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "category.keyword": "office_equipment"
          }
        }
      ],
      "should": [
        {
          "match": {
            "description": "wooden desk"
          }
        }
      ]
    }
  }
}
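
If the query has to be generated for each incoming claim, a small helper can build the same request body from the claim's attributes. This is only a sketch: build_claim_query is a hypothetical name, the 0.8 default is simply the threshold used above, and the category.keyword field assumes the default dynamic mapping.

```python
import json


def build_claim_query(category, description, min_score=0.8):
    """Build the _search body for a new claim (hypothetical helper)."""
    return {
        # Hits scoring below this threshold are dropped from the response.
        "min_score": min_score,
        "query": {
            "bool": {
                # Filter context: exact match on category, no effect on _score.
                "filter": [{"term": {"category.keyword": category}}],
                # Query context: full-text match on description drives _score.
                "should": [{"match": {"description": description}}],
            }
        },
    }


body = build_claim_query("office_equipment", "wooden desk")
print(json.dumps(body, indent=2))
```

The printed JSON is the same body as the query above and can be sent to the _search endpoint of your index.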

I got the following results.

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.93171775,
    "hits": [
      {
        "_index": "75292303",
        "_id": "KITtBoYBArbKoMpIpUdh",
        "_score": 0.93171775,
        "_source": {
          "company_id": "Google",
          "category": "office_equipment",
          "description": "stand up desk",
          "status": "approved"
        }
      }
    ]
  }
}
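
Elasticsearch does not return a status/confidence pair by itself, and _score is a relevance score rather than a value between 0 and 1. One possible way to get the output asked for in the question is to post-process the hits on the client. The sketch below is an assumption of mine, not something the query does: it sums _score per status, picks the status with the largest total, and reports that status's share of the total score as the confidence. summarize_hits is a hypothetical name.

```python
from collections import defaultdict


def summarize_hits(response):
    """Score-weighted vote over the hits of a _search response (heuristic)."""
    hits = response.get("hits", {}).get("hits", [])
    totals = defaultdict(float)
    for hit in hits:
        # Add each hit's relevance score to the total of its status.
        totals[hit["_source"]["status"]] += hit.get("_score") or 0.0
    total_score = sum(totals.values())
    if total_score == 0:
        # Nothing similar enough was found: leave it for manual processing.
        return {"status": None, "confidence": 0.0}
    status, best = max(totals.items(), key=lambda item: item[1])
    # Confidence is the winning status's share of the summed scores.
    return {"status": status, "confidence": best / total_score}


# Trimmed copy of the response shown above.
response = {
    "hits": {
        "hits": [
            {"_score": 0.93171775, "_source": {"status": "approved"}},
        ]
    }
}
print(summarize_hits(response))
print(summarize_hits({"hits": {"hits": []}}))
```

With a single hit the share is trivially 1.0, so in practice this would be combined with the min_score threshold and a minimum number of hits before trusting the result.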
