
Elastic Search - Sorting & Filtering on nested Documents

I am working on an e-commerce application. Catalog data is served by Elasticsearch. I have Product documents that are already indexed in Elasticsearch.

A document looks something like this (a few fields are excluded for readability):

{
     "title" : "Product Name",
      "volume" : "200gm",
      "brand" : {
        "brand_code" : XXXX,
        "brand_name" : "Brand Name"
      },
      "@timestamp" : "2021-08-26T08:08:11.319Z",
      "store" : [
        {
          "physical_unit" : 0,
          "default_price" : 115.0,
          "_id" : "1234_111",
          "product_code" : "1234",
          "warehouse_code" : 111,
          "available_unit" : 100
        }
      ],
      "category" : {
        "category_code" : 987,
        "category_name" : "CategoryName",
        "category_url_link" : "CategoryName",
        "super_category_name" : "SuperCategoryName",
        "parent_category_name" : "ParentCategoryName"
      }
    }

The store object in the above document is where the ES query looks for the price and decides whether the item is in stock or out of stock.

I would like to add more child objects to store (basically data from multiple inventories). This can go up to more than 150 child objects for each product.

Eventually, a product document will look something like this, with data from multiple inventories mapped to a single document.

{
     "title" : "Product Name",
      "volume" : "200gm",
      "brand" : {
        "brand_code" : XXXX,
        "brand_name" : "Brand Name"
      },
      "@timestamp" : "2021-08-26T08:08:11.319Z",
      "store" : [
        {
          "physical_unit" : 0,
          "default_price" : 115.0,
          "_id" : "1234_111",
          "product_code" : "1234",
          "warehouse_code" : 111,
          "available_unit" : 100
        },
        {
          "physical_unit" : 0,
          "default_price" : 125.0,
          "_id" : "1234_112",
          "product_code" : "1234",
          "warehouse_code" : 112,
          "available_unit" : 100
        },
        {
          "physical_unit" : 0,
          "default_price" : 105.0,
          "_id" : "1234_113",
          "product_code" : "1234",
          "warehouse_code" : 113,
          "available_unit" : 100
        }
        ... (up to N stores)
      ],
      "category" : {
        "category_code" : 987,
        "category_name" : "CategoryName",
        "category_url_link" : "CategoryName",
        "super_category_name" : "SuperCategoryName",
        "parent_category_name" : "ParentCategoryName"
      }
    }

Functional Requirement:

  1. For any product, we should show the lowest price across all warehouses. For example, if a particular product has 50 stores mapped to it, the Elasticsearch query should look into the nested objects and return the lowest value across all 50 stores where the item is available.
  2. Performance should not be degraded.

Challenges:

  1. If we start storing that many stores for each product, the data volume will grow considerably. Will that be a problem?
  2. What would be an efficient way to extract the lowest price from the nested documents?
  3. How would facets work within nested documents? For example, if I apply a price range filter, ES picks up data that was not shown earlier. (It might pick data from another store that matches the range.)

We are using a template to query ES, and the Elasticsearch version is 6.0. Thanks in advance!

First, there are improvements to nested document search in version 7.x that are worth the upgrade.

As for version 6.x, there are too many factors involved for me to give you a concrete answer. It also seems you may be misunderstanding how nested documents work: they are not relational.

In particular, when you say that each product might have 50 stores mapped to it, that sounds like you are implying a relationship, which will not exist with a nested document. However, the values from those 50 stores would be stored within an index nested under the parent document. Having 50 stores under a product or category does not sound concerning.

Elasticsearch has not really talked in terms of facets since the introduction of the aggregation framework. It's not that they don't exist, it's just not how they are discussed.

So let's try this. Elasticsearch optimizes search and query through a divide-and-conquer mechanism. The data is spread across a configurable number of shards, and each shard is responsible for searching its own data. Further, those shards can be distributed across many machines so that there are many CPUs and lots of memory available for the search. So growing the data doesn't matter if you are willing to grow the cluster, as it is possible to maintain a situation where each machine is doing the same amount of work as it was doing before.

Unlike in a relational database, filters and search terms allow Elasticsearch to drastically reduce the data that it is looking at, so a larger number of filters will improve performance, whereas on a relational database performance declines.

Now back to nested documents. Each nested object is indexed as a separate hidden document next to its parent, but instead of mapping the results to the nested doc, the results map to the parent doc ID. So your nested docs aren't stored exactly as part of the rest of the document, though they are not truly separate either. That does mean that the nested documents should have minimal impact on the performance of queries against the parent documents. But if your data size grows beyond the capacity of your current system, you will still need to increase its size.
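For the behavior described above to apply, the store field has to be mapped with the nested type rather than left as a plain object. The following is a minimal sketch of such a mapping fragment, written as a Python dict. The field names come from the question's sample document; the field types and the helper name `store_mapping_properties` are assumptions made for illustration, and the fragment would still need to be wrapped in whatever mapping structure your index and Elasticsearch version use.

```python
import json


def store_mapping_properties():
    """Return a 'properties' fragment that maps `store` as a nested field.

    Field types are assumptions inferred from the sample document.
    """
    return {
        "title": {"type": "text"},
        "store": {
            # "nested" makes each store entry its own hidden document,
            # so price and stock of one store are never mixed with another's.
            "type": "nested",
            "properties": {
                "product_code": {"type": "keyword"},
                "warehouse_code": {"type": "integer"},
                "default_price": {"type": "float"},
                "available_unit": {"type": "integer"},
                "physical_unit": {"type": "integer"},
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(store_mapping_properties(), indent=2))
```

Without the nested type, the store entries are flattened into arrays of values per field, and a filter such as "price below 110 and in stock" could match a price from one store and a stock level from another.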

As to how you would query, you would use Elastic aggregations. These will allow you to calculate your "facet" counts and identify the best prices. The Elastic aggregations are very powerful and very fast. There are caveats that are well documented, but in general they will work as you expect.
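As a rough illustration of the aggregation approach, the sketch below builds a request body with a nested aggregation that keeps only in-stock store entries and takes the minimum price. It assumes `store` is mapped as a nested field and uses the field names from the question; the function name and the aggregation names (`stores`, `in_stock`, `lowest_price`) are made up for this example.

```python
import json


def lowest_price_aggregation(query=None):
    """Build a search body that returns the lowest in-stock store price.

    `query` is an optional query clause restricting which products are
    considered; by default all products are matched.
    """
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "query": query if query is not None else {"match_all": {}},
        "aggs": {
            "stores": {
                "nested": {"path": "store"},
                "aggs": {
                    "in_stock": {
                        # ignore stores where the item is out of stock
                        "filter": {
                            "range": {"store.available_unit": {"gt": 0}}
                        },
                        "aggs": {
                            "lowest_price": {
                                "min": {"field": "store.default_price"}
                            }
                        },
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(lowest_price_aggregation(), indent=2))
```

Note that this minimum is computed over every product matched by the query. To get the lowest price of one particular product, the query has to narrow the match down to that product.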

In version 6.x, query string queries cannot reach fields inside a nested document, so a nested query written in the full query DSL must be used.
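A nested query of that kind might look like the sketch below. It filters products to those that have at least one in-stock store within a price range, and uses inner_hits to return only the cheapest matching store for each product, which addresses the concern about a price filter surfacing a different store than the one shown. The field names are taken from the question; the function name and its parameters are hypothetical, and `store` is assumed to be mapped as nested.

```python
import json


def price_range_query(min_price, max_price):
    """Build a search body matching products with an in-stock store
    priced between min_price and max_price (inclusive)."""
    store_conditions = {
        "bool": {
            "filter": [
                {"range": {"store.default_price": {"gte": min_price,
                                                   "lte": max_price}}},
                {"range": {"store.available_unit": {"gt": 0}}},
            ]
        }
    }
    return {
        "query": {
            "bool": {
                "filter": [
                    {
                        "nested": {
                            "path": "store",
                            # both conditions must hold for the SAME store
                            "query": store_conditions,
                            # return the cheapest matching store per product
                            "inner_hits": {
                                "size": 1,
                                "sort": [
                                    {"store.default_price": {"order": "asc"}}
                                ],
                            },
                        }
                    }
                ]
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(price_range_query(100, 120), indent=2))
```

Because the price shown to the user can be read from the inner hit, the displayed price is always one that actually satisfied the filter.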

To recap

Functional Requirement:

  1. For any product, we should show lowest price across all warehouse. For EX: If a particular product has 50 store mapped to it, ElasticSearch query should look into the nested object and get the value which is lowest in all 50 stores if item is available.

Yes, a nested aggregation will do this.

  2. Performance should not be degraded.
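Since the question also asks about sorting, here is a hedged sketch of sorting products by their lowest in-stock store price. It assumes the sort options for nested fields as I understand them: the `nested_path` and `nested_filter` options used in 6.0, and the `nested` object with `path` and `filter` that replaced them in later releases. Please check the sort documentation for your exact version. The function name and the `legacy_syntax` flag are hypothetical.

```python
import json


def sort_by_lowest_price(legacy_syntax=True):
    """Build a search body sorted by each product's cheapest in-stock store.

    legacy_syntax=True uses nested_path/nested_filter (assumed for 6.0);
    legacy_syntax=False uses the newer nested {path, filter} form.
    """
    in_stock = {"range": {"store.available_unit": {"gt": 0}}}
    sort_options = {"order": "asc", "mode": "min"}  # min = lowest store price
    if legacy_syntax:
        sort_options["nested_path"] = "store"
        sort_options["nested_filter"] = in_stock
    else:
        sort_options["nested"] = {"path": "store", "filter": in_stock}
    return {
        "query": {"match_all": {}},
        "sort": [{"store.default_price": sort_options}],
    }


if __name__ == "__main__":
    print(json.dumps(sort_by_lowest_price(), indent=2))
```

With `mode` set to `min`, the sort value returned with each hit is that product's lowest price among the stores passing the filter.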

Performance will continue to depend on the ratio of the size of the data to the overall cluster size.

Challenges:

If we start storing those many stores for each product, data will go considerably high. Will that be a problem?

No, this should not be a problem.

What would be the efficient way to extract the lowest price from nested document?

Elastic Aggregations

How would facets work within nested document? Like if i apply price range filter ES picks up the data which was not showed earlier. (It might pick the data from other store which matches the range)

Yes, filtering works very well with aggregations. The aggregation will be based on the filtered data. In fact, you could have one aggregation for just the minimum price and, in the same query, another aggregation over your price ranges, which will give you the count of documents that have a store within each price range. You could also add a sub-aggregation showing the stores under each price range.
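The combination described above could be sketched as follows: one request body carrying both a minimum-price aggregation and a range aggregation over store prices. A reverse_nested sub-aggregation is added so that each bucket also reports how many parent products fall in it, rather than only how many store entries. The field names come from the question; the function names, aggregation names, and the example boundaries are illustrative assumptions.

```python
import json


def build_ranges(boundaries):
    """Turn price boundaries such as [100, 120] into range buckets:
    below 100, 100 to 120, and 120 and above."""
    if not boundaries:
        raise ValueError("at least one boundary is required")
    edges = [None] + sorted(boundaries) + [None]
    ranges = []
    for low, high in zip(edges, edges[1:]):
        bucket = {}
        if low is not None:
            bucket["from"] = low  # inclusive lower bound
        if high is not None:
            bucket["to"] = high   # exclusive upper bound
        ranges.append(bucket)
    return ranges


def price_facets_body(boundaries):
    """Build a body with a min-price and a price-range aggregation."""
    return {
        "size": 0,
        "aggs": {
            "stores": {
                "nested": {"path": "store"},
                "aggs": {
                    "lowest_price": {
                        "min": {"field": "store.default_price"}
                    },
                    "price_ranges": {
                        "range": {
                            "field": "store.default_price",
                            "ranges": build_ranges(boundaries),
                        },
                        # step back to the parent docs to count products
                        "aggs": {"products": {"reverse_nested": {}}},
                    },
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(price_facets_body([100, 120]), indent=2))
```

In each bucket, the bucket's own `doc_count` counts store entries, while the `products` sub-aggregation counts distinct parent products with a store in that range.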

We are using template to query ES and the Version of the Elastic Search is 6.0. Thanks in Advance!!

I know nothing about the template. The Elasticsearch API is so simple that I do not know why anyone uses additional tools on top of it. They just add weight, increase complexity, and make key features unavailable whenever the wrapper author did not pass a feature through.
