Is there another way to optimize this Elasticsearch query for multiple nested fields in JSON?

Question

I am new to Elasticsearch. Below is the sample data that the query needs to run against. I am trying to get the documents in which account_type_name is "Credit Card" and source_name is 'SOMEVALUE'.

{
"took" : 0,
"timed_out" : false,
"_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
},
"hits" : {
    "total" : {
    "value" : 1,
    "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
    {
        "_index" : "bureau_data",
        "_type" : "_doc",
        "_id" : "bda57e01-c564-4cdc-bb8d-79bd2db9d2f8",
        "_score" : 1.0,
        "_source" : {
        "userid" : "bda57e01-c564-4cdc-bb8d-79bd2db9d2f8",
        "raw_derived" : {
            "gender" : "MALE",
            "firstname" : "trsqlsz",
            "middlename" : "rgj",
            "lastname" : "ggksb",
            "mobilephone" : "2125954664",
            "dob" : "1988-06-28 00:00:00",
            "applications" : [
            {
                "applicationid" : "c7fb0147-22fd-4a5e-8851-98241de6aa50",
                "createdat" : "2019-06-07 19:28:54",
                "updatedat" : "2019-06-07 19:28:55",
                "source" : "4",
                "source_name" : "EXPERIAN",
                "applicationcreditreportid" : "b67f9180-9bb6-485c-9cfc-e7ccf9a70a69",
                "accounts" : [
                {
                    "applicationcreditreportaccountid" : "c5de28c4-cac9-4390-852a-96f143cb0b62",
                    "currentbalance" : 418288,
                    "institutionid" : "021d58b4-aba5-42c9-8d39-304a78d34aea",
                    "accounttypeid" : "5",
                    "institution_name" : "HDFC BANK",
                    "account_type_name" : "Personal Loan"
                }
                ]
            }
            ]
        }
        }
    }
    ]
}
}

I have tried the query below and it works fine. Is there a more optimized way to query multiple nested fields?

GET /my_index/_search
{
"query": {
    "bool": {
    "must": [
        {
        "nested": {
            "path": "raw_derived.applications.accounts",
            "query": {
            "bool": {
                "must": [
                {"match": {
                    "raw_derived.applications.accounts.account_type_name": "Credit Card"
                }}
                ]
            }
            }
        }
        },
        {
        "nested": {
            "path": "raw_derived.applications",
            "query": {
            "bool": {
                "must": [
                {"match": {
                    "raw_derived.applications.source_name": "CIBIL"
                }}
                ]
            }
            }
        }
        }
    ]
    }
}

}

If I query on more nested fields, the query will become very long. Please suggest another way to query nested fields or to combine multiple AND conditions.

Answer 1

Your optimizations should always start with your data model/mapping, since that is usually the cause of performance issues rather than your queries.

That being said, you can avoid the nested query by flattening your data. A flattened data model would lead to one document per application and account element.

Since Elasticsearch is a non-relational data store, it is completely fine to index "redundant" data. This is not a lazy approach but a common way to handle this type of data structure.

Sample document #1:

{
    "_index" : "bureau_data",
    "_type" : "_doc",
    "_id" : "bda57e01-c564-4cdc-bb8d-79bd2db9d2f8",
    "_score" : 1.0,
    "_source" : {
      "userid" : "bda57e01-c564-4cdc-bb8d-79bd2db9d2f8",
      "gender" : "MALE",
      "firstname" : "trsqlsz",
      "middlename" : "rgj",
      "lastname" : "ggksb",
      "mobilephone" : "2125954664",
      "dob" : "1988-06-28 00:00:00",
      "applicationid" : "c7fb0147-22fd-4a5e-8851-98241de6aa50",
      "createdat" : "2019-06-07 19:28:54",
      "updatedat" : "2019-06-07 19:28:55",
      "source" : "4",
      "source_name" : "EXPERIAN",
      "applicationcreditreportid" : "b67f9180-9bb6-485c-9cfc-e7ccf9a70a69",
      "applicationcreditreportaccountid" : "c5de28c4-cac9-4390-852a-96f143cb0b62",
      "currentbalance" : 418288,
      "institutionid" : "021d58b4-aba5-42c9-8d39-304a78d34aea",
      "accounttypeid" : "5",
      "institution_name" : "HDFC BANK",
      "account_type_name" : "Personal Loan"
    }
}

If the same user creates another account, you would send the very same ("redundant") data, except for the other account element/data, like so:

{
    "_index" : "bureau_data",
    "_type" : "_doc",
    "_id" : "another, from es generated id",
    "_score" : 1.0,
    "_source" : {
      "userid" : "bda57e01-c564-4cdc-bb8d-79bd2db9d2f8",
      "gender" : "MALE",
      "firstname" : "trsqlsz",
      "middlename" : "rgj",
      "lastname" : "ggksb",
      "mobilephone" : "2125954664",
      "dob" : "1988-06-28 00:00:00",
      "applicationid" : "c7fb0147-22fd-4a5e-8851-98241de6aa50",
      "createdat" : "2019-06-07 19:28:54",
      "updatedat" : "2019-06-07 19:28:55",
      "source" : "4",
      "source_name" : "EXPERIAN",
      "applicationcreditreportid" : "b67f9180-9bb6-485c-9cfc-e7ccf9a70a69",
      "applicationcreditreportaccountid" : "the new id",
      "currentbalance" : 4711,
      "institutionid" : "foo",
      "accounttypeid" : "bar",
      "institution_name" : "foo bar",
      "account_type_name" : "foo baz"
    }
}
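
One way to produce documents like the two samples above is to flatten each nested user document before indexing. The following is only a minimal sketch, assuming the nested layout shown in the question (`raw_derived.applications[].accounts[]`); the function name `flatten_user` is hypothetical and not part of any Elasticsearch client.

```python
from typing import Any, Dict, Iterator


def flatten_user(source: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one flat document per (application, account) pair.

    `source` is assumed to look like the `_source` object in the question.
    Applications without any accounts yield nothing in this sketch.
    """
    derived = source.get("raw_derived", {})
    # User-level fields: everything except the nested applications array.
    user_fields = {k: v for k, v in derived.items() if k != "applications"}
    user_fields["userid"] = source.get("userid")

    for application in derived.get("applications", []):
        # Application-level fields: everything except the nested accounts array.
        app_fields = {k: v for k, v in application.items() if k != "accounts"}
        for account in application.get("accounts", []):
            # Later dicts win on key collisions; the sample data has none.
            yield {**user_fields, **app_fields, **account}
```

Each yielded dictionary can then be indexed as its own document, so the user and application fields are repeated once per account.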

With that kind of data model, you can run simple queries to get your results:

GET /my_index/_search
{
    "query": {
        "bool": {
            "must": [
            {
                "match":{
                    "account_type_name": "Credit Card"
                } 
            },
            {
                "match":{
                    "source_name": "CIBIL"
                } 
            }
            ]
        }
    }
}
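
If the list of AND conditions keeps growing, the request body above can be generated instead of written by hand. This is a small sketch under the assumption that the data is already flattened and every condition is a plain `match`; the helper name `build_and_query` is hypothetical.

```python
from typing import Any, Dict


def build_and_query(conditions: Dict[str, Any]) -> Dict[str, Any]:
    """Build a bool query in which every field must match its value."""
    return {
        "query": {
            "bool": {
                "must": [
                    # One match clause per field/value pair.
                    {"match": {field: value}}
                    for field, value in conditions.items()
                ]
            }
        }
    }


# Produces the same body as the flattened query shown above.
body = build_and_query({
    "account_type_name": "Credit Card",
    "source_name": "CIBIL",
})
```

The resulting dictionary can be serialized to JSON and sent as the body of the `_search` request.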

Solution 1
0 2019-08-11 15:29:44