Is there another way to optimize this Elasticsearch query for multiple nested fields in JSON?

I am new to Elasticsearch. Below is the sample data on which the query needs to run. I am trying to get the documents in which account_type_name is "Credit Card" and source_name is 'SOMEVALUE'.

{
"took" : 0,
"timed_out" : false,
"_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
},
"hits" : {
    "total" : {
    "value" : 1,
    "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
    {
        "_index" : "bureau_data",
        "_type" : "_doc",
        "_id" : "bda57e01-c564-4cdc-bb8d-79bd2db9d2f8",
        "_score" : 1.0,
        "_source" : {
        "userid" : "bda57e01-c564-4cdc-bb8d-79bd2db9d2f8",
        "raw_derived" : {
            "gender" : "MALE",
            "firstname" : "trsqlsz",
            "middlename" : "rgj",
            "lastname" : "ggksb",
            "mobilephone" : "2125954664",
            "dob" : "1988-06-28 00:00:00",
            "applications" : [
            {
                "applicationid" : "c7fb0147-22fd-4a5e-8851-98241de6aa50",
                "createdat" : "2019-06-07 19:28:54",
                "updatedat" : "2019-06-07 19:28:55",
                "source" : "4",
                "source_name" : "EXPERIAN",
                "applicationcreditreportid" : "b67f9180-9bb6-485c-9cfc-e7ccf9a70a69",
                "accounts" : [
                {
                    "applicationcreditreportaccountid" : "c5de28c4-cac9-4390-852a-96f143cb0b62",
                    "currentbalance" : 418288,
                    "institutionid" : "021d58b4-aba5-42c9-8d39-304a78d34aea",
                    "accounttypeid" : "5",
                    "institution_name" : "HDFC BANK",
                    "account_type_name" : "Personal Loan"
                }
                ]
            }
            ]
        }
        }
    }
    ]
}
}

I have tried the query below and it works fine. I would like to know whether there is a more optimized way to query multiple nested fields.

GET /my_index/_search
{
"query": {
    "bool": {
    "must": [
        {
        "nested": {
            "path": "raw_derived.applications.accounts",
            "query": {
            "bool": {
                "must": [
                {"match": {
                    "raw_derived.applications.accounts.account_type_name": "Credit Card"
                }}
                ]
            }
            }
        }
        },
        {
        "nested": {
            "path": "raw_derived.applications",
            "query": {
            "bool": {
                "must": [
                {"match": {
                    "raw_derived.applications.source_name": "CIBIL"
                }}
                ]
            }
            }
        }
        }
    ]
    }
}

}

If I query on more nested fields, the query becomes very long. Please suggest another way to query nested fields or to combine multiple AND conditions.
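If the nested mapping has to stay, one way to keep such queries manageable is to generate the clauses instead of writing them by hand. The following is a minimal Python sketch, not part of the original post: the helper name `nested_and` is hypothetical, and it only builds the same request body as the query above as a plain dictionary, without calling any Elasticsearch client.

```python
import json


def nested_and(path, conditions):
    """Build one nested clause that ANDs several match conditions under `path`.

    `nested_and` is a hypothetical helper, not an Elasticsearch API.
    """
    return {
        "nested": {
            "path": path,
            "query": {
                "bool": {
                    "must": [
                        # Field names inside a nested query use the full path.
                        {"match": {f"{path}.{field}": value}}
                        for field, value in conditions.items()
                    ]
                }
            },
        }
    }


# The same two conditions as the hand-written query above.
query = {
    "query": {
        "bool": {
            "must": [
                nested_and("raw_derived.applications.accounts",
                           {"account_type_name": "Credit Card"}),
                nested_and("raw_derived.applications",
                           {"source_name": "CIBIL"}),
            ]
        }
    }
}

print(json.dumps(query, indent=2))
```

Adding another AND condition on the same nested path then means adding one key to the dictionary passed to `nested_and`. As in the hand-written version, the two nested clauses are evaluated independently of each other, so the matching account is not required to belong to the matching application.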

Your optimizations should always start with your data model and mapping, since these, rather than your queries, are usually the cause of performance issues.

That being said, you can avoid the nested query by flattening your data. A flattened data model would lead to one document per application and account element.

Since Elasticsearch is a non-relational data store, it is completely fine to index "redundant" data. This is not a lazy approach but a common way to handle this type of data structure.
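The flattening step described above could be done on the client before indexing. The following is a minimal Python sketch under the assumption that the source documents have the shape shown in the question (`userid`, `raw_derived`, `applications`, `accounts`); the function name `flatten_user` is hypothetical, and the sample is a trimmed version of the question's data.

```python
def flatten_user(source):
    """Yield one flat document per (application, account) pair.

    `flatten_user` is a hypothetical helper; `source` is the `_source`
    of one nested document as shown in the question.
    """
    derived = source["raw_derived"]
    # User-level fields: everything in raw_derived except the nested list.
    user_fields = {k: v for k, v in derived.items() if k != "applications"}
    user_fields["userid"] = source["userid"]

    for application in derived.get("applications", []):
        app_fields = {k: v for k, v in application.items() if k != "accounts"}
        # Applications without accounts produce no flat document here.
        for account in application.get("accounts", []):
            # On a key clash the later dict wins; the sample data has none.
            yield {**user_fields, **app_fields, **account}


# Trimmed version of the sample document from the question.
sample = {
    "userid": "bda57e01-c564-4cdc-bb8d-79bd2db9d2f8",
    "raw_derived": {
        "gender": "MALE",
        "applications": [
            {
                "applicationid": "c7fb0147-22fd-4a5e-8851-98241de6aa50",
                "source_name": "EXPERIAN",
                "accounts": [
                    {
                        "institution_name": "HDFC BANK",
                        "account_type_name": "Personal Loan",
                    }
                ],
            }
        ],
    },
}

flat_docs = list(flatten_user(sample))
```

Each resulting dictionary repeats the user and application fields, which is the intended redundancy, and can be indexed as its own document.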

Sample document #1:

{
    "_index" : "bureau_data",
    "_type" : "_doc",
    "_id" : "bda57e01-c564-4cdc-bb8d-79bd2db9d2f8",
    "_score" : 1.0,
    "_source" : {
      "userid" : "bda57e01-c564-4cdc-bb8d-79bd2db9d2f8",
      "gender" : "MALE",
      "firstname" : "trsqlsz",
      "middlename" : "rgj",
      "lastname" : "ggksb",
      "mobilephone" : "2125954664",
      "dob" : "1988-06-28 00:00:00",
      "applicationid" : "c7fb0147-22fd-4a5e-8851-98241de6aa50",
      "createdat" : "2019-06-07 19:28:54",
      "updatedat" : "2019-06-07 19:28:55",
      "source" : "4",
      "source_name" : "EXPERIAN",
      "applicationcreditreportid" : "b67f9180-9bb6-485c-9cfc-e7ccf9a70a69",
      "applicationcreditreportaccountid" : "c5de28c4-cac9-4390-852a-96f143cb0b62",
      "currentbalance" : 418288,
      "institutionid" : "021d58b4-aba5-42c9-8d39-304a78d34aea",
      "accounttypeid" : "5",
      "institution_name" : "HDFC BANK",
      "account_type_name" : "Personal Loan"
    }
}

If the same user creates another account, you would send the very same ("redundant") data, except for the account-specific fields, like so:

{
    "_index" : "bureau_data",
    "_type" : "_doc",
    "_id" : "another ID, generated by Elasticsearch",
    "_score" : 1.0,
    "_source" : {
      "userid" : "bda57e01-c564-4cdc-bb8d-79bd2db9d2f8",
      "gender" : "MALE",
      "firstname" : "trsqlsz",
      "middlename" : "rgj",
      "lastname" : "ggksb",
      "mobilephone" : "2125954664",
      "dob" : "1988-06-28 00:00:00",
      "applicationid" : "c7fb0147-22fd-4a5e-8851-98241de6aa50",
      "createdat" : "2019-06-07 19:28:54",
      "updatedat" : "2019-06-07 19:28:55",
      "source" : "4",
      "source_name" : "EXPERIAN",
      "applicationcreditreportid" : "b67f9180-9bb6-485c-9cfc-e7ccf9a70a69",
      "applicationcreditreportaccountid" : "the new id",
      "currentbalance" : 4711,
      "institutionid" : "foo",
      "accounttypeid" : "bar",
      "institution_name" : "foo bar",
      "account_type_name" : "foo baz"
    }
}

With that kind of data model, you can run simple queries to get your results:

GET /my_index/_search
{
    "query": {
        "bool": {
            "must": [
            {
                "match":{
                    "account_type_name": "Credit Card"
                } 
            },
            {
                "match":{
                    "source_name": "CIBIL"
                } 
            }
            ]
        }
    }
}
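With the flat model, adding further AND conditions only means adding one more match clause, so the request body is easy to generate. The following is a minimal Python sketch; the helper name `flat_and_query` and its `clause` parameter are hypothetical, and the code only builds the request body shown above as a dictionary.

```python
import json


def flat_and_query(conditions, clause="must"):
    """AND together one match clause per field of the flattened model.

    `flat_and_query` is a hypothetical helper, not an Elasticsearch API.
    `clause` is the bool-query section to use: "must" or "filter".
    """
    return {
        "query": {
            "bool": {
                clause: [
                    {"match": {field: value}}
                    for field, value in conditions.items()
                ]
            }
        }
    }


# Reproduces the request body of the query above.
body = flat_and_query({
    "account_type_name": "Credit Card",
    "source_name": "CIBIL",
})

print(json.dumps(body, indent=2))
```

If relevance scores are not needed, passing `clause="filter"` places the same conditions in the filter context of the bool query, which does not compute scores.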

