
Why is my Elasticsearch query returning zero documents?

I am trying to query an AWS Elasticsearch domain from a Lambda worker.

To do so, I am using http-aws-es and the main JavaScript client for Elasticsearch.

I query documents with the following relevant fields:

  • A ref field - String
  • A status field - String ENUM (REMOVED, BLOCKED, PUBLISHED, PENDING, VERIFIED)
  • A field field - String Array
  • A thematics field - String Array

What I want to achieve is:

  1. Filter out all documents that are neither PUBLISHED nor VERIFIED, or where the ref field is set
  2. Return the best matches for my keywords argument (string array) relative to the values in field and thematics
  3. Sort to put documents with PUBLISHED status first
  4. Limit the number of results to 20

I found the more_like_this operator and gave it a try. I built my query step by step, and the current version at least doesn't return an error, but no documents are returned. It is still missing the ref filter, plus #3 and #4 from above. Here is the query:

  const client = new elasticsearch.Client({
      host: ELASTICSEARCH_DOMAIN,
      connectionClass: httpAwsEs,
      amazonES: {
        region: AWS_REGION,
        credentials: new AWS.EnvironmentCredentials('AWS')
      }
    })
    let keywords = event.arguments.keywords
    let rst = await client.search({
      body: {
        'query': {
          'bool': {
            'filter': {
              'bool': {
                'must_not': [
                  {
                    'term': {
                      'status': 'REMOVED'
                    }
                  },
                  {
                    'term': {
                      'status': 'PENDING'
                    }
                  },
                  {
                    'term': {
                      'status': 'BLOCKED'
                    }
                  }
                ]
              }
            },
            'must': {
              'more_like_this': {
                'fields': ['field', 'thematics'],
                'like': keywords,
                'min_term_freq': 1,
                'max_query_terms': 2
              },
              'should': [
                {
                  'term': {
                    'status': 'PUBLISHED'
                  }
                }
              ]
            }
          }
        }
      }

    })
    console.log(rst)
    return rst

I have to upload my Lambda code to debug this, which complicates debugging a lot. Since I have never written ES queries before, I would like at least some hints on how to proceed, or to know whether I am misusing the ES query syntax.


EDIT:

As requested, here is my index mapping (with JS type):

  • city text (String)
  • contact_email text (String)
  • contact_entity text (String)
  • contact_firstname text (String)
  • contact_lastname text (String)
  • contacts text (String list)
  • country text (String)
  • createdAt date (String)
  • description text (String)
  • editKey text (String)
  • field text (String)
  • id text (String)
  • name text (String)
  • pubId text (String)
  • ref text (String)
  • state text (String)
  • status text (String)
  • thematics text (String Array)
  • type text (String Array)
  • updatedAt (String)
  • url text (String)
  • verifyKey text (String)
  • zone text (String Array)

Taken from the AWS Elasticsearch management console (index tabs > mappings).

There are one or two issues in your query (should inside must, and must_not inside filter). Try the simplified query below instead:

{
  'query': {
    'bool': {
      'must_not': [
        {
          'term': {
            'status.keyword': 'REMOVED'
          }
        },
        {
          'term': {
            'status.keyword': 'PENDING'
          }
        },
        {
          'term': {
            'status.keyword': 'BLOCKED'
          }
        }
      ],
      'must': [
        {
          'more_like_this': {
            'fields': [
              'field',
              'thematics'
            ],
            'like': keywords,
            'min_term_freq': 1,
            'max_query_terms': 2
          }
        }
      ],
      'should': [
        {
          'term': {
            'status.keyword': 'PUBLISHED'
          }
        }
      ]
    }
  }
}
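
As a rough sketch of how the remaining goals from the question (the ref filter, #3 and #4) might be folded into the simplified query, the hypothetical helper below returns only the request body, so it can be inspected locally before anything is uploaded to Lambda. buildSearchBody is an illustrative name, not part of any library. The sketch assumes that status has a status.keyword sub-field (the default for dynamically mapped strings, but not confirmed by the mapping listed in the question), and it reads goal 1 as "keep only PUBLISHED or VERIFIED documents that have no ref". Move the exists clause into filter if the opposite was meant.

```javascript
// Hypothetical helper (not from the original post): builds the search body only,
// so the query can be checked without a running cluster.
function buildSearchBody(keywords) {
  return {
    size: 20, // goal 4: return at most 20 hits
    query: {
      bool: {
        // goal 1, first half: keep only PUBLISHED or VERIFIED documents.
        // Assumes a status.keyword sub-field exists in the mapping.
        filter: [
          { terms: { 'status.keyword': ['PUBLISHED', 'VERIFIED'] } }
        ],
        // goal 1, second half (one reading of the question): drop documents with ref set.
        must_not: [
          { exists: { field: 'ref' } }
        ],
        // goal 2: score documents by similarity to the keywords.
        must: [
          {
            more_like_this: {
              fields: ['field', 'thematics'],
              like: keywords,
              min_term_freq: 1,
              max_query_terms: 2
            }
          }
        ],
        // goal 3 (approximation): boost PUBLISHED documents; this raises their
        // score but does not guarantee they come first.
        should: [
          { term: { 'status.keyword': 'PUBLISHED' } }
        ]
      }
    }
  }
}

// Print the body as JSON to inspect it, or pass it as `body` to client.search.
console.log(JSON.stringify(buildSearchBody(['energy', 'climate']), null, 2))
```

Because the should clause only boosts the score, a strict "PUBLISHED first" ordering would need an explicit sort instead. If more_like_this still matches nothing on a small index, its min_doc_freq parameter (which defaults to 5) may be worth lowering, since terms that appear in fewer documents than that are ignored.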
