Why is my Elasticsearch query returning zero documents?

I am trying to query an AWS Elasticsearch domain from a Lambda function.

To do so, I am using http-aws-es and the main Elasticsearch JavaScript client.

The documents I query have the following relevant fields:

  • A ref field - String
  • A status field - String enum (REMOVED, BLOCKED, PUBLISHED, PENDING, VERIFIED)
  • A field field - String Array
  • A thematics field - String Array

What I want to achieve is:

  1. Filter out all documents whose status is neither PUBLISHED nor VERIFIED, or whose ref field is set
  2. Return the best matches for my keywords argument (a string array) against the values in field and thematics
  3. Sort to put documents with PUBLISHED status first
  4. Limit the number of results to 20
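To pin down what the four requirements mean before translating them into Elasticsearch, here is a minimal in-memory sketch in plain JavaScript. It is not an Elasticsearch query. It assumes that "ref is set" means ref is neither null nor undefined, that documents matching no keyword are dropped, and that "PUBLISHED first" takes precedence over relevance; naiveScore and selectDocuments are hypothetical names, and the keyword-overlap score is only a stand-in for Elasticsearch's relevance scoring.

```javascript
// Plain-JS model of the four requirements (not an Elasticsearch query).
// Assumptions: "ref is set" means ref is neither null nor undefined,
// documents matching no keyword are dropped, and the keyword-overlap
// score is a hypothetical stand-in for Elasticsearch relevance scoring.
const ELIGIBLE_STATUSES = new Set(['PUBLISHED', 'VERIFIED'])

// Hypothetical score: number of keywords found in field or thematics
function naiveScore (doc, keywords) {
  const values = new Set([...(doc.field || []), ...(doc.thematics || [])])
  return keywords.filter((k) => values.has(k)).length
}

// 0 for PUBLISHED, 1 otherwise, so PUBLISHED sorts first
function publishedRank (doc) {
  return doc.status === 'PUBLISHED' ? 0 : 1
}

function selectDocuments (docs, keywords, limit = 20) {
  return docs
    // 1. keep PUBLISHED or VERIFIED documents without a ref
    .filter((d) => ELIGIBLE_STATUSES.has(d.status) && d.ref == null)
    // 2. score the remaining documents against the keywords
    .map((d) => ({ doc: d, score: naiveScore(d, keywords) }))
    .filter((r) => r.score > 0)
    // 3. PUBLISHED first, then best matches first
    .sort((a, b) => publishedRank(a.doc) - publishedRank(b.doc) || b.score - a.score)
    // 4. at most `limit` results
    .slice(0, limit)
    .map((r) => r.doc)
}

const docs = [
  { id: 'a', status: 'VERIFIED', field: ['art'], thematics: ['music'] },
  { id: 'b', status: 'PUBLISHED', field: ['art'], thematics: [] },
  { id: 'c', status: 'REMOVED', field: ['art'], thematics: ['music'] },
  { id: 'd', status: 'PUBLISHED', ref: 'x', field: ['art'], thematics: [] },
  { id: 'e', status: 'PUBLISHED', field: ['sport'], thematics: [] }
]
console.log(selectDocuments(docs, ['art', 'music']).map((d) => d.id)) // [ 'b', 'a' ]
```

Document 'a' matches more keywords than 'b', but 'b' comes first because it is PUBLISHED; 'c' (REMOVED), 'd' (ref set) and 'e' (no keyword match) are dropped.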

I found the more_like_this query and gave it a try. I built my query step by step, and the current version at least doesn't return an error, but it returns no documents. It still lacks the ref filter as well as points 3 and 4 above. Here is the query:

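  // Libraries named above; aws-sdk (v2) provides EnvironmentCredentials
  const AWS = require('aws-sdk')
  const elasticsearch = require('elasticsearch')
  const httpAwsEs = require('http-aws-es')

  // Assumed to be set as Lambda environment variables
  const { ELASTICSEARCH_DOMAIN, AWS_REGION } = process.env

  // Assumed Lambda handler wrapping the snippet (event.arguments holds the input)
  exports.handler = async (event) => {
    // Client whose requests are signed with the Lambda's AWS credentials
    const client = new elasticsearch.Client({
      host: ELASTICSEARCH_DOMAIN,
      connectionClass: httpAwsEs,
      amazonES: {
        region: AWS_REGION,
        credentials: new AWS.EnvironmentCredentials('AWS')
      }
    })
    const keywords = event.arguments.keywords
    const rst = await client.search({
      body: {
        'query': {
          'bool': {
            // Exclude documents whose status is REMOVED, PENDING or BLOCKED
            'filter': {
              'bool': {
                'must_not': [
                  {
                    'term': {
                      'status': 'REMOVED'
                    }
                  },
                  {
                    'term': {
                      'status': 'PENDING'
                    }
                  },
                  {
                    'term': {
                      'status': 'BLOCKED'
                    }
                  }
                ]
              }
            },
            // Match documents whose field/thematics resemble the keywords
            'must': {
              'more_like_this': {
                'fields': ['field', 'thematics'],
                'like': keywords,
                'min_term_freq': 1,
                'max_query_terms': 2
              },
              // Intended to favor PUBLISHED documents
              'should': [
                {
                  'term': {
                    'status': 'PUBLISHED'
                  }
                }
              ]
            }
          }
        }
      }
    })
    console.log(rst)
    return rst
  }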

I have to upload my Lambda code every time I want to test it, which makes debugging slow and cumbersome. Since I have never written ES queries before, I would appreciate some hints on how to proceed, or to know whether I am misusing the ES query syntax.
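One way to catch structural mistakes before each upload is to check the request body locally. The helper below, findMisplacedBoolClauses, is hypothetical (it is not part of the elasticsearch client) and checks a single rule of the bool query: must, should, filter and must_not are only valid as direct children of a bool object. A field that happens to be named should inside a leaf query would be a false positive. For a full check against the real parser, Elasticsearch's Validate API (`_validate/query`, optionally with `explain`) can be run against the domain.

```javascript
// Hypothetical helper: walks a query body and returns the paths of
// bool-clause keys that are not direct children of a `bool` object.
const BOOL_CLAUSES = new Set(['must', 'should', 'filter', 'must_not'])

function findMisplacedBoolClauses (node, path = []) {
  const problems = []
  if (Array.isArray(node)) {
    // Recurse into array items, recording their index in the path
    node.forEach((item, i) => {
      problems.push(...findMisplacedBoolClauses(item, [...path, i]))
    })
  } else if (node !== null && typeof node === 'object') {
    const parent = path[path.length - 1]
    for (const [key, value] of Object.entries(node)) {
      // A clause key is only legal directly under `bool`
      if (BOOL_CLAUSES.has(key) && parent !== 'bool') {
        problems.push([...path, key].join('.'))
      }
      problems.push(...findMisplacedBoolClauses(value, [...path, key]))
    }
  }
  return problems
}

const example = { query: { bool: { must: { match_all: {}, should: [] } } } }
console.log(findMisplacedBoolClauses(example)) // [ 'query.bool.must.should' ]
```

Run on the request body from the question, it reports `query.bool.must.should`; the filter/bool/must_not nesting passes, since must_not sits directly under a bool there.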


EDIT:

As requested, here is my index mapping (field name, Elasticsearch type, and corresponding JS type):

  • city text (String)
  • contact_email text (String)
  • contact_entity text (String)
  • contact_firstname text (String)
  • contact_lastname text (String)
  • contacts text (String list)
  • country text (String)
  • createdAt date (String)
  • description text (String)
  • editKey text (String)
  • field text (String)
  • id text (String)
  • name text (String)
  • pubId text (String)
  • ref text (String)
  • state text (String)
  • status text (String)
  • thematics text (String Array)
  • type text (String Array)
  • updatedAt (String)
  • url text (String)
  • verifyKey text (String)
  • zone text (String Array)

Taken from the AWS Elasticsearch management console (index tab > mappings).
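The mapping shows that status (like ref) is a text field. Text fields are analyzed at index time, and the default standard analyzer lowercases values, so REMOVED is stored as the token removed. A term query, by contrast, is not analyzed: { term: { status: 'REMOVED' } } looks for the token REMOVED and never matches, so the must_not clauses in the question's query exclude nothing. Assuming the index was created by dynamic mapping, each string field also gets a keyword sub-field (for example status.keyword) that stores the exact value. The sketch below imitates this in plain JavaScript; simplifiedStandardAnalyze and termMatches are hypothetical, much-reduced stand-ins, not Elasticsearch code.

```javascript
// Hypothetical, much-simplified stand-in for the standard analyzer that
// processes `text` fields at index time: lowercase, then split on anything
// that is not a letter or digit. The real analyzer follows Unicode word
// segmentation rules; this is only an illustration.
function simplifiedStandardAnalyze (value) {
  return value.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean)
}

// A `term` query is not analyzed: it matches only if the exact token
// it is given was indexed.
function termMatches (indexedTokens, term) {
  return indexedTokens.includes(term)
}

// Tokens indexed for a document whose status is 'REMOVED'
const textTokens = simplifiedStandardAnalyze('REMOVED') // `status` (text)
const keywordTokens = ['REMOVED']                        // `status.keyword` (value kept as-is)

console.log(termMatches(textTokens, 'REMOVED'))    // false: only "removed" was indexed
console.log(termMatches(keywordTokens, 'REMOVED')) // true
```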

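There are one or two issues in your query: should is nested inside the must clause, where it does not belong, and must_not is wrapped in a filter, which is valid but unnecessary. Note also that status is mapped as text, whose values are lowercased at index time, so term queries should target the exact-value status.keyword sub-field. Try the simplified query below instead: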

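  // Request body to pass as client.search({ body })
  const body = {
    'query': {
      'bool': {
        // Exclude unwanted statuses (exact match on the keyword sub-field)
        'must_not': [
          {
            'term': {
              'status.keyword': 'REMOVED'
            }
          },
          {
            'term': {
              'status.keyword': 'PENDING'
            }
          },
          {
            'term': {
              'status.keyword': 'BLOCKED'
            }
          }
        ],
        // Documents must resemble the keywords in field or thematics
        'must': [
          {
            'more_like_this': {
              'fields': [
                'field',
                'thematics'
              ],
              'like': keywords,
              'min_term_freq': 1,
              'max_query_terms': 2
            }
          }
        ],
        // Optional clause: PUBLISHED documents get a higher score
        'should': [
          {
            'term': {
              'status.keyword': 'PUBLISHED'
            }
          }
        ]
      }
    }
  }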
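For the parts still missing from the question (the ref filter, the PUBLISHED-first ordering and the limit of 20), here is a sketch of a complete request body under a few assumptions: status has a status.keyword sub-field, as in the query above; "ref is set" means ref has an indexed value, which an exists query checks; and min_doc_freq is lowered to 1. That last change is a guess worth testing: more_like_this ignores terms that appear in fewer than min_doc_freq documents (5 by default), so on a small index it may select no terms at all and match nothing. The function name buildSearchBody is hypothetical.

```javascript
// Sketch of a request body covering all four requirements. Assumptions:
// status has a `.keyword` sub-field (default dynamic mapping) and
// "ref is set" means ref has an indexed value.
function buildSearchBody (keywords) {
  return {
    size: 20, // 4. limit to 20 results
    query: {
      bool: {
        // 1. non-scoring filter: only PUBLISHED or VERIFIED documents...
        filter: [
          { terms: { 'status.keyword': ['PUBLISHED', 'VERIFIED'] } }
        ],
        // ...and only those without a ref
        must_not: [
          { exists: { field: 'ref' } }
        ],
        // 2. relevance against the keywords
        must: [
          {
            more_like_this: {
              fields: ['field', 'thematics'],
              like: keywords,
              min_term_freq: 1,
              min_doc_freq: 1, // default is 5; assumption: the index is small
              max_query_terms: 2
            }
          }
        ]
      }
    },
    // 3. Only PUBLISHED and VERIFIED survive the filter, and 'PUBLISHED'
    // sorts before 'VERIFIED', so an ascending sort on status.keyword puts
    // PUBLISHED first; the relevance score breaks ties within each status
    sort: [
      { 'status.keyword': { order: 'asc' } },
      '_score'
    ]
  }
}
```

It can be passed as `client.search({ body: buildSearchBody(keywords) })`. The sort guarantees that PUBLISHED documents come first, whereas the should clause in the query above only raises their score; keep the should clause instead if PUBLISHED documents should merely be favored. max_query_terms: 2 is kept from the question and limits the query to the two most significant keyword terms.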
