
Multi-term nested document querying with Elasticsearch

I'm new to Elasticsearch, and I'm having trouble understanding why it does certain things. I have the following document structure indexed (I'm using Chewy in Rails, but it should make sense either way):

class OpportunityLocationsIndex < Chewy::Index
  define_type OpportunityLocation.includes(:opportunity).joins(:opportunity => :company).where(:opportunities => {is_valid: true}) do
    field :location
    field :coordinates, type: 'geo_point'
    field :opening_status

    field :opportunity, type: 'object' do
      field :name, :summary
      field :opportunity_count, value: ->(o) { o.total_positions }

      field :company, type: 'object' do
        field :name
        field :slug
        field :industry

        field :company_path, value: ->(c) { "/companies/" + c.slug }
        field :logo_image, value: ->(c) { c.logo_image.url(:medium) }
        field :logo_image_grey, value: ->(c) { c.logo_image.url(:greyscale) }
      end
    end
  end
end

Now, say I want to get all documents with the location "Johannesburg, Gauteng, South Africa". I would run the following query:

GET _search
{
    "query": {
        "match": {
           "location": "Johannesburg, Gauteng, South Africa"
        }
    }
}

This returns the following:

  {
     "took": 7,
     "timed_out": false,
     "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
     },
     "hits": {
        "total": 13,
        "max_score": 1.6014341,
        "hits": [
           {
              "_index": "opportunity_locations",
              "_type": "opportunity_location",
              "_id": "56",
              "_score": 1.6014341,
              "_source": {
                 "location": "Johannesburg, Gauteng, South Africa",
                 "coordinates": "28.0473051, -26.2041028",
                 "opening_status": "closed",
                 "opportunity": {
                    "name": "Bentley Test Opportunity",
                    "summary": "Engineering at Bentley provides some unique and interesting challenges. The Interior Systems engineers...",
                    "opportunity_count": 6,
                    "company": {
                       "name": "Bentley Motors",
                       "slug": "bentley-motors",
                       "industry": "Automobile / Mechanical Engineering",
                       "company_path": "/companies/bentley-motors",
                       "logo_image": "/public/system/companies/logo_images/000/000/008/medium/bentley_logo_desktop_wallpaper-normal.jpg?1397906812",
                       "logo_image_grey": "/public/system/companies/logo_images/000/000/008/greyscale/bentley_logo_desktop_wallpaper-normal.jpg?1397906812"
                    }
                 }
              }
           },
           { etc. }
        ]
     }
  }

Right, so that works, and it makes sense that it does. Now, what if I want to get all documents whose company name is "Bentley Motors" or "BMW"? I tried the following:

GET _search
{
    "query": {
        "terms": {
           "opportunity.company.name": [
              "Bentley Motors",
              "BMW"
           ]
        }
    }
}

This returns zero results. What am I doing wrong?

The problem comes down to how your data is indexed and how you then query it.

Your first request uses a match query, which is smart enough to decide whether to analyze the query text, depending on how the field is mapped: for an analyzed field, the query text goes through the same analyzer that was used at index time.

Your second request uses a terms query, which, like the term query, does not use any analyzer: it searches the inverted index for exactly the values you pass in.

For example, if you index the string TEST with the default mapping:

  • a term query for TEST returns no results, because the index only contains the lowercased term test;
  • a match query for TEST returns your document, because it analyzes the query text the same way as at index time.

In your case, when your document was indexed, the company name was analyzed with the standard analyzer, which turned the value Bentley Motors into two separate terms, bentley and motors (and BMW into bmw).

You can check this by putting only bentley or motors (lowercase) in your terms query: you will find your document.

Then try changing your second request to a match query for Bentley Motors: you should retrieve your document too.
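
To make the mechanism concrete, here is a toy Ruby simulation of an analyzed field. It is only a sketch, not Elasticsearch itself: analyze is a hypothetical approximation of the standard analyzer (lowercase, then split on non-alphanumerics), the two company documents are made-up sample data, and the terms and match lambdas merely mimic how the real queries treat their input.

```ruby
require "set"

# Hypothetical stand-in for the standard analyzer: lowercase the text and split
# on anything that is not a letter or digit (the real analyzer uses Unicode
# word segmentation, but this is close enough for these company names).
def analyze(text)
  text.downcase.scan(/[[:alnum:]]+/)
end

# Made-up sample data: document id => opportunity.company.name
companies = { 1 => "Bentley Motors", 2 => "BMW" }

# Inverted index of the analyzed field: token => ids of documents containing it.
inverted = Hash.new { |hash, token| hash[token] = Set.new }
companies.each do |id, name|
  analyze(name).each { |token| inverted[token] << id }
end

# terms query: every value is looked up verbatim (no analysis); hits are OR-ed.
terms = lambda do |values|
  values.map { |value| inverted.fetch(value, Set.new) }.reduce(Set.new, :|)
end

# match query: analyze the text first, then look up each resulting token (OR-ed).
match = ->(text) { terms.call(analyze(text)) }

p terms.call(["Bentley Motors", "BMW"]).to_a  # => []  (like the second request)
p terms.call(["bentley"]).to_a                # => [1]
p match.call("Bentley Motors").to_a           # => [1]
```

The first lookup fails for the same reason the second request returns zero results: neither Bentley Motors nor BMW exists as a term in the index, only bentley, motors and bmw do. Note that a match query ORs its tokens by default, so it would also hit any other company whose name contains motors.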

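If you want to keep using a terms query for your second request, you must map the company name field as not_analyzed, so that the whole value is indexed as a single, unmodified term. (In Elasticsearch 5.0 and later, the equivalent is the keyword field type.)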
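
A toy Ruby sketch of what not_analyzed changes, under the assumption (true for string fields in Elasticsearch 1.x/2.x) that such a field stores each value as one term exactly as given. The sample data and the terms_query helper are hypothetical and only imitate the lookup.

```ruby
require "set"

# Made-up sample data: document id => opportunity.company.name
companies = { 1 => "Bentley Motors", 2 => "BMW" }

# A not_analyzed field stores each value as one term, exactly as given:
# no lowercasing and no splitting on spaces.
exact = Hash.new { |hash, term| hash[term] = Set.new }
companies.each { |id, name| exact[name] << id }

# terms query: verbatim lookup of every value; hits are OR-ed together.
def terms_query(index, values)
  values.map { |value| index.fetch(value, Set.new) }.reduce(Set.new, :|)
end

p terms_query(exact, ["Bentley Motors", "BMW"]).to_a  # => [1, 2]
p terms_query(exact, ["bentley motors"]).to_a         # => []  (lookup is case-sensitive)
```

The trade-off is that the field now only matches as a whole: searching for just Bentley would no longer find it. A common approach is to keep the analyzed field for full-text search and add a not_analyzed sub-field (a multi-field) for exact filtering.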
