
MongoDB Atlas Search index on normalized/indexed model

I'd like to use the new Atlas Search index feature to search through my models.

It seems to me that the data model I use can't be combined with this MongoDB feature. It works really well on embedded models, but for consistency reasons I can't nest objects; they are referenced by their id.

Example

Collection Product

{
  name: "Foo product",
  quantity: 3,
  tags: [
    "id_123"
  ]
}

Collection Vendor

{
  name: "Bar vendor",
  address: ...,
  tags: [
    "id_123"
  ]
}

Collection Tags

{
  id: "id_123",
  name: "food"
}

What I want

I want to type "food" in my search bar and find the products associated with the tag "food".

Detailed problem

I have multiple business objects that are labelled with the same tag. I'd like to build a search index over my products, but I would need to $lookup first to resolve the tag ids, so that I can find all the products that have the tag "food".

According to the documentation, the $search operator must be the first stage of the aggregation pipeline, which prevents me from doing a $lookup before searching. I had the idea of building a view first, resolving each id to the correct tag to prepare the field. But it is not possible to build a search index on a view.

Is it completely impossible to make this work? Do I need to give up on consistency for my tags by flattening them and embedding each of them directly in every model that needs them, just to be able to use this feature? Does that mean that if I want to update a tag, I need to find every business object that carries the tag and perform the update?

I got in touch with MongoDB support, and the Atlas Search team proposed three ways to resolve this problem. I want to share the solutions in case anybody runs into the same problem I had with this model design.

Recommended: Convert the model to the embedded style

The ideal MongoDB way of doing this would be to denormalize the model and not use references between models. Each tag would be added directly to the Product and Vendor documents, so no $lookup operations are needed anymore. It has some drawbacks, like painful updates. For me it is a no-go: the tags are meant to be updatable, and they will be shared by almost every business object I plan to model.

Collection Product

{
  name: "Foo product",
  quantity: 3,
  tags: [
    "food"
  ]
}

Collection Vendor

{
  name: "Bar vendor",
  address: ...,
  tags: [
    "food"
  ]
}
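To make the trade-off of the embedded model concrete, here is a minimal Python sketch of both sides: the search becomes a single $search stage on the embedded `tags` field, while renaming a tag turns into a multi-document update on every collection that carries it. The function names, the index name "default" and the collection names in the comments are my own assumptions, not something MongoDB support gave me. The functions only build the query documents, so they run without a database.

```python
from __future__ import annotations

from typing import Any


def build_tag_search_pipeline(term: str, index_name: str = "default") -> list[dict[str, Any]]:
    """Pipeline that searches the embedded `tags` array for `term`.

    `index_name` is an assumption: it must match an Atlas Search index
    that covers the `tags` field of the collection being queried.
    """
    return [
        # $search has to be the first stage of the pipeline.
        {"$search": {"index": index_name, "text": {"query": term, "path": "tags"}}},
        # Keep only the fields shown in the example documents.
        {"$project": {"_id": 0, "name": 1, "tags": 1}},
    ]


def build_tag_rename(old: str, new: str) -> tuple[dict[str, Any], dict[str, Any]]:
    """Filter and update documents for renaming an embedded tag.

    The positional `$` operator rewrites the first array element matched
    by the filter, so this assumes a tag appears at most once per document.
    """
    return {"tags": old}, {"$set": {"tags.$": new}}


# Hypothetical usage with pymongo (collection names are assumptions):
#   results = db.products.aggregate(build_tag_search_pipeline("food"))
#   for coll in (db.products, db.vendors):
#       coll.update_many(*build_tag_rename("food", "groceries"))
```

The loop in the last comment is the "painful update": every collection that embeds tags has to be touched whenever a tag changes.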

Not recommended but possible: Break the request in multiple parts

This would mean keeping the existing model, querying the collections individually, and resolving the sequential requests on the application side.

We could put an Atlas Search index on the Tags collection and use the search feature to find the id of the tag we want. Then we could use this id to query the Product/Vendor collections directly and find the products corresponding to the "food" tag. By tinkering with the search on the application side, we could get satisfying results.

It is not the recommended way of doing it.
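For illustration, the two steps could be sketched in Python like this. This is only a sketch under assumptions: the helper names, the index name "default" and the collection names in the comments are hypothetical, and the field names `id`, `name` and `tags` come from the example documents above. The helpers only build the query documents, so they run without a database.

```python
from __future__ import annotations

from typing import Any, Iterable


def build_tag_id_pipeline(term: str, index_name: str = "default") -> list[dict[str, Any]]:
    """Step 1: search the Tags collection by name and return only the tag ids.

    `index_name` is an assumption: it must match an Atlas Search index
    defined on the Tags collection.
    """
    return [
        {"$search": {"index": index_name, "text": {"query": term, "path": "name"}}},
        {"$project": {"_id": 0, "id": 1}},
    ]


def build_owner_filter(tag_docs: Iterable[dict[str, Any]]) -> dict[str, Any]:
    """Step 2: regular query filter for documents that reference any found tag."""
    tag_ids = [doc["id"] for doc in tag_docs]
    # $in against an array field matches when any element is in the list.
    return {"tags": {"$in": tag_ids}}


# Hypothetical usage with pymongo (collection names are assumptions):
#   tag_docs = list(db.tags.aggregate(build_tag_id_pipeline("food")))
#   products = list(db.products.find(build_owner_filter(tag_docs)))
```

The second request is a plain find, so the search only ever runs against the tag names; anything beyond that has to be handled in application code.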

Theoretically my preferred way: Use the Materialized View feature

That is an intermediate solution, and it is the one I will try out. It is not perfect, but from what I can see, it tries to reconcile the capabilities of the referenced model and the embedded model.

Atlas Search indexes are not usable on regular views. The workaround that makes this possible is a materialized view (which is ultimately more a collection than a view). It is built with the $merge operator, which saves the results of an aggregation pipeline into a collection. By re-running the pipeline, we can update the materialized view. The trick is to perform all the required $lookup operations to denormalize the referenced model, then use $merge as the final stage to create the collection, which supports an Atlas Search index like any other collection.

The only concern is choosing the refresh interval for the materialized view, since refreshing can be performance-hungry. But on paper, it is a really good solution for people like me who cannot (won't?) pay the price of a painful update strategy on embedded models.
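A minimal Python sketch of such a pipeline, using the field names from the example documents (`tags` on the product, `id` and `name` on the tag). The collection names `tags` and `products_search`, the output field `tag_names` and the function name are assumptions of mine. The function only builds the pipeline, so it runs without a database.

```python
from __future__ import annotations

from typing import Any


def build_materialized_view_pipeline(
    tags_collection: str = "tags",
    target_collection: str = "products_search",
) -> list[dict[str, Any]]:
    """Pipeline that denormalizes tag ids into tag names and stores the result.

    Both collection names are assumptions made for this sketch.
    """
    return [
        # Resolve every id in the product's `tags` array to its tag document.
        {
            "$lookup": {
                "from": tags_collection,
                "localField": "tags",
                "foreignField": "id",
                "as": "tag_docs",
            }
        },
        # Flatten the joined documents to a plain array of names.
        {"$addFields": {"tag_names": "$tag_docs.name"}},
        {"$project": {"tag_docs": 0}},
        # $merge must be the last stage; it writes into the target collection.
        {
            "$merge": {
                "into": target_collection,
                "on": "_id",
                "whenMatched": "replace",
                "whenNotMatched": "insert",
            }
        },
    ]


# Hypothetical usage with pymongo; re-run it to refresh the materialized view:
#   db.products.aggregate(build_materialized_view_pipeline())
```

The Atlas Search index would then be created on the target collection and queried on `tag_names`. Note that $merge only inserts or updates documents, so products deleted from the source collection would have to be removed from the target separately.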


 