Bulk write to ElasticSearch from Firestore using Google Cloud Functions

Currently I am using Firestore for my database and I have a users collection. Whenever a user document is created or updated in the users collection, a cloud function takes the user document and saves it in Elasticsearch.

I am starting to be concerned about the scalability of this architecture. For example, suppose that several thousand cloud functions started writing documents to Elasticsearch at once: is Elasticsearch going to handle this load? Is there a better solution to this in Google Cloud?

For example, can those cloud functions write the user documents to a queue, and have cloud functions at the other end of the queue take 100 documents at a time and bulk write them to Elasticsearch?

I am new to Google Cloud and would appreciate any ideas, videos, and things to read.

Thanks

Elasticsearch has no limit on the number of documents an index can hold, but there are some limits, such as maximum document size and bulk write size, mentioned in their documentation.

Maximum Document Size: 100KB [configurable in 7.7+]

Maximum Indexing Payload Size: 10MB

Bulk Indexing Maximum: 100 documents per batch
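
As a rough illustration of how a function could guard against these limits before sending a batch, here is a minimal Python sketch. The three constants are the figures quoted above, which may depend on the Elastic product and version in use. Reading "100KB" as 100 * 1024 bytes and measuring size as compact UTF-8 JSON are assumptions, and `check_batch` is a hypothetical helper, not part of any Elastic client.

```python
import json

# Figures quoted in the answer above; treat them as assumptions and check
# the documentation for the product and version you actually run.
MAX_DOC_BYTES = 100 * 1024            # "Maximum Document Size: 100KB"
MAX_PAYLOAD_BYTES = 10 * 1024 * 1024  # "Maximum Indexing Payload Size: 10MB"
MAX_DOCS_PER_BATCH = 100              # "Bulk Indexing Maximum: 100 documents per batch"


def doc_size(doc):
    """Approximate a document's size as compact UTF-8 encoded JSON."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


def check_batch(docs):
    """Return a list of problems found in the batch; an empty list means OK."""
    problems = []
    if len(docs) > MAX_DOCS_PER_BATCH:
        problems.append(
            f"batch has {len(docs)} documents, limit is {MAX_DOCS_PER_BATCH}"
        )
    total = 0
    for position, doc in enumerate(docs):
        size = doc_size(doc)
        total += size
        if size > MAX_DOC_BYTES:
            problems.append(
                f"document {position} is {size} bytes, limit is {MAX_DOC_BYTES}"
            )
    if total > MAX_PAYLOAD_BYTES:
        problems.append(
            f"payload is {total} bytes, limit is {MAX_PAYLOAD_BYTES}"
        )
    return problems
```

A caller could skip or split any batch for which `check_batch` returns a non-empty list instead of letting the indexing request fail.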

As far as I know, Google Cloud has no full text search API.

Talking of bulk writes, if realtime availability (data being available immediately after it is added) is not a concern, then you can store the new documents in Firestore along with a timestamp of when they were added and a boolean value indicating whether the document has been indexed in Elasticsearch.
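
To make the bookkeeping concrete, here is a small Python sketch using plain dictionaries in place of Firestore documents. The field names `addedAt` and `indexed` and both helper functions are hypothetical; the answer does not prescribe any particular names.

```python
from datetime import datetime, timezone


def with_index_metadata(user, now=None):
    """Return a copy of the user document with the two bookkeeping fields.

    `addedAt` and `indexed` are illustrative field names, not a convention.
    """
    now = now or datetime.now(timezone.utc)
    return {**user, "addedAt": now.isoformat(), "indexed": False}


def pending(docs):
    """Select documents not yet indexed in Elasticsearch, oldest first."""
    return sorted(
        (doc for doc in docs if not doc["indexed"]),
        key=lambda doc: doc["addedAt"],
    )
```

In a real deployment the same selection would be expressed as a Firestore query on the boolean field rather than an in-memory filter.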

Then instead of running a cloud function with an onCreate trigger, you can run a scheduled cloud function every N minutes which will:

  1. Query documents which have not yet been added to Elasticsearch
  2. Make batches of 100 (for the 100 documents/batch limit)
  3. Upload them to Elasticsearch
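
Steps 2 and 3 could look roughly like the following Python sketch. It only builds the request bodies: Elasticsearch's `_bulk` endpoint expects newline-delimited JSON in which each document is preceded by an action line, and the body must end with a newline. The index name `users`, the `id` key on each document, and the helper names are assumptions made for illustration. Sending the body (and then marking the documents as indexed in Firestore) is left out.

```python
import json


def chunks(items, size=100):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def bulk_body(docs, index="users"):
    """Build a newline-delimited JSON body for the Elasticsearch _bulk API.

    Each document contributes two lines: an `index` action naming the target
    index and document id, followed by the document source itself.
    Assumes every document carries its Firestore id under the key "id".
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["id"]}}))
        source = {key: value for key, value in doc.items() if key != "id"}
        lines.append(json.dumps(source))
    # The bulk API requires the body to be terminated by a newline.
    return "\n".join(lines) + "\n"
```

For example, 250 pending documents would yield three bodies covering 100, 100, and 50 documents, so the scheduled function makes three bulk requests instead of 250 single writes.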

This way you process more documents per cloud function run, so it is somewhat more efficient, but if you need your new data to be available immediately then this won't work.
