
Firebase: queries on large datasets

I'm using Firebase to store user profiles. I tried to put the minimum amount of data in each user profile (following the good practices advised in the documentation about structuring data), but as I have more than 220K user profiles, downloading all of them as JSON still comes to 150MB. And of course, it will grow bigger and bigger as I intend to have a lot more users :)

I can't run queries on those user profiles anymore, because each time I do, I reach 100% Database I/O capacity, and some other requests, made by users currently using the app, end up with errors.

I understand that when using queries, Firebase needs to consider all data in the list and thus read it all from disk. And 150MB of data seems to be too much.

So is there an actual limit before reaching 100% Database I/O capacity? And what exactly is the usefulness of Firebase queries in that case? If I only had a small amount of data, I wouldn't really need queries, since I could easily download all of it. But now that I have a lot of data, I can't use queries anymore, just when I need them the most...

The core problem here isn't the query or the size of the data, it's simply the time required to warm the data into memory (i.e. load it from disk) when it's not being frequently queried. It's likely to be only a development issue, as in production this query would likely be a more frequently used asset.

But if the goal is to improve performance on initial load, the only reasonable answer here is to query on less data. 150MB is significant. Try copying a 150MB file between computers over a wireless network and you'll have some idea what it's like to send it over the internet, or to load it into memory from a file server.

A lot here depends on the use case, which you haven't included.

Assuming you have fairly standard search criteria (e.g. you search on email addresses), you can use indices to store email addresses separately to reduce the data set for your query.

/search_by_email/$user_id/<email address>

Now, rather than 50k per record, you have only the bytes needed to store the email address for each record, which is a much smaller payload to warm into memory.
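As a rough illustration of that index, here is a minimal Python sketch that works on plain dictionaries rather than on Firebase itself. The function names (`build_email_index`, `find_user_ids`, `json_size`), the `email` field name, and the sample profiles are all assumptions made for the example; in a real app the resulting dictionary would be written under something like `/search_by_email` and kept in sync whenever a profile changes.

```python
import json


def build_email_index(profiles):
    """Map each user id to its email address only.

    `profiles` stands in for the full user-profile node; the "email"
    field name is an assumption for this sketch.
    """
    return {uid: p["email"] for uid, p in profiles.items() if "email" in p}


def find_user_ids(index, email):
    """Return the user ids whose indexed email matches (case-insensitive)."""
    wanted = email.strip().lower()
    return [uid for uid, addr in index.items() if addr.strip().lower() == wanted]


def json_size(node):
    """Size in bytes of a node when serialized as compact JSON."""
    return len(json.dumps(node, separators=(",", ":")).encode("utf-8"))


# Hypothetical sample data: two profiles with a bulky "bio" field.
profiles = {
    "u1": {"email": "ann@example.com", "name": "Ann", "bio": "x" * 500},
    "u2": {"email": "bob@example.com", "name": "Bob", "bio": "y" * 500},
}

index = build_email_index(profiles)

# The index carries far fewer bytes than the full profiles.
print(json_size(profiles), json_size(index))

# Look up a user through the small index instead of the full profiles.
print(find_user_ids(index, "Bob@Example.com"))
```

The trade-off is the usual one for denormalized data: the index is cheap to load and search, but every write to a profile's email has to update the index entry as well.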

Assuming you're looking for robust search capabilities, the best answer is to use a real search engine. For example, enable private backups and export to BigQuery, or go with ElasticSearch (see Flashlight for an example).
