
Google Cloud DataStore. How to serve data?

Like many, I'm new to the NoSQL world. I did a lot of research, but there is still one point I can't find a proper answer for.

Short description of the system:

I'm building a system that collects visitor data on different websites. Each visit is an entity in the datastore, with properties like device type, IP, time of visit, etc.

There will be millions of visits in the datastore.

My question is how to serve this data to clients. My data sits in the datastore as "Visit" entities.

Now, when a customer logs in, I don't want to show them millions of records. I want, for example, to show them general stats, like the number of visits on mobile devices, the number of visits from a specific country in some time range, and so on.

Since I'm new to NoSQL databases, I'm not sure how I should go about showing these stats in the clients' dashboard.

As far as I know, Datastore has no support for aggregates, or for getting the count of query results, for example.

I looked at BigQuery, but BigQuery works on Datastore "backups". I need to serve data in real time, without having to do backups manually.

I also read about counters and sharded counters. Is this the proper approach: have a counter for each client, for each property, for each tracking group, and show the totals this way? That sounds like too much for a simple purpose.

Any input or explanation that can point me in the right direction would be highly appreciated.

Best regards

"As I know, Datastore has no support for aggregates, or getting count of query results for example."

This is not true. You can get the number of entities returned by a query with one line of code. The query itself can be keys-only, which is very fast and basically free.
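A minimal sketch of such a count in Python, assuming `client` behaves like a `google.cloud.datastore.Client`. The kind "Visit" comes from the question; the function name `count_visits` and the property name `device_type` are hypothetical. Note that the work still grows with the number of matching keys, so with millions of visits a precomputed counter may still be the better fit for a dashboard.

```python
def count_visits(client, device_type):
    """Count Visit entities for one device type with a keys-only query.

    `client` is assumed to behave like google.cloud.datastore.Client.
    The property name "device_type" is a hypothetical choice.
    """
    query = client.query(kind="Visit")
    query.add_filter("device_type", "=", device_type)
    query.keys_only()  # return keys only instead of full entities
    # Iterate over the matching keys and count them.
    return sum(1 for _ in query.fetch())
```

With the real library this could be called as `count_visits(datastore.Client(), "mobile")` after `from google.cloud import datastore`.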

Yes, counters are a good approach to your problem in terms of performance. They do have some downsides though, such as storage size and the fact that each time you want to introduce a new type of statistic, you need to create a counter for it.

In addition to your current "Visit" entities, you could opt for storing the aggregated data in sharded counters in the Datastore. These counters can be updated in real time, or via a task in one of your task queues. It would be fairly straightforward to write a task that creates the various counters for the existing Visit entities.
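As a rough illustration of what such a task might compute, here is a sketch that rolls plain visit records up into one counter per client, property, and value. The record fields (`client_id`, `device_type`, `country`) and the function name `build_counters` are assumptions for illustration; in the Datastore, each resulting count would be stored in its own counter entity (or set of shards) rather than an in-memory `Counter`.

```python
from collections import Counter


def build_counters(visits, properties=("device_type", "country")):
    """Aggregate visit records into counts keyed by (client, property, value).

    `visits` is an iterable of dicts; the field names are hypothetical.
    """
    counters = Counter()
    for visit in visits:
        client_id = visit["client_id"]
        for prop in properties:
            # One counter per client, per property, per observed value.
            counters[(client_id, prop, visit[prop])] += 1
    return counters
```

A dashboard could then read, for example, `counters[("acme", "device_type", "mobile")]` instead of scanning the visits.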

Sharding is a way of creating multiple "underlying" entities that, when combined, represent some meaningful data. Sharding is done to ensure that there are no performance issues due to concurrent updates.

From the Google documentation:

If you had a single entity that was the counter and the update rate was too fast, then you would have contention as the serialized writes would stack up and start to timeout. The way to solve this problem is a little counter-intuitive if you are coming from a relational database; the solution relies on the fact that reads from the App Engine datastore are extremely fast and cheap. The way to reduce the contention is to build a sharded counter – break the counter up into N different counters. When you want to increment the counter, you pick one of the shards at random and increment it. When you want to know the total count, you read all of the counter shards and sum up their individual counts. The more shards you have, the higher the throughput you will have for increments on your counter. This technique works for a lot more than just counters and an important skill to learn is spotting the entities in your application with a lot of writes and then finding good ways to shard them.
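The idea in the quoted passage can be sketched in plain Python. This is an in-memory illustration only: the class name `ShardedCounter` and the default of 20 shards are assumptions, and a list of integers stands in for what would be N separate Datastore entities, each updated in its own transaction.

```python
import random


class ShardedCounter:
    """In-memory stand-in for a counter split into N shards."""

    def __init__(self, num_shards=20, rng=None):
        # Each list slot stands in for one shard entity in the Datastore.
        self.shards = [0] * num_shards
        self._rng = rng or random.Random()

    def increment(self, amount=1):
        # Pick one shard at random so concurrent writers rarely collide.
        index = self._rng.randrange(len(self.shards))
        self.shards[index] += amount

    def total(self):
        # Read every shard and sum the individual counts.
        return sum(self.shards)
```

Writes are spread across the shards, while a read touches all of them, which matches the trade-off described above: more shards give higher increment throughput at the cost of a slightly larger read.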

I would recommend having a look at the link for further information and some helpful examples.

