
Google Cloud Bigtable vs Google Cloud Datastore

What is the difference between Google Cloud Bigtable and Google Cloud Datastore / App Engine datastore, and what are the main practical advantages and disadvantages? AFAIK Cloud Datastore is built on top of Bigtable.

Based on experience with Datastore and reading the Bigtable docs, the main differences are:

  • Bigtable was originally designed for HBase compatibility, but now has client libraries in multiple languages. Datastore was originally geared more towards Python/Java/Go web app developers (originally App Engine).
  • Bigtable is 'a bit more IaaS' than Datastore in that it's not 'just there' but requires a cluster to be configured.
  • Bigtable supports only one index: the 'row key' (the entity key in Datastore).
    • This means queries are on the key, unlike Datastore's indexed properties.
  • Bigtable supports atomicity only on a single row; there are no transactions.
  • Mutations and deletions that span more than one row appear not to be atomic in Bigtable, whereas Datastore provides eventual or strong consistency, depending on the read/query method.
  • The billing model is very different:
    • Datastore charges for read/write operations, storage and bandwidth.
    • Bigtable charges for 'nodes', storage and bandwidth.
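The single-index point in the list above can be illustrated without either client library. What follows is a minimal sketch in plain Python, not the Bigtable or Datastore API: `KeyOnlyTable` and `IndexedStore` are hypothetical names, and the in-memory structures only mimic the two access patterns (a sorted row key versus per-property indexes).

```python
import bisect
from collections import defaultdict


class KeyOnlyTable:
    """Toy model of a Bigtable-style table: rows sorted by row key, no other index."""

    def __init__(self):
        self._keys = []   # row keys kept in sorted order
        self._rows = {}   # row key -> dict of column values

    def put(self, key, row):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = row

    def scan_prefix(self, prefix):
        """Cheap: jump to the first key >= prefix, then read a contiguous range."""
        start = bisect.bisect_left(self._keys, prefix)
        result = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            result.append((key, self._rows[key]))
        return result

    def filter_by_column(self, column, value):
        """Expensive: with no secondary index, every row has to be examined."""
        return [(k, self._rows[k]) for k in self._keys
                if self._rows[k].get(column) == value]


class IndexedStore:
    """Toy model of a Datastore-style store: every property is indexed on write."""

    def __init__(self):
        self._entities = {}
        self._index = defaultdict(set)   # (property, value) -> set of keys

    def put(self, key, entity):
        # Remove stale index entries before writing the new version.
        for prop, value in self._entities.get(key, {}).items():
            self._index[(prop, value)].discard(key)
        self._entities[key] = entity
        for prop, value in entity.items():
            self._index[(prop, value)].add(key)

    def query(self, prop, value):
        """Looks up matching keys in the index instead of scanning all entities."""
        return sorted(self._index[(prop, value)])


table = KeyOnlyTable()
store = IndexedStore()
for key, city in [("user#1", "Oslo"), ("user#2", "Lima"), ("order#9", "Oslo")]:
    table.put(key, {"city": city})
    store.put(key, {"city": city})

user_rows = [k for k, _ in table.scan_prefix("user#")]
oslo_scan = [k for k, _ in table.filter_by_column("city", "Oslo")]
oslo_query = store.query("city", "Oslo")
```

Both stores return the same answer for the city lookup, but the key-only table gets there by touching every row. That is why a Bigtable schema is usually designed around the row key, while Datastore pays for its indexes on every write.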

Bigtable is optimized for high volumes of data and analytics

  • Cloud Bigtable doesn't replicate data across zones or regions (data within a single cluster is replicated and durable), which means Bigtable is faster and more efficient, and costs are much lower, though it is less durable and less available in the default configuration.
  • It uses the HBase API, so there's no risk of lock-in and no new paradigm to learn.
  • It is integrated with the open-source Big Data tools, meaning you can analyze the data stored in Bigtable in most analytics tools customers use (Hadoop, Spark, etc.).
  • Bigtable is indexed by a single row key.
  • Bigtable is in a single zone.

Cloud Bigtable is designed for larger companies and enterprises, which often have larger data needs and complex backend workloads.

Datastore is optimized to serve high-value transactional data to applications

  • Cloud Datastore has extremely high availability thanks to replication and data synchronization.
  • Datastore, because of its versatility and high availability, is more expensive.
  • Datastore is slower at writing data because replication is synchronous.
  • Datastore has much better functionality around transactions and queries (since secondary indexes exist).

Bigtable and Datastore are extremely different. Yes, Datastore is built on top of Bigtable, but that does not make the two alike. That is kind of like saying a car is built on top of wheels, and so a car is not much different from wheels.

Bigtable and Datastore provide very different data models and very different semantics for how the data is changed.

The main difference is that Datastore provides SQL-database-like ACID transactions on subsets of the data known as entity groups (though the query language, GQL, is much more restrictive than SQL). Bigtable is strictly NoSQL and comes with much weaker guarantees.
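To make the difference in guarantees concrete, here is a minimal sketch in plain Python. It is a toy model, not either product's API: `SingleRowStore`, `GroupTransactionStore`, `credit` and `debit` are hypothetical names, and there is no real concurrency control here. The first store can change only one row at a time atomically; the second adds an all-or-nothing update across several rows, which is roughly what an entity-group transaction buys you.

```python
import copy


class SingleRowStore:
    """Toy store where only one row at a time can be changed atomically."""

    def __init__(self, rows):
        self.rows = rows

    def mutate_row(self, key, fn):
        # All-or-nothing for this one row: work on a copy, then swap it in.
        updated = fn(copy.deepcopy(self.rows[key]))
        self.rows[key] = updated


class GroupTransactionStore(SingleRowStore):
    """Toy store that adds all-or-nothing updates across several rows."""

    def transaction(self, fn):
        snapshot = copy.deepcopy(self.rows)
        try:
            fn(snapshot)          # apply every change to a private copy
        except Exception:
            return False          # nothing is written if any step fails
        self.rows = snapshot      # commit all changes together
        return True


def credit(amount):
    def apply(row):
        row["balance"] += amount
        return row
    return apply


def debit(amount):
    def apply(row):
        if row["balance"] < amount:
            raise ValueError("insufficient funds")
        row["balance"] -= amount
        return row
    return apply


# Two independent single-row mutations: the second fails, the first stays applied.
plain = SingleRowStore({"alice": {"balance": 10}, "bob": {"balance": 0}})
plain.mutate_row("bob", credit(50))
try:
    plain.mutate_row("alice", debit(50))
except ValueError:
    pass
partial_total = sum(row["balance"] for row in plain.rows.values())


# The same two steps inside one transaction: the failure rolls both back.
def move(rows):
    credit(50)(rows["bob"])
    debit(50)(rows["alice"])


grouped = GroupTransactionStore({"alice": {"balance": 10}, "bob": {"balance": 0}})
committed = grouped.transaction(move)
grouped_total = sum(row["balance"] for row in grouped.rows.values())
```

In the single-row store the failed transfer leaves money created out of nothing, so the application has to detect and repair that itself. In the transactional store the total is unchanged.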

I am going to try to summarize all the answers above, plus what is given in the Coursera course Google Cloud Platform Big Data and Machine Learning Fundamentals.

+---------------------+---------------------------------------------------------+------------------------------------------+
| Category            | Bigtable                                                | Datastore                                |
+---------------------+---------------------------------------------------------+------------------------------------------+
| Technology          | HBase-compatible (exposes the HBase API)                | Uses Bigtable itself                     |
| Access metaphor     | Key/value (column families), like HBase                 | Persistent hashmap                       |
| Read                | Scan rows                                               | Filter objects on property               |
| Write               | Put row                                                 | Put object                               |
| Update granularity  | Can't update a row (you should write a new row instead) | Can update an attribute                  |
| Capacity            | Petabytes                                               | Terabytes                                |
| Index               | Row key only (you should design the key carefully)      | You can index any property of the object |
| Usage and use cases | High-throughput, scalable, flattened data               | Structured data for Google App Engine    |
+---------------------+---------------------------------------------------------+------------------------------------------+

Check this image too:

[image]

If you read the papers, Bigtable is the system described in the Bigtable paper and Datastore is Megastore. Datastore is Bigtable plus replication, transactions, and indexes (and is much more expensive).

This might be another set of key differences between Google Cloud Bigtable and Google Cloud Datastore, along with other services. The contents shown in the image below can also help you select the right service.

[image]

A relatively minor point to consider: as of November 2016, the Bigtable Python client library is still in Alpha, which means future changes might not be backward compatible. Also, the Bigtable Python library is not compatible with App Engine's standard environment; you have to use the flexible environment.

Datastore is more application-ready and suits a wide range of services, especially microservices.

The underlying technology of Datastore is Bigtable, so you can imagine that Bigtable is the more powerful of the two.

Datastore comes with 20K free operations per day, so you can expect to host a server with a reliable database at zero cost.
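Whether a workload really stays at zero cost depends on how many operations it performs each day. The sketch below is a rough back-of-the-envelope model, not a pricing calculator: the 20K figure is the one quoted in this answer, the function names are made up for illustration, and any price passed in is a placeholder you would replace with the current published rates. A per-node function is included only to contrast with the node-based billing model mentioned earlier in the thread.

```python
FREE_OPS_PER_DAY = 20_000   # daily free quota as quoted in the answer above


def billable_ops(daily_ops, free_per_day=FREE_OPS_PER_DAY):
    """Operations that exceed the free quota, summed over all days."""
    return sum(max(0, ops - free_per_day) for ops in daily_ops)


def per_operation_cost(daily_ops, price_per_100k_ops):
    """Datastore-style bill: pay only for operations beyond the free quota."""
    return billable_ops(daily_ops) / 100_000 * price_per_100k_ops


def per_node_cost(nodes, hours, price_per_node_hour):
    """Bigtable-style bill: pay for provisioned nodes, whether busy or idle."""
    return nodes * hours * price_per_node_hour


# One hypothetical week of traffic; only the fourth day exceeds the quota.
week = [5_000, 12_000, 20_000, 35_000, 8_000, 0, 19_999]
over_quota = billable_ops(week)
```

The quota resets daily, so a single busy day is billed even if the weekly average is well under 20K. A node-based bill, by contrast, is the same whether the week was busy or idle.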

You can also check out this Datastore ORM library, which comes with a lot of great features: https://www.npmjs.com/package/ts-datastore-orm


Cloud Datastore is a highly-scalable NoSQL database for your applications.
Like Cloud Bigtable, there is no need for you to provision database instances.
Cloud Datastore uses a distributed architecture to automatically manage
scaling. Your queries scale with the size of your result set, not the size of your
data set.
Cloud Datastore runs in Google data centers, which use redundancy to
minimize impact from points of failure. Your application can still use Cloud
Datastore when the service receives a planned upgrade.


 Choose Bigtable if the data is:
Big
● Large quantities (>1 TB) of semi-structured or structured data
Fast
● Data is high throughput or rapidly changing
NoSQL
● Transactions, strong relational semantics not required
And especially if it is:
Time series
● Data is time-series or has natural semantic ordering
Big data
● You run asynchronous batch or real-time processing on the data
Machine learning
● You run machine learning algorithms on the data
Bigtable is designed to handle massive workloads at consistent low latency
and high throughput, so it's a great choice for both operational and analytical
applications, including IoT, user analytics, and financial data analysis.
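For the time-series case, the "natural semantic ordering" matters because rows are stored sorted by row key, so the key layout decides which reads are cheap. The following is a minimal sketch of one common layout, an identifier followed by a reversed timestamp so that the newest reading sorts first. It is plain Python with no client library; `MAX_TS`, `row_key` and `timestamp_of` are illustrative names, and the `#` separator assumes sensor IDs never contain that character.

```python
MAX_TS = 10**13   # assumed upper bound for millisecond timestamps


def row_key(sensor_id, ts_millis):
    """Build 'sensor#reversed_ts' so newer readings sort before older ones."""
    reversed_ts = MAX_TS - ts_millis
    # Zero-pad so that string order matches numeric order.
    return f"{sensor_id}#{reversed_ts:013d}"


def timestamp_of(key):
    """Recover the original timestamp from a row key."""
    return MAX_TS - int(key.rsplit("#", 1)[1])


readings = [("sensor-a", 1000), ("sensor-b", 1500),
            ("sensor-a", 3000), ("sensor-a", 2000)]

# A table sorted by row key would store the rows in this order.
sorted_keys = sorted(row_key(sensor, ts) for sensor, ts in readings)

# All rows for one sensor are contiguous, newest first.
sensor_a_keys = [k for k in sorted_keys if k.startswith("sensor-a#")]
sensor_a_times = [timestamp_of(k) for k in sensor_a_keys]
latest_a = sensor_a_times[0]
```

With this layout, "latest reading for sensor-a" is the first row of a prefix scan rather than a search through the whole table. The trade-off is that the key has to be chosen up front for the queries you expect, since there is no other index to fall back on.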

I just found this useful analogy buried in the lengthy page about eventual consistency in the Datastore documentation (emphasis mine):

One practice is to combine Cloud Datastore and BigQuery to fulfill different business requirements. Use Cloud Datastore for online transactional processing (OLTP) required for core application logic and use BigQuery for online analytical processing (OLAP) for backend operations. It may be necessary to implement a continuous data export flow from Cloud Datastore to BigQuery to move the data necessary for those queries.
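One small piece of such an export flow is turning entities into a format a warehouse can load. The sketch below shows only that step, under stated assumptions: entities are represented as plain dicts rather than real Datastore entity objects, `entity_to_row` and `to_ndjson` are hypothetical helper names, and the output is newline-delimited JSON, which is one of the formats BigQuery load jobs accept. Fetching the entities and submitting the load job are left out.

```python
import json
from datetime import datetime, timezone


def entity_to_row(kind, key_id, properties):
    """Flatten one entity (modelled here as a dict) into a JSON-friendly row."""
    row = {"kind": kind, "id": key_id}
    for name, value in properties.items():
        if isinstance(value, datetime):
            # JSON has no datetime type, so emit an ISO 8601 string in UTC.
            value = value.astimezone(timezone.utc).isoformat()
        row[name] = value
    return row


def to_ndjson(rows):
    """Serialize rows as newline-delimited JSON, one object per line."""
    return "\n".join(json.dumps(row, sort_keys=True) for row in rows)


orders = [
    ("Order", 1, {"total": 19.5,
                  "created": datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)}),
    ("Order", 2, {"total": 7.0, "created": None}),
]
payload = to_ndjson(entity_to_row(*order) for order in orders)
```

Keeping the transformation as a pure function makes it easy to run the same code in a scheduled batch job or in a streaming pipeline, whichever the export flow ends up using.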
