
DynamoDB pricing using DynamoDB Storage Backend for Titan

I would like to get a good understanding of what would be the price (in terms of $) of using the DynamoDB Titan backend. For this, I need to be able to understand when the DynamoDB Titan backend does reads and writes. Right now I am pretty clueless.

Ideally I would like to run a testcase which adds some vertices and edges, then does a rather simple traversal, and then see how many reads and writes were done. Any ideas of how I can achieve this? Possibly through metrics?
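For concreteness, here is a minimal sketch of the kind of test case I have in mind (the properties file path is a placeholder; it is assumed to point at a graph configured against the DynamoDB backend):

    import com.thinkaurelius.titan.core.TitanFactory;
    import com.thinkaurelius.titan.core.TitanGraph;
    import org.apache.tinkerpop.gremlin.structure.Vertex;

    public class SmallWriteReadTest {
        public static void main(String[] args) throws Exception {
            // conf/dynamodb.properties is a placeholder for a properties file whose
            // storage.backend points at the DynamoDB Storage Backend for Titan
            TitanGraph graph = TitanFactory.open("conf/dynamodb.properties");

            Vertex alice = graph.addVertex("name", "alice");
            Vertex bob = graph.addVertex("name", "bob");
            alice.addEdge("knows", bob);
            graph.tx().commit();                      // writes are flushed at commit

            long knows = graph.traversal().V().has("name", "alice")
                              .out("knows").count().next();   // a simple read traversal
            System.out.println("alice knows " + knows + " vertices");
            graph.tx().commit();
            graph.close();
        }
    }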

If it turns out I can't extract this information myself, I would very much appreciate a first brief explanation about when the DynamoDB Titan backend performs reads and writes.

For all Titan backends, to understand and estimate the number of writes, we rely on estimating the number of columns for a given KCVStore. You can also measure the number of columns that get written using metrics when using the DynamoDB Storage Backend for Titan.

To enable metrics, enable the configuration options listed here. Specifically, enable lines 7-11. Note the max-queue-length configuration property. If the executor-queue-size metric hits max-queue-length for a particular tx.commit() call, then you know that the queue / storage.buffer-size were not large enough. Once the executor-queue-size metric peaks without reaching max-queue-length, you know you have captured all the columns being written in a tx.commit() call, so that will give you the number of columns being changed in a tx.commit(). You can look at UpdateItem metrics for edgestore and graphindex to understand the spread of columns between the two tables.
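As a rough illustration, the options can also be set on the configuration object you open the graph with. metrics.enabled and storage.buffer-size are standard Titan options; the storage backend class name and the max-queue-length key below are written from memory as assumptions and should be checked against the configuration reference mentioned above:

    import org.apache.commons.configuration.BaseConfiguration;
    import com.thinkaurelius.titan.core.TitanFactory;
    import com.thinkaurelius.titan.core.TitanGraph;

    public class MetricsEnabledGraph {
        public static TitanGraph open() {
            BaseConfiguration conf = new BaseConfiguration();
            // assumed fully-qualified class name of the DynamoDB storage backend
            conf.setProperty("storage.backend",
                    "com.amazon.titan.diskstorage.dynamodb.DynamoDBStoreManager");
            conf.setProperty("metrics.enabled", true);      // turn on Titan metrics
            conf.setProperty("storage.buffer-size", 1024);  // mutation buffer per tx.commit()
            // assumed key for the executor queue bound discussed above
            conf.setProperty("storage.dynamodb.client.executor.max-queue-length", 1024);
            return TitanFactory.open(conf);
        }
    }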

All Titan storage backends implement KCVStore, and the keys and columns have different meanings depending on the kind of store. There are two stores that get the bulk of writes, assuming you have not turned on user-defined transaction logs. They are edgestore and graphindex.

The edgestore KCVStore is always written to, regardless of whether you configure composite indexes. Each edge and all of that edge's properties are represented by two columns (unless you set the schema of that edge label to be unidirectional). The key of an edge column is the out-vertex of the edge for the direct column, and the in-vertex of the edge for the reverse column; correspondingly, the column encodes the in-vertex of the edge for the direct column, and the out-vertex for the reverse column. Each vertex is represented by at least one column for the VertexExists hidden property, one column for the vertex label (optional), and one column for each vertex property. The key of a vertex row is the vertex id, and the columns correspond to vertex properties, hidden vertex properties, and labels.
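To make this column accounting concrete, here is a small, hypothetical estimator (not part of Titan or the backend) that simply applies the rules above to a graph of a given shape:

    // Hypothetical back-of-the-envelope estimator for edgestore columns:
    // each vertex contributes 1 column for VertexExists, optionally 1 for its label,
    // and 1 per vertex property; each bidirectional edge contributes 2 columns
    // (one under the out-vertex key, one under the in-vertex key).
    public class EdgestoreColumnEstimate {

        static long vertexColumns(long vertices, double avgPropsPerVertex, boolean labeled) {
            long perVertex = 1 /* VertexExists */ + (labeled ? 1 : 0);
            return vertices * perVertex + Math.round(vertices * avgPropsPerVertex);
        }

        static long edgeColumns(long edges, boolean unidirectional) {
            return edges * (unidirectional ? 1 : 2);
        }

        public static void main(String[] args) {
            long vertices = 1_000_000L, edges = 5_000_000L;   // illustrative graph shape
            long total = vertexColumns(vertices, 3.0, true) + edgeColumns(edges, false);
            System.out.println("estimated edgestore columns: " + total);
        }
    }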

The graphindex KCVStore will only be written to if you configure composite indexes in the Titan management system. You can index vertex and edge properties. For each pair of indexed value and edge/vertex that has that indexed value, there will be one column in the graphindex KCVStore. The key will be a combination of the index id and the value, and the column will be the vertex/edge id.
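For reference, this is roughly how a composite index is declared with the Titan 1.0 management API (the property key and index names are just examples); only properties indexed this way produce graphindex columns:

    import com.thinkaurelius.titan.core.PropertyKey;
    import com.thinkaurelius.titan.core.TitanGraph;
    import com.thinkaurelius.titan.core.schema.TitanManagement;
    import org.apache.tinkerpop.gremlin.structure.Vertex;

    public class BuildCompositeIndex {
        public static void buildNameIndex(TitanGraph graph) {
            TitanManagement mgmt = graph.openManagement();
            PropertyKey name = mgmt.makePropertyKey("name").dataType(String.class).make();
            // one graphindex column is written per (indexed value, vertex) pair
            mgmt.buildIndex("byName", Vertex.class).addKey(name).buildCompositeIndex();
            mgmt.commit();
        }
    }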

Now that you know how to count columns, you can use this knowledge to estimate the size and number of writes to edgestore and graphindex when using the DynamoDB Storage Backend for Titan. If you use the multiple-item data model for a KCVStore, you will get one item for each key-column pair. If you use the single-item data model for a KCVStore, you will get one item for all columns at a key (this is not necessarily true when graph partitioning is enabled, but that is a detail I will not discuss now). As long as each vertex property is less than 1 KB, and the sum of all edge properties for an edge is less than 1 KB, each column will cost 1 WCU to write when using the multiple-item data model for edgestore. Again, each column in graphindex will cost 1 WCU to write if you use the multiple-item data model.
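As a sketch of that accounting: with the multiple-item data model each key-column pair is one DynamoDB item, and a write costs ceil(item size / 1 KB) WCUs, which is where the 1 WCU per small column figure comes from (the item sizes below are illustrative):

    public class WcuEstimate {
        // DynamoDB charges one write capacity unit per 1 KB (rounded up) written
        static long wcusPerWrite(long itemSizeBytes) {
            return (itemSizeBytes + 1023) / 1024;   // ceil(size / 1 KB)
        }

        public static void main(String[] args) {
            System.out.println(wcusPerWrite(412));    // 1 WCU, a small column
            System.out.println(wcusPerWrite(3000));   // 3 WCUs, an oversized column
        }
    }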

Let's assume you did your estimation and you use the multiple-item data model throughout. Let's assume you estimate that you will be writing 750 columns per second to edgestore and 750 columns per second to graphindex, and that you want to drive this load for a day. You can set the read capacity for both tables to 1, so you know each table will start off with one physical DynamoDB partition. In us-east-1, the cost for writes is $0.0065 per hour for every 10 units of write capacity, so 24 * 75 * $0.0065 = $11.70 per day for writes for each table. This means the write capacity would cost $23.40 per day for edgestore and graphindex together. The reads could be set to 1 read per second for each of the tables, making the read cost 2 * 24 * $0.0065 = $0.312 for both tables per day. If your AWS account is new, the reads would fall within the free tier, so effectively you would only be paying for the writes.
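The same arithmetic as a small sketch, using the rates and figures quoted above:

    public class WriteCostEstimate {
        public static void main(String[] args) {
            double ratePer10WcuHour = 0.0065;   // us-east-1 write rate quoted above
            int wcuPerTable = 750;              // estimated columns written per second
            double perTablePerDay = 24 * (wcuPerTable / 10.0) * ratePer10WcuHour;
            System.out.printf("writes, one table, per day:   $%.2f%n", perTablePerDay);      // $11.70
            System.out.printf("writes, both tables, per day: $%.2f%n", 2 * perTablePerDay);  // $23.40
            System.out.printf("reads, both tables, per day:  $%.3f%n", 2 * 24 * 0.0065);     // $0.312
        }
    }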

Another aspect of DynamoDB pricing is storage. If you write 750 columns per second, that is 64.8 million items per day to one table, which means 1.9 billion (approximately 2 billion) items per month. The average number of items in the table over that month is then about 1 billion. If each item averages out to 412 bytes, and there are 100 bytes of overhead, that means 1 billion 512-byte items are stored for a month, approximately 477 GB. 477 / 25 rounded up is 20, so storage for the first month at this load would cost 20 * $0.25 = $5 per month. If you keep adding items at this rate without deleting them, the monthly storage cost will increase by approximately 5 dollars per month.
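The storage arithmetic above, reproduced as a sketch (same figures and rounding as in the paragraph):

    public class StorageCostEstimate {
        public static void main(String[] args) {
            long itemsPerDay = 750L * 86_400;        // 64.8 million items per day
            long itemsPerMonth = itemsPerDay * 30;   // ~1.94 billion items per month
            long avgItemsInMonth = 1_000_000_000L;   // ~ itemsPerMonth / 2, as above
            long bytesPerItem = 412 + 100;           // average payload + overhead
            double gb = avgItemsInMonth * (double) bytesPerItem / (1L << 30);   // ~477 GB
            double monthlyCost = Math.ceil(gb / 25) * 0.25;                     // rounding used above
            System.out.printf("%,d items/day, %,d items/month%n", itemsPerDay, itemsPerMonth);
            System.out.printf("~%.0f GB stored, ~$%.2f for the first month%n", gb, monthlyCost);
        }
    }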

If you do not have super nodes in your graph, or vertices with a relatively large number of properties, then the writes to the edgestore will be distributed evenly throughout the partition key space. That means your table will split into 2 partitions when it hits 10 GB, and then each of those partitions will split, for a total of 4 partitions, when they hit 10 GB, and so on. The nearest power of 2 above 477 GB / (10 GB per partition) is 2^6 = 64, so your edgestore would split 6 times over the course of the first month, and you would probably have around 64 partitions at the end of it. Eventually, your table will have so many partitions that each partition will have very few IOPS. This phenomenon is called IOPS starvation. You should have a strategy in place to address it. Two commonly used strategies are 1. batch cleanup/archival of old data and 2. rolling (time-series) graphs. In option 1, you spin up an EC2 instance to traverse the graph, write old data to a colder store (S3, Glacier, etc.), and delete it from DynamoDB. In option 2, you direct writes to graphs that correspond to a time period (weeks - 2015W1, months - 2015M1, etc.). As time passes, you provision down the writes on the older tables, and when the time comes to migrate them to colder storage, you read the entire graph for that time period and delete the corresponding DynamoDB tables. The advantage of this approach is that it allows you to manage your write provisioning cost with higher granularity, and it allows you to avoid the cost of deleting individual items (because you delete a table for free instead of incurring at least 1 WCU for every item you delete).
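The partition-split estimate above as a sketch: partitions double each time the data outgrows 10 GB per partition, so the number of splits is the base-2 log of (table size / 10 GB), rounded up to the next whole round of splitting:

    public class PartitionSplitEstimate {
        public static void main(String[] args) {
            double tableSizeGb = 477.0;      // first-month edgestore size estimated above
            double perPartitionGb = 10.0;    // split threshold per partition
            int splits = (int) Math.ceil(Math.log(tableSizeGb / perPartitionGb) / Math.log(2));
            long partitions = 1L << splits;
            System.out.println(splits + " splits, ~" + partitions + " partitions");   // 6 splits, ~64
        }
    }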

