
Determining a partition key in DynamoDB for a GSI

I am new to DynamoDB and I am finding it hard to think of how I should decide my partition key. I am using a condensed version of my use case:

I have a boolean attribute, B. For a given ID, I need to return all the data for it. The ID is stored in either attribute X or attribute Y: if B is true, I need to read attribute X, otherwise Y.

While inserting into the table I know the value of B, so I can put the ID in either X or Y depending on that value.

However, while fetching I am only given an ID, and I need to figure out whether it exists in column X or column Y (I won't get the value of B in the input).

In an RDBMS I could run a query like select * from tab where (B == true && X == ID) || (B == false && Y == ID).

I think creating a GSI in DynamoDB will be the way to go about solving this in Dynamo. However I am not able to figure out the best way to approach this. Could I get suggestions?

Not sure if I understood your use case correctly, but why not just swap the target columns based on the value of B when inserting a row?

Consider the following input:

+-----+------+--------+
|  X  |  Y   |   B    |
+-----+------+--------+
| ID1 | ID2  |  true  |
+-----+------+--------+
| ID3 | ID4  |  true  |
+-----+------+--------+
| ID5 | ID6  |  false |
+-----+------+--------+
| ID7 | ID8  |  false |
+-----+------+--------+

What if you store the values like this:

+-----------+-------------------------+
|  id       |      opposite id        |   
|(hash key) | or whatever you call it | 
+-----------+-------------------------+
| ID1       |        ID2              | 
+-----------+-------------------------+
| ID3       |        ID4              | 
+-----------+-------------------------+
| ID6       |        ID5              | 
+-----------+-------------------------+
| ID8       |        ID7              | 
+-----------+-------------------------+

This way, to fetch an item by an IDXXX value you only need to query the single column id.
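The swap described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the attribute names `id` and `opposite_id` come from the table above, the helper names `to_item` and `fetch` are hypothetical, and a plain dict stands in for the DynamoDB table so the example runs without AWS access.

```python
def to_item(x, y, b):
    """Build the stored item: the lookup ID (X when B is true, else Y) becomes the hash key."""
    if b:
        return {"id": x, "opposite_id": y}
    return {"id": y, "opposite_id": x}


# In-memory stand-in for the table, keyed by the hash key "id" (an assumption for this sketch).
table = {}
for x, y, b in [
    ("ID1", "ID2", True),
    ("ID3", "ID4", True),
    ("ID5", "ID6", False),
    ("ID7", "ID8", False),
]:
    item = to_item(x, y, b)
    table[item["id"]] = item


def fetch(lookup_id):
    """Single-key lookup; the caller does not need to know B at read time."""
    return table.get(lookup_id)
```

With a real table, `fetch` would become a get or query on the hash key; the point is that the decision based on B happens once, at write time.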


UPD: Note that if your use case allows multiple records with the same id, you will need another field to serve as a range key. This holds true whether or not you swap columns as shown above.

As per the AWS DynamoDB blog post Choosing the Right DynamoDB Partition Key:

Choosing the Right DynamoDB Partition Key is an important step in the design and building of scalable and reliable applications on top of DynamoDB.

What is a partition key?

DynamoDB supports two types of primary keys:

Partition key : Also known as a hash key, the partition key is composed of a single attribute. Attributes in DynamoDB are similar in many ways to fields or columns in other database systems.

Partition key and sort key : Referred to as a composite primary key or hash-range key, this type of key is composed of two attributes. The first attribute is the partition key, and the second attribute is the sort key. Here is an example:

[Figure: example of a table with a composite primary key (image not included)]

Why do I need a partition key?

DynamoDB stores data as groups of attributes, known as items. Items are similar to rows or records in other database systems. DynamoDB stores and retrieves each item based on the primary key value, which must be unique. Items are distributed across 10 GB storage units, called partitions (physical storage internal to DynamoDB). Each table has one or more partitions, as shown in Figure 2. For more information, see Understand Partition Behavior in the DynamoDB Developer Guide.

DynamoDB uses the partition key's value as an input to an internal hash function. The output from the hash function determines the partition in which the item will be stored. Each item's location is determined by the hash value of its partition key.

All items with the same partition key are stored together, and for composite primary keys they are ordered by the sort key value. DynamoDB will split partitions by sort key if the collection size grows bigger than 10 GB.
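A toy model of this placement logic, as a rough sketch only: DynamoDB's real hash function and partition count are internal and not exposed, so the SHA-256 hash, `NUM_PARTITIONS`, and the helper names below are assumptions made purely for illustration.

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical; DynamoDB manages the real number itself


def partition_for(partition_key):
    """Map a partition key to a partition index via a hash (stand-in for the internal function)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


# partitions[p][partition_key][sort_key] -> item
partitions = [defaultdict(dict) for _ in range(NUM_PARTITIONS)]


def put(partition_key, sort_key, item):
    """Store the item in the partition chosen by the hash of its partition key."""
    partitions[partition_for(partition_key)][partition_key][sort_key] = item


def query(partition_key):
    """Return all items for one partition key, ordered by sort key."""
    collection = partitions[partition_for(partition_key)].get(partition_key, {})
    return [collection[sk] for sk in sorted(collection)]
```

Every item with the same partition key hashes to the same index, so a query touches a single partition and reads the items back in sort-key order.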

[Figure 2: a table's items distributed across partitions (image not included)]

Recommendations for partition keys

Use high-cardinality attributes. These are attributes that have distinct values for each item, like e-mail id, employee_no, customerid, sessionid, orderid, and so on.

Use composite attributes. Try to combine more than one attribute to form a unique key, if that meets your access pattern. For example, consider an orders table with customerid+productid+countrycode as the partition key and order_date as the sort key.

Cache the popular items when there is a high volume of read traffic. The cache acts as a low-pass filter, preventing reads of unusually popular items from swamping partitions. For example, consider a table that has deals information for products. Some deals are expected to be more popular than others during major sale events like Black Friday or Cyber Monday.

Add random numbers/digits from a predetermined range for write-heavy use cases. If you expect a large volume of writes for a partition key, use an additional prefix or suffix (a fixed number from a predetermined range, say 1-10) and add it to the partition key. For example, consider a table of invoice transactions. A single invoice can contain thousands of transactions per client.
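Two of the recommendations above, composite attributes and random suffixes, can be sketched as small key-building helpers. This is a minimal illustration, not an AWS API: the `#` separator, the function names, and the sample values are assumptions, and the suffix range 1-10 follows the example in the text.

```python
import random

SHARD_COUNT = 10  # suffixes 1..10, matching the "say 1-10" range above


def composite_key(*parts):
    """Join several attributes into one partition key value, e.g. customerid+productid+countrycode."""
    return "#".join(str(p) for p in parts)


def sharded_key(base_key, rng=random):
    """Append a random suffix so writes for one hot key spread over several partition key values."""
    return f"{base_key}#{rng.randint(1, SHARD_COUNT)}"


def all_shard_keys(base_key):
    """List every suffixed key; a reader has to query each of these and merge the results."""
    return [f"{base_key}#{n}" for n in range(1, SHARD_COUNT + 1)]
```

For example, `composite_key("c42", "p7", "US")` gives `"c42#p7#US"`. The trade-off with suffixes is that writes get cheaper to distribute while reads for one invoice become up to `SHARD_COUNT` queries.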

Read More @ Choosing the Right DynamoDB Partition Key
