DynamoDB - avoid data overwrite with primary partition key remaining the same for all data points
I'm working on migrating data from a CSV file stored in S3 to a table in DynamoDB. The code seems to work, but only the last data point ends up in DynamoDB. The primary partition key (serial) is the same for all data points. Not sure if I'm doing something wrong here; any help is greatly appreciated.
import boto3

s3_client = boto3.client("s3")
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('scan_records')

def lambda_handler(event, context):
    bucket_name = event['Records'][0]['s3']['bucket']['name']
    s3_file_name = event['Records'][0]['s3']['object']['key']
    resp = s3_client.get_object(Bucket=bucket_name, Key=s3_file_name)
    data = resp['Body'].read().decode("utf-8")
    scan_time = data.split("\n")
    for scan in scan_time:
        print(scan)
        scan_data = scan.split(",")
        # Add it to DynamoDB
        try:
            table.put_item(
                Item={
                    'serial': scan_data[0],
                    'time': scan_data[1],
                }
            )
        except Exception as e:
            print("End of File")
In your DynamoDB table, the primary key needs to be unique for each element in the table. So if your primary key is composed only of a partition key that is the same for all your data points, the same element will always be overwritten.

* You could add a sort key to your table that uses another field, so that the (partition key, sort key) pair composing the primary key is unique, and new data is appended to your table instead.
* If you can't compose a unique primary key from your data points, you can always add a UUID to the primary key to make it unique.
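The UUID approach can be sketched as follows. This is a minimal illustration, assuming the table is re-created with `serial` as the partition key and a hypothetical `scan_id` attribute as the sort key; `build_item` is a made-up helper, not part of the original code:

```python
import uuid

def build_item(scan_data):
    """Build a DynamoDB item whose primary key is unique per CSV row.

    Assumes 'serial' is the partition key and 'scan_id' (hypothetical
    attribute) is the sort key. A fresh UUID per row guarantees a unique
    primary key even when 'serial' and 'time' repeat across rows.
    """
    return {
        'serial': scan_data[0],
        'time': scan_data[1],
        'scan_id': str(uuid.uuid4()),  # unique sort-key value per call
    }

# Two rows with identical serial/time still get distinct primary keys,
# so put_item appends a new item instead of overwriting the old one:
a = build_item(['SN-1', '2021-01-01T00:00:00'])
b = build_item(['SN-1', '2021-01-01T00:00:00'])
```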
ConditionExpression='attribute_not_exists(serial) AND attribute_not_exists(time)',
Upon making the two changes below, the issue was resolved and the code works fine:

1. Check for unique entries using the combination of partition and sort key.
2. Add a loop that goes line by line through the CSV file and ingests each row into DynamoDB.
Happy to share the code if anyone finds it useful.
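The row-by-row ingestion step can be sketched like this. It is only an outline under the assumptions above (table keyed on `serial` plus a sort key); `rows_from_csv` is a hypothetical helper, and the `csv` module handles quoted fields that a bare `split(",")` would break on:

```python
import csv
import io

def rows_from_csv(body_text):
    """Yield (serial, time) pairs from the CSV body, skipping blank/short lines."""
    for row in csv.reader(io.StringIO(body_text)):
        if len(row) >= 2:
            yield row[0], row[1]

# Inside lambda_handler, after decoding the S3 object body into `data`:
# for serial, scan_time in rows_from_csv(data):
#     table.put_item(Item={'serial': serial, 'time': scan_time})
```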