DynamoDB - avoid data overwrite with primary partition key remaining the same for all data points

I'm migrating data from a CSV file stored in S3 to a table in DynamoDB. The code seems to work, but only the last data point ends up in DynamoDB. The primary partition key (serial) is the same for all data points. I'm not sure what I'm doing wrong here; any help is greatly appreciated.

import boto3
s3_client = boto3.client("s3")

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('scan_records')

def lambda_handler(event, context):
    bucket_name = event['Records'][0]['s3']['bucket']['name']
    s3_file_name = event['Records'][0]['s3']['object']['key']
    resp = s3_client.get_object(Bucket=bucket_name,Key=s3_file_name)
    data = resp['Body'].read().decode("utf-8")
    scan_time = data.split("\n")
    for scan in scan_time:
        print(scan)
        scan_data = scan.split(",")

    # Add it to dynamoDB

    try:
        table.put_item(
            Item={
                'serial': scan_data[0],
                'time': scan_data[1],
            }
        )

    except Exception as e:
        print("End of File")

In your DynamoDB table, the primary key must be unique for each item. If your primary key consists only of a partition key that is the same for all your data points, every write overwrites the same item.

* You could add a sort key to your table that uses another field, so that the (partition key, sort key) pair making up the primary key is unique and each write adds a new item instead of replacing the old one.
* If you can't build a unique primary key from your data points, you can always add a UUID to the primary key to make it unique.
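A rough sketch of what the two suggestions could look like. It assumes the table is named `scan_records` as in the question, that both `serial` and `time` are stored as strings, and that on-demand billing is acceptable; `create_scan_table` and `unique_sort_key` are hypothetical helper names, not part of boto3. As far as I know the key schema of an existing table can't be changed, so the table would have to be recreated with the sort key.

```python
import uuid

# Composite primary key: 'serial' as partition key, 'time' as sort key.
KEY_SCHEMA = [
    {"AttributeName": "serial", "KeyType": "HASH"},   # partition key
    {"AttributeName": "time", "KeyType": "RANGE"},    # sort key
]
# Assumption: both key attributes are strings ("S").
ATTRIBUTE_DEFINITIONS = [
    {"AttributeName": "serial", "AttributeType": "S"},
    {"AttributeName": "time", "AttributeType": "S"},
]


def create_scan_table(client, table_name="scan_records"):
    """Create the table with a composite key.

    `client` is expected to be boto3.client("dynamodb").
    """
    return client.create_table(
        TableName=table_name,
        KeySchema=KEY_SCHEMA,
        AttributeDefinitions=ATTRIBUTE_DEFINITIONS,
        BillingMode="PAY_PER_REQUEST",  # assumption: on-demand capacity
    )


def unique_sort_key(time_value):
    """Fallback for when (serial, time) can still collide: append a random UUID."""
    return f"{time_value}#{uuid.uuid4()}"
```

With the composite key in place, two rows with the same `serial` but different `time` values are stored as two separate items. The UUID suffix is only worth it if the same `serial` and `time` can legitimately appear more than once.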

ConditionExpression='attribute_not_exists(serial) AND attribute_not_exists(time)',
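The fragment above is an argument to `put_item`. A minimal sketch of where it would go, assuming a table keyed on `serial` plus `time`: `put_if_absent` is a hypothetical helper name, and `table` is expected to be a `boto3.resource('dynamodb').Table(...)` object. I believe `time` is on DynamoDB's reserved-word list, so the sketch aliases both attribute names through `ExpressionAttributeNames` to be safe.

```python
def put_if_absent(table, serial, time_value):
    """Write the item only if no item with the same key exists.

    Returns True if the item was written, False if it was already there.
    """
    try:
        table.put_item(
            Item={"serial": serial, "time": time_value},
            ConditionExpression="attribute_not_exists(#s) AND attribute_not_exists(#t)",
            # Placeholders avoid clashes with reserved words in expressions.
            ExpressionAttributeNames={"#s": "serial", "#t": "time"},
        )
        return True
    except Exception as exc:
        # boto3 raises botocore.exceptions.ClientError here; its .response
        # dict carries the error code. Checked by attribute so the sketch
        # does not need to import botocore.
        code = getattr(exc, "response", {}).get("Error", {}).get("Code")
        if code == "ConditionalCheckFailedException":
            return False
        raise  # anything else is a real error
```

Note that the condition only prevents silent overwrites. It doesn't make the key unique by itself, so it is a complement to the sort key, not a replacement for it.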

After making the two changes below, the issue was resolved and the code works fine.

1. Check for unique entries using the combination of partition key and sort key.
2. Add a loop that goes through the CSV file line by line and ingests each row into DynamoDB.

Happy to share the code if anyone finds it useful.
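The poster's final code isn't shown, so this is only a guess at what the loop change might look like. `rows_from_csv` and `ingest` are hypothetical helper names; `table` is assumed to be the `scan_records` table object from the question, now keyed on `serial` plus `time`, and the CSV is assumed to have the serial in the first column and the time in the second.

```python
import csv
import io


def rows_from_csv(text):
    """Yield (serial, time) pairs, skipping blank or incomplete lines."""
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2:
            continue  # e.g. the empty line after a trailing newline
        yield row[0].strip(), row[1].strip()


def ingest(table, text):
    """Write one item per CSV row and return the number of rows written."""
    written = 0
    for serial, time_value in rows_from_csv(text):
        # The write now happens inside the loop, once per row.
        table.put_item(Item={"serial": serial, "time": time_value})
        written += 1
    return written
```

In the Lambda handler from the question this would replace the split-and-put part, something like `ingest(table, data)` right after the file is read and decoded. For large files, boto3's `table.batch_writer()` would probably be a better fit than one `put_item` call per row.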

