DynamoDB scan returns multiple scan results

So I've written the function below. This version is somewhat abridged and I've anonymized the data, but the critical components are there.

The function takes a list of parameters from an API Gateway call, queries a database for each of them, then returns the results.

I'm finding that the scan runs perfectly with one parameter, but returns duplicate data when more than one is passed. From the logs I can see that the scans run multiple times when multiple params are passed.

For example, with one param the function logs return:

2020-03-19 20:27:42.974 Starting the 0 scan with 3 as the id 
2020-03-19 20:27:43.047 The 0 scan has completed successfully

With two params the logs are:

2020-03-19 20:28:42.189 Starting the 0 scan with 2 as the id
2020-03-19 20:28:42.261 The 0 scan has completed successfully
2020-03-19 20:28:42.262 Starting the 1 scan with 3 as the id
2020-03-19 20:28:42.267 The 0 scan has completed successfully
2020-03-19 20:28:42.293 The 1 scan has completed successfully

And with 3 params the logs are:

2020-03-19 20:29:49.209 Starting the 0 scan with 1 as the id
2020-03-19 20:29:49.323 The 0 scan has completed successfully
2020-03-19 20:29:49.325 Starting the 1 scan with 2 as the id
2020-03-19 20:29:49.329 The 0 scan has completed successfully
2020-03-19 20:29:49.380 The 1 scan has completed successfully
2020-03-19 20:29:49.381 Starting the 2 scan with 3 as the id
2020-03-19 20:29:49.385 The 1 scan has completed successfully
2020-03-19 20:29:49.437 The 2 scan has completed successfully

Here is the code that runs the for loop and the scan. I've hardcoded the parameters and excluded some non-pertinent stuff.

    const params = ['1','2','3'];
    for (let i = 0; i < params.length; i++) {
      console.log("Starting the " + i + " scan with " + params[i] + " as the scan parameter")
      const scanParams = {
        TableName: "Dynamo_Table",
        FilterExpression: "Org = :Org",
        ExpressionAttributeValues: { ":Org": params[i] },
        ProjectionExpression: "User_ID, Org, first_name, last_name"
      };
      await dynamoClient.scan(scanParams, function(err, data) {
        if (err) {
          console.log("data retrival failed, error logged is :" + err);
          return err;
        }
        else {
          console.log("The " + i +" scan has completed successfully")
          //console.log("data retrival successful: " + JSON.stringify(data));
          userData = userData.concat(data.Items)
          //console.log("partial data structure is " + data)
        }
      }).promise();
    }
    responseData = JSON.stringify(userData)
    console.log("Complete response is " + responseData)
    console.log("data after execution scan is " + data)

I've tried to force the program to wait on the scan's completion by using await and AWS's .promise() function. However, these don't seem to be blocking the thread execution. I'm not sure exactly why it's launching multiple scans, though. The for loop isn't running more times than it should, so why is the scan being called more than once?

Whenever you want to search for something in a DynamoDB table, it's recommended that you use Query instead of Scan.

This is because Scan reads every item in the table, whereas Query only reads the items that match the specified hash key (partition key).

If you want to look up data by a particular attribute, you can create a Global Secondary Index with that attribute as the hash key and a sort key of your choice. This might solve your problem of the table returning results multiple times.
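As a rough sketch of what such a query could look like, assume a Global Secondary Index named `Org-index` (a hypothetical name) exists on `Dynamo_Table` with `Org` as its hash key, and that it projects the attributes listed below. The helper `buildOrgQueryParams` is only an illustration; the table and attribute names are taken from the question.

```javascript
// Builds Query parameters for a hypothetical GSI called "Org-index"
// whose hash key is the Org attribute. The index name is an assumption;
// substitute the name of the index you actually create.
const buildOrgQueryParams = (org) => ({
  TableName: "Dynamo_Table",
  IndexName: "Org-index", // assumed GSI name
  KeyConditionExpression: "Org = :Org",
  ExpressionAttributeValues: { ":Org": org },
  ProjectionExpression: "User_ID, Org, first_name, last_name",
});

// With a DocumentClient instance this could be used as:
//   const data = await dynamoClient.query(buildOrgQueryParams("3")).promise();
console.log(buildOrgQueryParams("3"));
```

Compared with the scan in the question, the `FilterExpression` becomes a `KeyConditionExpression`, so DynamoDB reads only the items for that `Org` value instead of filtering the whole table after reading it.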

Here is an example of how to use the DynamoDB DocumentClient to query multiple items by partition key and collect the results. This uses the promisified variant of the query() call, and waits for all query promises to be fulfilled using Promise.all().

var AWS = require('aws-sdk');
AWS.config.update({ region: 'us-east-1' });

const dc = new AWS.DynamoDB.DocumentClient();

// Array of organization IDs we want to query
const orgs = ['1', '2', '3'];

// Async function to query for one specific organization ID
const queryOrg = async org => {
  const params = {
    TableName: 'orgs',
    KeyConditionExpression: 'org = :o1',
    ExpressionAttributeValues: { ':o1': org, },
  };

  return dc.query(params).promise();
}

// Async IIFE because you cannot use await outside of an async function
(async () => {
  // Array of promises representing async organization queries made
  const promises = orgs.map(org => queryOrg(org));

  // Wait for all queries to complete and collect the results in an array
  const items = await Promise.all(promises);

  // Results are present in the same order that the queries were mapped
  for (const item of items) {
    console.log('Item:', item.Items[0]);
  }
})();
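The same promise-only style can be applied to the scan from the question. In version 2 of the AWS SDK for JavaScript, passing a callback to `scan()` sends the request right away, and calling `.promise()` on the returned request sends it a second time, which would be consistent with the doubled "completed successfully" log lines. The sketch below drops the callback and also follows `LastEvaluatedKey` so that multi-page scans are read in full. `scanOrg` and `stubClient` are hypothetical names introduced for illustration; in real use `client` is assumed to be an `AWS.DynamoDB.DocumentClient`.

```javascript
// Scans for one Org value using only the promise form (no callback),
// following LastEvaluatedKey until every page has been read.
// `client` is assumed to be an AWS.DynamoDB.DocumentClient (SDK v2).
const scanOrg = async (client, org) => {
  const items = [];
  let startKey; // undefined on the first page
  do {
    const data = await client.scan({
      TableName: "Dynamo_Table",
      FilterExpression: "Org = :Org",
      ExpressionAttributeValues: { ":Org": org },
      ProjectionExpression: "User_ID, Org, first_name, last_name",
      ExclusiveStartKey: startKey,
    }).promise();
    items.push(...data.Items);
    startKey = data.LastEvaluatedKey;
  } while (startKey);
  return items;
};

// Hypothetical in-memory stand-in for the DocumentClient so the sketch
// runs without AWS credentials. It reads one row per page and applies
// the Org filter after reading, as a filtered scan does.
const rows = [
  { User_ID: "a", Org: "1" },
  { User_ID: "b", Org: "1" },
  { User_ID: "c", Org: "2" },
];
const stubClient = {
  calls: 0,
  scan(params) {
    this.calls += 1;
    const i = params.ExclusiveStartKey ? params.ExclusiveStartKey.index : 0;
    const wanted = params.ExpressionAttributeValues[":Org"];
    const page = {
      Items: rows[i].Org === wanted ? [rows[i]] : [],
      LastEvaluatedKey: i + 1 < rows.length ? { index: i + 1 } : undefined,
    };
    return { promise: () => Promise.resolve(page) };
  },
};

(async () => {
  const orgs = ["1", "2"];
  // One scanOrg call per parameter; results keep the order of `orgs`
  const results = await Promise.all(orgs.map((org) => scanOrg(stubClient, org)));
  const userData = results.flat();
  console.log(userData.length, "items from", stubClient.calls, "scan calls");
})();
```

Each parameter still costs a full table scan here, so this only removes the duplicate requests; it does not make the lookup cheaper the way a query does.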
