DynamoDB scan filter not returning results for some requests

I have a table with two attributes, FirstId and SecondId. FirstId is the primary key, and SecondId is not indexed.

FirstId |  SecondId    
--------------------
  abc   |  123     
  xyz   |  789     

I'm using a scan filter to look up the FirstId value for a given SecondId with the AWS SDK for Java. The table has around 12k entries, and this was working fine. Recently, the scan request has started returning null in some cases, even though I can find the entry in the AWS console.

Here is my code:

    Condition scanFilterCondition = new Condition()
        .withComparisonOperator(ComparisonOperator.EQ)
        .withAttributeValueList(new AttributeValue().withS(secondIdValue));
    
    Map<String, Condition> conditions = new HashMap<String, Condition>();
    conditions.put("SecondId", scanFilterCondition);

    ScanRequest scanRequest = new ScanRequest()
            .withTableName(table)
            .withScanFilter(conditions);
    
    ScanResult result = mDBClient.scan(scanRequest);
    if(result.getItems().size() == 0) {
        return null;
    }
        
    Map<String, AttributeValue> item = result.getItems().get(0);
    
    return item.get("FirstId").getS();

I'm assuming this might be due to the operation getting expensive as the data grows! Is there a way I can optimize this request? Or, is there something that I'm missing?

Your problem might be explained by understanding how DynamoDB handles scans, filters and pagination.

A few things to remember:

  • Each scan request reads at most 1 MB of data
  • Filter conditions are applied after the scan (or query) has read the data
  • A scan reads the entire table, not just the matching items

Taking these into consideration, it's possible you will have to paginate through several empty scan results before finding (or not finding) the data you are after. A single request can read a full 1 MB of data, filter all of it out, and return an empty result set to the client. result.getItems().size() may be zero, but that alone doesn't mean the search is complete; you may still have to issue another scan request to cover the rest of the table.

I'm not certain that this is your problem, but it's easy to verify. DynamoDB returns a LastEvaluatedKey field with your scan results when the scan stopped before reaching the end of the table. If it is present, you'll need to make another scan request, passing that key as ExclusiveStartKey, to continue the scan.
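To make that behaviour concrete, here is a minimal sketch in plain Java with no AWS dependency. It is only a toy model: `ScanPaginationSketch`, `Page`, `scanPage` and `findFirstId` are hypothetical names, the page size stands in for the 1 MB read limit, and the resume position is simplified to a row index (in DynamoDB it is the key of the last item read). It uses the two rows from the question and shows a first page that comes back empty even though the item exists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of a filtered, paginated scan. Not the AWS SDK; every name here is hypothetical. */
final class ScanPaginationSketch {

    /** Stand-in for ScanResult: the rows that passed the filter, plus where to resume. */
    static final class Page {
        final List<String> firstIds;    // FirstId values that survived the filter
        final Integer lastEvaluatedKey; // null once the whole table has been read

        Page(List<String> firstIds, Integer lastEvaluatedKey) {
            this.firstIds = firstIds;
            this.lastEvaluatedKey = lastEvaluatedKey;
        }
    }

    /** The table from the question: each row is {FirstId, SecondId}. */
    static List<String[]> sampleTable() {
        return Arrays.asList(new String[] {"abc", "123"}, new String[] {"xyz", "789"});
    }

    /** Reads at most pageSize rows starting at start, and only then applies the filter. */
    static Page scanPage(List<String[]> table, String secondId, int start, int pageSize) {
        int end = Math.min(start + pageSize, table.size());
        List<String> matches = new ArrayList<>();
        for (int i = start; i < end; i++) {
            String[] row = table.get(i);
            if (row[1].equals(secondId)) {
                matches.add(row[0]);
            }
        }
        // A non-null key means "there is more table left to read", even if matches is empty.
        return new Page(matches, end < table.size() ? end : null);
    }

    /** Keeps requesting pages until a match turns up or the table is exhausted. */
    static String findFirstId(List<String[]> table, String secondId, int pageSize) {
        Integer startKey = 0;
        while (startKey != null) {
            Page page = scanPage(table, secondId, startKey, pageSize);
            if (!page.firstIds.isEmpty()) {
                return page.firstIds.get(0);
            }
            startKey = page.lastEvaluatedKey;
        }
        return null; // every page was read and nothing matched
    }

    public static void main(String[] args) {
        List<String[]> table = sampleTable();
        Page first = scanPage(table, "789", 0, 1);
        System.out.println("first page matches: " + first.firstIds);        // prints "first page matches: []"
        System.out.println("full search: " + findFirstId(table, "789", 1)); // prints "full search: xyz"
    }
}
```

Applied to the code in the question, the same loop would read `result.getLastEvaluatedKey()` after each call, pass it to `scanRequest.setExclusiveStartKey(...)`, and call `mDBClient.scan(scanRequest)` again until the key comes back null.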

Again, this may not be your problem, but it could explain why you aren't seeing the results you expect from a single scan operation.

I had a long debugging session with the AWS support engineers, and we still could not figure out the reason. The AWS CLI and UI both returned the result, but the Java SDK scan did not.

So I created a GSI (global secondary index) on SecondId and changed my code to query the index rather than scan the table, and that fixed the problem for me.

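    // tableName is assumed to hold the table's name; the original reused
    // "table" for both the name and the Table object, which does not compile.
    Table table = dynamoDB.getTable(tableName);
    Index index = table.getIndex("SecondId-index");

    // Query the GSI by its key instead of scanning and filtering the whole table.
    QuerySpec spec = new QuerySpec()
            .withKeyConditionExpression("SecondId = :second_id")
            .withValueMap(new ValueMap()
                .withString(":second_id", secondIdValue));

    ItemCollection<QueryOutcome> items = index.query(spec);

    // Reuse a single iterator rather than calling iterator() twice.
    Iterator<Item> iterator = items.iterator();
    if (!iterator.hasNext()) {
        return null;
    }

    JsonNode result = jsonMapper.readTree(iterator.next().toJSON());

    return result.get("FirstId").asText();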


 