Which performs better when querying 50 GB of data: a MySQL SELECT with a condition, or a DynamoDB Scan with a filter expression?

I'm retrieving traffic data for a website using the Scan operation in DynamoDB, with a FilterExpression to narrow down the results. The scan will run against a large table holding more than 20 GB of data.

I found that DynamoDB scans through the entire table and only then filters the results. The documentation says a single call returns at most 1 MB of data, so I have to loop to fetch the rest. That seems like a bad way to make this work. I got this from the question "Dynamodb filter expression not returning all results".
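For reference, the looping described above could be sketched roughly as follows. This is a minimal sketch, not SDK code: `scanAllPages` and `$fetchPage` are hypothetical names, and the callable is assumed to wrap the real `scan` call. It relies only on the documented `LastEvaluatedKey` / `ExclusiveStartKey` pair. Note that a page can come back with no items after filtering and still carry a `LastEvaluatedKey`, so the loop has to be driven by the key, not by the item count.

```php
<?php
// Sketch only: scanAllPages and $fetchPage are hypothetical names, not part of the AWS SDK.
// $fetchPage receives the request parameters and returns one page of results
// (anything that supports array access with 'Items' and 'LastEvaluatedKey').
function scanAllPages(callable $fetchPage, array $params): array
{
    $items = [];
    do {
        $page = $fetchPage($params);

        // Collect whatever survived the filter on this page (possibly nothing).
        foreach ($page['Items'] ?? [] as $item) {
            $items[] = $item;
        }

        // A missing LastEvaluatedKey means the table has been read to the end.
        $lastKey = $page['LastEvaluatedKey'] ?? null;
        if ($lastKey !== null) {
            // Resume the next request where this page stopped.
            $params['ExclusiveStartKey'] = $lastKey;
        }
    } while ($lastKey !== null);

    return $items;
}

// With a real client this would be called roughly like:
// $items = scanAllPages(function (array $p) use ($dynamodb) {
//     return $dynamodb->scan($p);
// }, $params);
```

Collecting everything into one array is only reasonable when the filtered result is small; for a large result it would be safer to process each page as it arrives.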

For a small table that should be fine.对于一张应该没问题的小桌子。

MySQL does the same, I guess. I'm not sure.

Which is faster to read on a large data set: a MySQL SELECT or a DynamoDB Scan?
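To put rough numbers on the Scan side: a Scan reads every item whether or not the filter keeps it, one page of at most 1 MB per request, and read capacity is charged for the data scanned, not the data returned. Below is a back-of-the-envelope sketch of that arithmetic. The function name is made up, and it assumes the usual accounting of one read unit per 4 KB for strongly consistent reads and half that cost for eventually consistent reads; real consumption also depends on per-request rounding.

```php
<?php
// Sketch only: estimateFullScan is a hypothetical helper for rough sizing.
function estimateFullScan(int $tableBytes): array
{
    $pageBytes = 1024 * 1024; // a single Scan request returns at most 1 MB
    $unitBytes = 4 * 1024;    // one strongly consistent read unit covers up to 4 KB

    $strongUnits = (int) ceil($tableBytes / $unitBytes);

    return [
        // Lower bound on the number of requests needed to walk the table.
        'min_requests' => (int) ceil($tableBytes / $pageBytes),
        'strong_read_units' => $strongUnits,
        // Eventually consistent reads are assumed to cost half as much.
        'eventual_read_units' => (int) ceil($strongUnits / 2),
    ];
}

// 50 GB, taking 1 GB as 1024^3 bytes.
print_r(estimateFullScan(50 * 1024 ** 3));
```

For 50 GB this comes to at least 51,200 sequential requests and about 13.1 million strongly consistent read units (about 6.55 million eventually consistent) for every full pass over the table, regardless of how few items match the filter.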

Is there any other alternative? What are your thoughts and suggestions?

I'm trying to migrate this traffic data into a DynamoDB table and then query it. That seems like a bad idea to me now.

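$params = [
    'TableName' => $tableName,
    // Matches one attribute exactly and restricts `day` to an open range.
    'FilterExpression' => $this->filter . ' = :' . $this->filter . ' AND #dy > :since AND #dy < :now',
    // `day` is a DynamoDB reserved word, so it is aliased as #dy.
    'ExpressionAttributeNames' => ['#dy' => 'day'],
    'ExpressionAttributeValues' => $eav,
];

var_dump($params);

try {
    $result = $dynamodb->scan($params);
} catch (\Aws\DynamoDb\Exception\DynamoDbException $e) {
    // Surface the error instead of failing silently.
    echo $e->getMessage(), PHP_EOL;
}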
After considering the suggestions, this is what worked for me:

$params = [
    'TableName' => $tableName,
    'IndexName' => self::GLOBAL_SECONDARY_INDEX_NAME,
    'ProjectionExpression' => '#dy, t_counter, traffic_type_id',
    // The key condition is resolved through the index, so only matching items are read.
    'KeyConditionExpression' => 'country = :country AND #dy BETWEEN :since AND :to',
    // The filter is applied after the read, to the items matched by the key condition.
    'FilterExpression' => 'traffic_type_id = :traffic_type_id',
    'ExpressionAttributeNames' => ['#dy' => 'day'],
    'ExpressionAttributeValues' => $eav,
];
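The key condition above implies a global secondary index with `country` as the partition key and `day` as the sort key. A minimal sketch of how such an index might be added to an existing table with `updateTable` follows. The function name, the index name, and the string (`S`) attribute types are assumptions, since the actual table definition isn't shown here.

```php
<?php
// Sketch only: buildCountryDayIndexParams is a hypothetical helper, and the
// attribute types ('S') are assumed because the real schema is not shown.
function buildCountryDayIndexParams(string $tableName, string $indexName): array
{
    return [
        'TableName' => $tableName,
        // Every attribute used in the index key schema must be declared here.
        'AttributeDefinitions' => [
            ['AttributeName' => 'country', 'AttributeType' => 'S'],
            ['AttributeName' => 'day', 'AttributeType' => 'S'],
        ],
        'GlobalSecondaryIndexUpdates' => [[
            'Create' => [
                'IndexName' => $indexName,
                'KeySchema' => [
                    ['AttributeName' => 'country', 'KeyType' => 'HASH'],  // partition key
                    ['AttributeName' => 'day', 'KeyType' => 'RANGE'],     // sort key
                ],
                // Project only the extra attributes the query reads.
                'Projection' => [
                    'ProjectionType' => 'INCLUDE',
                    'NonKeyAttributes' => ['t_counter', 'traffic_type_id'],
                ],
                // Tables in provisioned mode also need a 'ProvisionedThroughput' entry here.
            ],
        ]],
    ];
}

// With a real client this would be used roughly like:
// $dynamodb->updateTable(buildCountryDayIndexParams($tableName, 'country-day-index'));
```

Projecting only the attributes named in the ProjectionExpression keeps the index small, at the cost of having to change the index if the query later needs other attributes.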

If your data is key-value in nature and you have a fixed set of fields you want to index, use DynamoDB. You can create indexes on the fields you want to query, and it will work well.

If you need complex queries across multiple indexes, any RDBMS is a good fit.

If you need to query on just about anything, consider Elasticsearch.

If your queries are very simple but each one retrieves a large amount of data, consider S3. You could index the metadata in DynamoDB and keep the actual data in S3.
