Using JOIN query in AWS DynamoDB using PHP

I am currently using MySQL as the database for my PHP application, but I now need to migrate to AWS DynamoDB. As I am new to DynamoDB, can anyone help me with using JOINs in DynamoDB?

From what I have found, JOINs can be done using Hive on Amazon EMR. The problem is that I could not find any resources on using Hive with PHP.

Hi, maybe you can try this:

To join two DynamoDB tables

The join is computed on the cluster and returned. The join does not take place in DynamoDB. This example returns a list of customers and their purchases for customers that have placed more than two orders.

CREATE EXTERNAL TABLE hive_purchases(customerId bigint, total_cost double, items_purchased array<String>) 
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES ("dynamodb.table.name" = "Purchases",
"dynamodb.column.mapping" = "customerId:CustomerId,total_cost:Cost,items_purchased:Items");

CREATE EXTERNAL TABLE hive_customers(customerId bigint, customerName string, customerAddress array<String>) 
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES ("dynamodb.table.name" = "Customers",
"dynamodb.column.mapping" = "customerId:CustomerId,customerName:Name,customerAddress:Address");

Select c.customerId, c.customerName, count(*) as count from hive_customers c 
JOIN hive_purchases p ON c.customerId=p.customerId 
GROUP BY c.customerId, c.customerName HAVING count > 2;

To join two tables from different sources

In the following example, Customer_S3 is a Hive table that loads a CSV file stored in Amazon S3, and hive_purchases is a table that references data in DynamoDB. The example joins customer data stored as a CSV file in Amazon S3 with order data stored in DynamoDB to return a set of data that represents orders placed by customers who have "Miller" in their name.

CREATE EXTERNAL TABLE hive_purchases(customerId bigint, total_cost double, items_purchased array<String>)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES ("dynamodb.table.name" = "Purchases",
"dynamodb.column.mapping" = "customerId:CustomerId,total_cost:Cost,items_purchased:Items");

CREATE EXTERNAL TABLE Customer_S3(customerId bigint, customerName string, customerAddress array<String>)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' 
LOCATION 's3://bucketname/path/subpath/';

Select c.customerId, c.customerName, c.customerAddress from 
Customer_S3 c 
JOIN hive_purchases p 
ON c.customerid=p.customerid 
where c.customerName like '%Miller%';

For more information, see the documentation on DynamoDB export, import, and queries.

Good luck.

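Well, migrating from SQL to NoSQL is a hard decision. You may want to take a look at this white paper to see whether your application can survive in the NoSQL world.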

Are you after this for data migration purposes, or for your app?

Do you have an example of the data you're trying to join? Data modelling differs dramatically between SQL and NoSQL databases so, as @AndrewTempleton said, you may need to denormalize your data. One of the keys to modelling with DynamoDB is understanding the access patterns for your data. Couple this with the logical structure of your data and you can begin to model it effectively.

If it's for your app, you may be able to create a single table and nest the rows of your joined table inside the items of your parent table, so there is no need to join anything.
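A minimal sketch of that nested layout, written in Python with plain dictionaries so it runs without AWS. The attribute names (`CustomerId`, `Name`, `Purchases`, `Cost`, `Items`) and the sample data are assumptions made for illustration, loosely following the Hive example earlier in the thread; nothing here calls DynamoDB.

```python
# Hypothetical single-table layout: each customer item carries its purchases
# as a nested list, so one key lookup returns what a SQL JOIN would.
customers = {
    1: {
        "CustomerId": 1,
        "Name": "Ann Miller",
        "Purchases": [
            {"Cost": 12.5, "Items": ["pen", "ink"]},
            {"Cost": 30.0, "Items": ["notebook"]},
        ],
    },
    2: {"CustomerId": 2, "Name": "Bob Stone", "Purchases": []},
}


def get_customer_with_purchases(table, customer_id):
    """Stand-in for a single GetItem call: one read, no join needed."""
    return table.get(customer_id)


def total_spent(customer):
    """Sum the cost of every nested purchase on one customer item."""
    return sum(p["Cost"] for p in customer["Purchases"])


ann = get_customer_with_purchases(customers, 1)
print(ann["Name"], len(ann["Purchases"]), total_spent(ann))
```

This only works while the nested list stays small, since a single DynamoDB item is capped at 400 KB.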

If you continue to have two tables, there's no referential integrity unless you build it yourself. If you want to join the two tables, you need to do that programmatically: an outer loop of GetItem calls (or BatchGetItem) for your parent and an inner loop of GetItem calls for your child.
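The two-loop idea can be sketched as follows. This is Python over in-memory lists standing in for the two tables, so it runs without AWS; the table contents, attribute names, and the `fetch_purchases` helper are hypothetical. In a real application the inner read would be a DynamoDB call keyed on the parent's id, and the same loop structure carries over to PHP.

```python
# In-memory stand-ins for two DynamoDB tables (hypothetical sample data).
customers = [
    {"CustomerId": 1, "Name": "Ann Miller"},
    {"CustomerId": 2, "Name": "Bob Stone"},
]
purchases = [
    {"PurchaseId": 10, "CustomerId": 1, "Cost": 12.5},
    {"PurchaseId": 11, "CustomerId": 1, "Cost": 30.0},
    {"PurchaseId": 12, "CustomerId": 2, "Cost": 7.0},
]


def fetch_purchases(table, customer_id):
    """Stand-in for the per-parent read against the child table."""
    return [row for row in table if row["CustomerId"] == customer_id]


def join_customers_purchases(parents, children):
    """Nested-loop inner join: outer loop over parents, inner over children."""
    joined = []
    for customer in parents:
        for purchase in fetch_purchases(children, customer["CustomerId"]):
            # Merge parent and child attributes into one joined row.
            joined.append({**customer, **purchase})
    return joined


rows = join_customers_purchases(customers, purchases)
for row in rows:
    print(row["Name"], row["PurchaseId"], row["Cost"])
```

Each parent costs at least one extra read, so this is fine for a handful of items but gets expensive on large tables.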

Alternatively, you can keep the two tables and use DynamoDB Streams to build a denormalized "view" of the two tables. Some considerations around consistency need to be thought about.
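A rough sketch of the streams approach, in Python and without any AWS calls. It assumes a hypothetical purchases table keyed on `CustomerId` and `PurchaseId`, a stream configured to include the new item image, and a view held in a plain dictionary (in practice it would be written to a third table). The record shape (`eventName`, `dynamodb.Keys`, `dynamodb.NewImage`, typed values such as `{"N": "1"}`) follows the DynamoDB Streams record format; only string and number attributes are handled here.

```python
from decimal import Decimal


def unwrap(attr):
    """Convert one DynamoDB-typed value ({"S": ..} or {"N": ..}) to Python."""
    if "S" in attr:
        return attr["S"]
    if "N" in attr:
        return Decimal(attr["N"])
    raise ValueError(f"unsupported attribute type: {attr}")


def apply_record(view, record):
    """Fold one purchases-table stream record into the per-customer view."""
    keys = record["dynamodb"]["Keys"]
    customer_id = unwrap(keys["CustomerId"])
    purchase_id = unwrap(keys["PurchaseId"])
    purchases = view.setdefault(customer_id, {})
    if record["eventName"] == "REMOVE":
        purchases.pop(purchase_id, None)
    else:
        # INSERT and MODIFY: assumes the stream is set to include NewImage.
        image = record["dynamodb"]["NewImage"]
        purchases[purchase_id] = {k: unwrap(v) for k, v in image.items()}
    return view


def key(customer_id, purchase_id):
    """Build the typed key map for a hypothetical purchases item."""
    return {"CustomerId": {"N": str(customer_id)},
            "PurchaseId": {"N": str(purchase_id)}}


records = [
    {"eventName": "INSERT", "dynamodb": {
        "Keys": key(1, 10), "NewImage": {**key(1, 10), "Cost": {"N": "12.5"}}}},
    {"eventName": "INSERT", "dynamodb": {
        "Keys": key(1, 11), "NewImage": {**key(1, 11), "Cost": {"N": "30"}}}},
    {"eventName": "REMOVE", "dynamodb": {"Keys": key(1, 10)}},
]

view = {}
for record in records:
    apply_record(view, record)
print(view)
```

Because stream records arrive after the write has committed, the view lags the source tables and is only eventually consistent.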

So, in essence, a join in DynamoDB is just a couple of loops. It's a very different way of thinking.

If you want to continue in the RDBMS world, have you considered RDS for MySQL?
