
Data set join using EMR

Original post, 2013-05-06 18:44:05. Tags: join, hadoop, amazon-web-services, emr

I have two tab-delimited datasets stored in AWS S3. I am trying to write an EMR job that joins them on a common key (a set of field values). My current version populates two lists and compares them line by line, outputting the rows that share a key. I have been writing in Python, but I cannot figure out how to bring two files in through stdin and compare their rows with one another in order to join the two datasets. Most of the documentation I find is in Java. I am using Amazon's EMR to run all my jobs. Any help is greatly appreciated.

Thank you.

1 answer

As you are using EMR already, have you looked at Hive?

http://aws.amazon.com/articles/Elastic-MapReduce/3681655242374956
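If you would rather stay in Python with Hadoop streaming, one common approach is a reduce-side join: the mapper tags each row with the dataset it came from and emits the join key first, and the reducer pairs up rows that arrive under the same key. The following is only a minimal sketch under several assumptions: the key column positions (`KEY_COLS`), the tags `A` and `B`, and the `dataset_a` path fragment are all hypothetical placeholders, and the input file name is read from the environment variable that Hadoop streaming exports (`mapreduce_map_input_file`, or `map_input_file` on older versions).

```python
import os
import sys
from itertools import groupby

KEY_COLS = (0, 1)  # hypothetical: positions of the fields that form the join key


def map_line(line, tag, key_cols=KEY_COLS):
    """Return 'key<TAB>tag<TAB>original fields' for one tab-delimited row.

    The key parts are joined with '|' (assumes '|' never occurs in key fields),
    so the whole key sits before the first tab, which is what streaming sorts on.
    """
    fields = line.rstrip("\n").split("\t")
    key = "|".join(fields[i] for i in key_cols)
    return "\t".join([key, tag] + fields)


def reduce_lines(sorted_lines):
    """Yield joined rows for every key present in both datasets.

    Expects lines sorted by key, as a streaming reducer receives them.
    Rows for one key are buffered in memory, so very hot keys need care.
    """
    parsed = (l.rstrip("\n").split("\t") for l in sorted_lines)
    for _key, group in groupby(parsed, key=lambda p: p[0]):
        left, right = [], []
        for _, tag, *row in group:
            (left if tag == "A" else right).append(row)
        for a in left:
            for b in right:
                yield "\t".join(a + b)


if __name__ == "__main__":
    mode = sys.argv[1:]
    if mode == ["map"]:
        # Hadoop streaming exposes the current input file via the environment.
        path = os.environ.get("mapreduce_map_input_file",
                              os.environ.get("map_input_file", ""))
        tag = "A" if "dataset_a" in path else "B"  # hypothetical path check
        for line in sys.stdin:
            if line.strip():
                print(map_line(line, tag))
    elif mode == ["reduce"]:
        for joined in reduce_lines(sys.stdin):
            print(joined)
```

The same script would be passed as both the mapper (`join.py map`) and the reducer (`join.py reduce`), with both S3 locations given as inputs to the streaming step. For large or skewed data, Hive's `JOIN` is likely to be less work than maintaining this by hand.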




solution1 0 2013-06-09 10:37:21
