I have a huge data set spread across three tables. Assume the three tables contain data similar to this:
Table A:
1 aaa place1
2 bbb place2
Table B:
1 11 aaa
2 22 bbb
Table C:
11 p1
22 p2
When I join Table A and Table B using Hadoop MapReduce, I get this output:
1 aaa place1 11
2 bbb place2 22
Now I want to join Table C with the above output so that 11 is replaced by p1 (and 22 by p2). How can I solve this problem?
Probably the easiest solution is to use Pig, as @David mentioned. For a quick test you could come up with something like this:
TABLE_A = LOAD 'hdfs://my_path/input/table_a.txt' using PigStorage(' ') AS (
id:chararray,
name:chararray,
place:chararray
);
TABLE_B = LOAD 'hdfs://my_path/input/table_b.txt' using PigStorage(' ') AS (
id:chararray,
cid:chararray,
name:chararray
);
TABLE_C = LOAD 'hdfs://my_path/input/table_c.txt' using PigStorage(' ') AS (
cid:chararray,
cname:chararray
);
TMP = FOREACH (join TABLE_A by id, TABLE_B by id) GENERATE
TABLE_A::id as id,
TABLE_A::name as name,
TABLE_A::place as place,
TABLE_B::cid as cid;
JOIN_ABC = FOREACH (join TMP by cid, TABLE_C by cid) GENERATE
TMP::id,
TMP::name,
TMP::place,
TABLE_C::cname;
store JOIN_ABC into 'hdfs://my_path/output' using PigStorage(' ');
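The common algorithm if you want to join two datasets on MapReduce is a reduce-side join: the mappers emit every record keyed by the join field and tagged with the dataset it came from, and each reducer combines the records from both datasets that share a key.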
So if you understand how to join two datasets, you can repeat the operation to join the result with the third.
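To make the two-dataset join concrete, here is a minimal sketch of a reduce-side join, written as a plain in-memory Python simulation rather than against the Hadoop API. The function names (`map_phase`, `shuffle`, `reduce_phase`, `join_a_b`) are made up for illustration, and the sample rows are the ones from the question.

```python
from collections import defaultdict

# Sample rows from the question (Table A: id, name, place; Table B: id, cid, name).
TABLE_A = [("1", "aaa", "place1"), ("2", "bbb", "place2")]
TABLE_B = [("1", "11", "aaa"), ("2", "22", "bbb")]


def map_phase(tagged_inputs):
    """Emit (join_key, (tag, record)) for every record of every input.

    tagged_inputs is a list of (tag, rows, key_index) triples; the tag
    tells the reducer which dataset a record came from.
    """
    for tag, rows, key_index in tagged_inputs:
        for row in rows:
            yield row[key_index], (tag, row)


def shuffle(pairs):
    """Group mapper output by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, left_tag, right_tag, combine):
    """For each key, pair every left record with every right record (inner join)."""
    for key in sorted(groups):
        left_rows = [row for tag, row in groups[key] if tag == left_tag]
        right_rows = [row for tag, row in groups[key] if tag == right_tag]
        for left_row in left_rows:
            for right_row in right_rows:
                yield combine(left_row, right_row)


def join_a_b():
    """Join Table A and Table B on id, keeping A's columns plus B's cid."""
    pairs = map_phase([("A", TABLE_A, 0), ("B", TABLE_B, 0)])
    return list(reduce_phase(shuffle(pairs), "A", "B", lambda a, b: a + (b[1],)))


if __name__ == "__main__":
    for row in join_a_b():
        print(" ".join(row))
```

In a real job the tagging would happen in the mapper and the grouping would be done by the framework's shuffle; only the pairing logic in the reducer would be your own code.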
The disadvantage of this approach is that if one of your datasets is a small dictionary, the number of reducers that do useful work in the reduce stage is limited by the size of that dictionary (strictly speaking, by the number of distinct join keys, which cannot exceed the size of the dictionary).
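A rough way to see that limit, assuming keys are assigned to reducers by hashing the key modulo the reducer count (similar in spirit to hash partitioning, but not Hadoop's actual code). `partition` and `busy_reducers` are hypothetical helpers, and `zlib.crc32` is only a deterministic stand-in for the real hash function.

```python
import zlib


def partition(key, num_reducers):
    """Pick a reducer for a key: hash of the key modulo the reducer count."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def busy_reducers(keys, num_reducers):
    """Return the set of reducer indexes that receive at least one record."""
    return {partition(key, num_reducers) for key in keys}


if __name__ == "__main__":
    # One million records, but only two distinct join keys (as in Table C).
    record_keys = ["11", "22"] * 500_000
    used = busy_reducers(record_keys, num_reducers=100)
    print(len(used), "of 100 reducers receive any data")
```

However many records there are, all records with the same key land on the same reducer, so with two distinct keys at most two of the 100 reducers get any input.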
I do not think you can join three tables in one MR step. So I think you simply need another MR job that takes the result of joining A and B and joins it with C.
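The two-job chain could look roughly like this. It is again an in-memory Python simulation with the question's sample data, not Hadoop code; `reduce_side_join` and `join_abc` are illustrative names, and each call to `reduce_side_join` stands in for one complete MR job.

```python
from collections import defaultdict

TABLE_A = [("1", "aaa", "place1"), ("2", "bbb", "place2")]
TABLE_B = [("1", "11", "aaa"), ("2", "22", "bbb")]
TABLE_C = [("11", "p1"), ("22", "p2")]


def reduce_side_join(left, left_key, right, right_key, combine):
    """Simulate one MR job: key records by the join field, group, join per key."""
    groups = defaultdict(lambda: ([], []))
    for row in left:                       # "map" over the left input
        groups[row[left_key]][0].append(row)
    for row in right:                      # "map" over the right input
        groups[row[right_key]][1].append(row)
    joined = []
    for key in sorted(groups):             # "reduce" one key at a time
        left_rows, right_rows = groups[key]
        for left_row in left_rows:
            for right_row in right_rows:
                joined.append(combine(left_row, right_row))
    return joined


def join_abc():
    # Job 1: A join B on id -> (id, name, place, cid)
    ab = reduce_side_join(TABLE_A, 0, TABLE_B, 0, lambda a, b: a + (b[1],))
    # Job 2: (A join B) join C on cid, replacing cid with cname
    return reduce_side_join(ab, 3, TABLE_C, 0, lambda row, c: row[:3] + (c[1],))


if __name__ == "__main__":
    for row in join_abc():
        print(" ".join(row))
```

The output of the first job is simply the left input of the second one, which is how the two jobs would be chained on HDFS as well: job 2 reads job 1's output directory together with Table C.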
And a bit off topic: I would suggest trying Hive or Pig for this before coding the MR jobs in Java.