My code:
    def merge_join(self, outer, outer_join_index, inner, inner_join_index):
        a = list(inner)
        b = list(outer)
        if not a or not b:
            return
        inner_copy = sorted(a, key=lambda tup: tup[inner_join_index])
        outer_copy = sorted(b, key=lambda tup: tup[outer_join_index])
        inner_counter = 0
        outer_counter = 0
        while inner_counter < len(inner_copy) and outer_counter < len(outer_copy):
            if outer_copy[outer_counter][outer_join_index] == inner_copy[inner_counter][inner_join_index]:
                yield outer_copy[outer_counter] + inner_copy[inner_counter]
                outer_counter += 1
            elif outer_copy[outer_counter][outer_join_index] < inner_copy[inner_counter][inner_join_index]:
                outer_counter += 1
            else:
                inner_counter += 1
Where outer and inner are generators.
I ran a given test for the algorithm but it returned a generator of 127 elements as opposed to the expected number 214. Can anyone help me check where the bug might be in my code? Thank you!!
If you want to pick the matching outer row for each inner row (without duplicates in inner, and skipping rows that have no match), then on a match you are supposed to increment inner_counter, not outer_counter as you are doing.
The reason is that otherwise if multiple inner rows have the same value you will only output the first of them.
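As a minimal sketch of that one-line change, here is the loop from the question rewritten as a standalone function so it can be run directly. The name merge_join_unique_outer and the sample rows are made up for illustration, and the sketch assumes each join key appears at most once in outer.

```python
def merge_join_unique_outer(outer, outer_join_index, inner, inner_join_index):
    """Merge join that advances the inner side on a match.

    Sketch only: assumes each join key appears at most once in `outer`.
    """
    inner_copy = sorted(inner, key=lambda tup: tup[inner_join_index])
    outer_copy = sorted(outer, key=lambda tup: tup[outer_join_index])
    inner_counter = 0
    outer_counter = 0
    while inner_counter < len(inner_copy) and outer_counter < len(outer_copy):
        outer_key = outer_copy[outer_counter][outer_join_index]
        inner_key = inner_copy[inner_counter][inner_join_index]
        if outer_key == inner_key:
            yield outer_copy[outer_counter] + inner_copy[inner_counter]
            inner_counter += 1  # keep the outer row for further inner matches
        elif outer_key < inner_key:
            outer_counter += 1
        else:
            inner_counter += 1


# Made-up sample data: key 1 appears twice in inner.
outer_rows = [(1, "a"), (2, "b")]
inner_rows = [(1, "x"), (1, "y"), (3, "z")]
result = list(merge_join_unique_outer(iter(outer_rows), 0, iter(inner_rows), 0))
print(result)  # [(1, 'a', 1, 'x'), (1, 'a', 1, 'y')]
```

With the original outer_counter += 1 in the match branch, the same input would produce only the first of those two rows.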
If instead you want to do a full join (producing the cartesian product of the rows from inner and outer for each value of the join column), then this has to be coded explicitly, with something like:
    while inner_counter < len(inner_copy) and outer_counter < len(outer_copy):
        # Smallest join key not yet processed on either side
        key = min(inner_copy[inner_counter][inner_join_index],
                  outer_copy[outer_counter][outer_join_index])
        # Collect every inner row that has this key
        inner_group = []
        while (inner_counter < len(inner_copy)
               and key == inner_copy[inner_counter][inner_join_index]):
            inner_group.append(inner_copy[inner_counter])
            inner_counter += 1
        # Collect every outer row that has this key
        outer_group = []
        while (outer_counter < len(outer_copy)
               and key == outer_copy[outer_counter][outer_join_index]):
            outer_group.append(outer_copy[outer_counter])
            outer_counter += 1
        # Here you can handle left or right join by replacing an
        # empty group with a group of one empty row (None,)*len(row)
        for i in inner_group:
            for o in outer_group:
                yield o + i  # outer row first, as in the question's code
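
For reference, a self-contained version of this grouping approach might look like the sketch below. The function name merge_join_groups and the sample rows are hypothetical, and rows are emitted outer first to match the question's code; treat it as an illustration rather than a drop-in replacement for the method being tested.

```python
def merge_join_groups(outer, outer_join_index, inner, inner_join_index):
    """Sort-merge join that emits every outer/inner pair sharing a key."""
    inner_copy = sorted(inner, key=lambda tup: tup[inner_join_index])
    outer_copy = sorted(outer, key=lambda tup: tup[outer_join_index])
    inner_counter = 0
    outer_counter = 0
    while inner_counter < len(inner_copy) and outer_counter < len(outer_copy):
        # Smallest join key not yet processed on either side.
        key = min(inner_copy[inner_counter][inner_join_index],
                  outer_copy[outer_counter][outer_join_index])
        # Gather all inner rows with this key.
        inner_group = []
        while (inner_counter < len(inner_copy)
               and inner_copy[inner_counter][inner_join_index] == key):
            inner_group.append(inner_copy[inner_counter])
            inner_counter += 1
        # Gather all outer rows with this key.
        outer_group = []
        while (outer_counter < len(outer_copy)
               and outer_copy[outer_counter][outer_join_index] == key):
            outer_group.append(outer_copy[outer_counter])
            outer_counter += 1
        # If either group is empty, the nested loops yield nothing,
        # so unmatched keys are skipped.
        for o in outer_group:
            for i in inner_group:
                yield o + i


# Made-up sample data: key 1 appears twice on both sides.
outer_rows = [(1, "a"), (1, "b"), (2, "c")]
inner_rows = [(1, "x"), (1, "y"), (3, "z")]
joined = list(merge_join_groups(iter(outer_rows), 0, iter(inner_rows), 0))
print(len(joined))  # 4
```

Both inputs can be generators, as in the question, because sorted() consumes any iterable and returns a list.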