How to merge two tuples in a list if the first tuple elements match?

I've got two lists of tuples of the form:

playerinfo = [(ansonca01,4,1871,1,RC1),(forceda01,44,1871,1,WS3),(mathebo01,68,1871,1,FW1)]

idmatch = [(ansonca01,Anson,Cap,05/06/1871),(aaroh101,Aaron,Hank,04/13/1954),(aarot101,Aaron,Tommie,04/10/1962)]

What I would like to know is how to iterate through both lists and, whenever the first element of a tuple from "playerinfo" matches the first element of a tuple from "idmatch", merge the two matching tuples to yield a new list of tuples, in the form:

merged_data = [(ansonca01,4,1871,1,RC1, Anson,Cap,05/06/1871),(...),(...), etc.] 

The new list of tuples would have the ID number matched to the first and last names of the correct player.

Background info: I'm trying to merge two CSV documents of baseball statistics. The one with all of the relevant stats doesn't contain player names, only a reference number (e.g. 'ansoc101'), while the second document contains the reference number in one column and the first and last names of the corresponding player in the other.

The CSV is too large to do this by hand (about 20,000 players), so I'm trying to automate the process.

Use a list comprehension to iterate over your lists:

[x + y[1:] for x in list1 for y in list2 if x[0] == y[0]]

I tried this on the lists:

list1 = [("this", 1, 2, 3), ("that", 1, 2, 3), ("other", 1, 2, 3)]
list2 = [("this", 5, 6, 7), ("that", 10, 11, 12), ("notother", 1, 2, 3)]

and got:

[('this', 1, 2, 3, 5, 6, 7), ('that', 1, 2, 3, 10, 11, 12)]

Is that what you wanted?
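
As a quick check, here is the same comprehension applied to the data from the question, assuming the values are quoted as strings (the question shows them unquoted). The nested loops compare every tuple in one list with every tuple in the other, so the number of comparisons grows with len(playerinfo) * len(idmatch); that is fine for small inputs but worth keeping in mind for about 20,000 players.

```python
playerinfo = [
    ("ansonca01", 4, 1871, 1, "RC1"),
    ("forceda01", 44, 1871, 1, "WS3"),
    ("mathebo01", 68, 1871, 1, "FW1"),
]
idmatch = [
    ("ansonca01", "Anson", "Cap", "05/06/1871"),
    ("aaroh101", "Aaron", "Hank", "04/13/1954"),
    ("aarot101", "Aaron", "Tommie", "04/10/1962"),
]

# Pair every player record with every ID record and keep the pairs whose
# first elements match; m[1:] drops the duplicated ID before concatenating.
merged_data = [p + m[1:] for p in playerinfo for m in idmatch if p[0] == m[0]]

print(merged_data)
# -> [('ansonca01', 4, 1871, 1, 'RC1', 'Anson', 'Cap', '05/06/1871')]
```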

You could first create a dictionary to enable fast ID number look-ups, and then merge the data from the two lists together very efficiently with a list comprehension:

import operator

playerinfo = [('ansonca01', 4, 1871, 1, 'RC1'),
              ('forceda01', 44, 1871, 1, 'WS3'),
              ('mathebo01', 68, 1871, 1, 'FW1')]

idmatch = [('ansonca01', 'Anson', 'Cap', '05/06/1871'),
           ('aaroh101', 'Aaron', 'Hank', '04/13/1954'),
           ('aarot101', 'Aaron', 'Tommie', '04/10/1962')]

get_id = operator.itemgetter(0)  # Returns the ID field (avoids shadowing the built-in id()).

idinfo = {get_id(rec): rec[1:] for rec in idmatch}  # Dict for fast look-ups.

merged = [info + idinfo[get_id(info)] for info in playerinfo if get_id(info) in idinfo]

print(merged) # -> [('ansonca01', 4, 1871, 1, 'RC1', 'Anson', 'Cap', '05/06/1871')]
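
The comprehension above drops players that have no entry in idmatch. If those rows should be kept instead, one possible variation is to use dict.get with a placeholder. This is only a sketch: the function name merge_keep_unmatched and the None placeholders are assumptions for illustration, not part of the original answer.

```python
def merge_keep_unmatched(playerinfo, idmatch, missing=(None, None, None)):
    """Merge on the first element, keeping players with no match (hypothetical helper)."""
    # Map each ID to the remaining fields of its idmatch record.
    names_by_id = {rec[0]: rec[1:] for rec in idmatch}
    # dict.get falls back to `missing` when the ID is unknown, so every
    # player record appears exactly once in the result.
    return [info + names_by_id.get(info[0], missing) for info in playerinfo]


playerinfo = [("ansonca01", 4, 1871, 1, "RC1"),
              ("forceda01", 44, 1871, 1, "WS3")]
idmatch = [("ansonca01", "Anson", "Cap", "05/06/1871")]

merged = merge_keep_unmatched(playerinfo, idmatch)
print(merged[0])  # -> ('ansonca01', 4, 1871, 1, 'RC1', 'Anson', 'Cap', '05/06/1871')
print(merged[1])  # -> ('forceda01', 44, 1871, 1, 'WS3', None, None, None)
```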

Dictionary

  1. Iterate over the playerinfo list and build a dictionary whose key is the first item of each tuple and whose value is a list of all the tuple's items.
  2. Print the result of the first step.
  3. Iterate over the idmatch list and check whether the first item of each tuple is a key in the result dictionary. If it is, extend that key's value with the remaining items using the list extend method.
  4. Print the result of the second step.
  5. Create the output format from the generated dictionary. Players with no match in idmatch keep only their original fields.

Demo:

import pprint

playerinfo = [("ansonca01", 4, 1871, 1, "RC1"),
              ("forceda01", 44, 1871, 1, "WS3"),
              ("mathebo01", 68, 1871, 1, "FW1")]

idmatch = [("ansonca01", "Anson", "Cap", "05/06/1871"),
           ("aaroh101", "Aaron", "Hank", "04/13/1954"),
           ("aarot101", "Aaron", "Tommie", "04/10/1962")]

# Step 1: map each player ID to a mutable list of that player's fields.
result = {}
for rec in playerinfo:
    result[rec[0]] = list(rec)

print("Debug Result1:")
pprint.pprint(result)

# Step 3: append the name fields for every ID that also appears in playerinfo.
for rec in idmatch:
    if rec[0] in result:
        result[rec[0]].extend(rec[1:])

print("\nDebug Result2:")
pprint.pprint(result)

# Step 5: convert each list back to a tuple.
final_rs = [tuple(fields) for fields in result.values()]

print("\nFinal result:")

pprint.pprint(final_rs)

Output:

$ python task4.py
Debug Result1:
{'ansonca01': ['ansonca01', 4, 1871, 1, 'RC1'],
 'forceda01': ['forceda01', 44, 1871, 1, 'WS3'],
 'mathebo01': ['mathebo01', 68, 1871, 1, 'FW1']}

Debug Result2:
{'ansonca01': ['ansonca01', 4, 1871, 1, 'RC1', 'Anson', 'Cap', '05/06/1871'],
 'forceda01': ['forceda01', 44, 1871, 1, 'WS3'],
 'mathebo01': ['mathebo01', 68, 1871, 1, 'FW1']}

Final result:
[('ansonca01', 4, 1871, 1, 'RC1', 'Anson', 'Cap', '05/06/1871'),
 ('forceda01', 44, 1871, 1, 'WS3'),
 ('mathebo01', 68, 1871, 1, 'FW1')]
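
To tie this back to the two CSV files mentioned in the question, the same dictionary look-up can be applied to the files directly with the standard csv module. This is a rough sketch under several assumptions that the question does not confirm: neither file has a header row, the ID is in the first column of both, and the function name merge_csv and its path parameters are made up for illustration. Note that csv.reader yields every field as a string.

```python
import csv


def merge_csv(stats_path, names_path, out_path):
    """Write stats rows with name columns appended; return the row count (hypothetical helper)."""
    # Build the look-up table: ID -> remaining columns of the names file.
    with open(names_path, newline="") as f:
        names_by_id = {row[0]: row[1:] for row in csv.reader(f) if row}

    written = 0
    with open(stats_path, newline="") as src, \
            open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            # Skip blank lines and rows whose ID has no entry in the names file.
            if row and row[0] in names_by_id:
                writer.writerow(row + names_by_id[row[0]])
                written += 1
    return written
```

A call such as merge_csv("stats.csv", "names.csv", "merged.csv") (file names assumed) would then stream through the stats file once, so the roughly 20,000 players never need to be held as a list of merged tuples in memory.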
