Python: Compare lists and combine in common fields
I have the following two lists:
ISO3166_CountryCodes_NO = [["NO","Norge"],["SE","Sverige"],["GR","Hellas"]]
ISO3166_CountryCodes_EN = [["NO","Norway"],["SE","Sweden"],["GR","Greece"]]
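As you can see, the country codes are always the same, but the country names differ (they are translations). How can I build a single list like this: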
ISO3166_CountryCodes = [["NO","Norge","Norway"],["SE","Sverige","Sweden"],["GR","Hellas","Greece"]]
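I could do this with a for loop over the first list, searching the second list for the matching country code for each element and then appending the translation to a new list, but that feels clumsy.
Is there a better way to do this in Python? In Perl, which I know better, I would use a hash table, for example.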
In Python, a dictionary is a hash table. First, build two dictionaries:
NO_dict = {x[0]: x[1] for x in ISO3166_CountryCodes_NO}
EN_dict = {x[0]: x[1] for x in ISO3166_CountryCodes_EN}
This gives you:
{'GR': 'Hellas', 'NO': 'Norge', 'SE': 'Sverige'}
{'GR': 'Greece', 'NO': 'Norway', 'SE': 'Sweden'}
You can then build the list like this:
final_list = [[k, NO_dict[k], EN_dict[k]] for k in NO_dict]
which gives you:
[['GR', 'Hellas', 'Greece'],
['SE', 'Sverige', 'Sweden'],
['NO', 'Norge', 'Norway']]
Later on you may find it easier to keep the data in a dictionary whose values are tuples of names, for example:
final_dict = {k:(NO_dict[k], EN_dict[k]) for k in NO_dict}
That way you can look up an entry using the abbreviation as the key: for example, final_dict['NO'] yields ('Norge', 'Norway').
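Note that the comprehensions above raise a KeyError if a code appears in the Norwegian list but not in the English one. Here is a minimal sketch of one way to tolerate that, assuming a missing translation should simply become None. The helper name merge_names and that policy are illustrative assumptions, not part of the original answer.

```python
def merge_names(no_pairs, en_pairs):
    """Map each country code to a (Norwegian, English) tuple.

    Hypothetical helper: a code missing from one list gets None in that slot.
    """
    no_dict = dict(no_pairs)
    en_dict = dict(en_pairs)
    # Union of the codes, Norwegian order first, then any English-only codes.
    codes = list(no_dict) + [c for c in en_dict if c not in no_dict]
    # dict.get returns None instead of raising when a code is absent.
    return {c: (no_dict.get(c), en_dict.get(c)) for c in codes}


merged = merge_names([["NO", "Norge"], ["SE", "Sverige"]],
                     [["NO", "Norway"], ["GR", "Greece"]])
print(merged["NO"])
print(merged["GR"])
```

Whether None, an empty string, or an exception is the right response to a missing code depends on how the combined table is used downstream.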
Edit: OrderedDict
If you are on Python >= 2.7 and care about order, you can still use dictionaries by switching to OrderedDict, for example:
from collections import OrderedDict
# A list of lists can be used as input for an OrderedDict, so don't need to loop
NO_dict = OrderedDict(ISO3166_CountryCodes_NO)
EN_dict = OrderedDict(ISO3166_CountryCodes_EN)
# Assumes you want the result in the same order as the Norwegian list
# Iterate over the English list if it has a preferred order
final_dict = OrderedDict([(k, (NO_dict[k], EN_dict[k])) for k in NO_dict])
(See Ashwini Chaudhary's answer for another implementation.)
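On Python 3.7 and later, the built-in dict preserves insertion order as a language guarantee, so the same ordered result can be obtained without OrderedDict. A short sketch under that version assumption, reusing the data from the question:

```python
ISO3166_CountryCodes_NO = [["NO", "Norge"], ["SE", "Sverige"], ["GR", "Hellas"]]
ISO3166_CountryCodes_EN = [["NO", "Norway"], ["SE", "Sweden"], ["GR", "Greece"]]

# dict() accepts any iterable of two-item sequences, so no loop is needed.
NO_dict = dict(ISO3166_CountryCodes_NO)
EN_dict = dict(ISO3166_CountryCodes_EN)

# Keys come out in the order of the Norwegian list (Python 3.7+ only).
final_dict = {k: (NO_dict[k], EN_dict[k]) for k in NO_dict}
final_list = [[k, no, en] for k, (no, en) in final_dict.items()]
print(final_list)
```

On older interpreters, the OrderedDict version remains the safe choice.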
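Something like this, using the itertools recipe unique_everseen together with chain() (Python 2 is shown here; on Python 3, ifilterfalse and izip are named filterfalse and zip):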
In [26]: from itertools import *
In [27]: lis1=[["NO","Norge"],["SE","Sverige"],["GR","Hellas"]]
In [28]: lis2=[["NO","Norway"],["SE","Sweden"],["GR","Greece"]]
In [29]: from itertools import *
In [30]: def unique_everseen(iterable, key=None):
   ....:     seen = set()
   ....:     seen_add = seen.add
   ....:     if key is None:
   ....:         for element in ifilterfalse(seen.__contains__, iterable):
   ....:             seen_add(element)
   ....:             yield element
   ....:     else:
   ....:         for element in iterable:
   ....:             k = key(element)
   ....:             if k not in seen:
   ....:                 seen_add(k)
   ....:                 yield element
   ....:
In [31]: [list(unique_everseen(chain(*x))) for x in izip(lis1,lis2)]
Out[31]:
[['NO', 'Norge', 'Norway'],
['SE', 'Sverige', 'Sweden'],
['GR', 'Hellas', 'Greece']]
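For readers on Python 3, here is a self-contained sketch of the same idea with explicit imports. It assumes, as the session above does, that both lists hold the codes in the same order, because zip pairs rows by position rather than by code.

```python
from itertools import chain, filterfalse


def unique_everseen(iterable, key=None):
    """Yield unique elements in first-seen order (itertools recipe)."""
    seen = set()
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen.add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen.add(k)
                yield element


lis1 = [["NO", "Norge"], ["SE", "Sverige"], ["GR", "Hellas"]]
lis2 = [["NO", "Norway"], ["SE", "Sweden"], ["GR", "Greece"]]

# Flatten each positional pair of rows, then drop the repeated code.
combined = [list(unique_everseen(chain(*pair))) for pair in zip(lis1, lis2)]
print(combined)
```

One caveat: because duplicates are removed by value, a country whose name is spelled identically in both languages would end up with only one name in its row.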
Or you can combine groupby from itertools with operator.itemgetter():
In [42]: from operator import *
In [43]: [[k]+list(map(itemgetter(1),g)) for x in zip(lis1,lis2) for k,g in groupby(x,itemgetter(0))]
Out[43]:
[['NO', 'Norge', 'Norway'],
['SE', 'Sverige', 'Sweden'],
['GR', 'Hellas', 'Greece']]
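groupby only merges consecutive items that share a key, which is why the version above depends on zip lining the rows up. If the two lists might be in different orders, one option is to sort the concatenated rows by code first. This is a sketch under that assumption, and the result comes out sorted by code rather than in the original order.

```python
from itertools import groupby
from operator import itemgetter

lis1 = [["NO", "Norge"], ["SE", "Sverige"], ["GR", "Hellas"]]
# Deliberately in a different order from lis1.
lis2 = [["GR", "Greece"], ["NO", "Norway"], ["SE", "Sweden"]]

# sorted() is stable, so for each code the lis1 name stays ahead of the lis2 name.
rows = sorted(lis1 + lis2, key=itemgetter(0))
combined = [[code] + [name for _, name in group]
            for code, group in groupby(rows, key=itemgetter(0))]
print(combined)
```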
Or use collections.OrderedDict, which is a subclass of dict that also preserves order:
In [47]: from collections import OrderedDict
In [48]: dic=OrderedDict()
In [49]: for x in lis1:
   ....:     dic.setdefault(x[0],[]).append(x[1])
   ....:
In [50]: for x in lis2:
   ....:     dic.setdefault(x[0],[]).append(x[1])
   ....:
In [51]: dic
Out[51]: OrderedDict([('NO', ['Norge', 'Norway']), ('SE', ['Sverige', 'Sweden']), ('GR', ['Hellas', 'Greece'])])
In [52]: [[x]+y for x,y in dic.items()]
Out[52]:
[['NO', 'Norge', 'Norway'],
['SE', 'Sverige', 'Sweden'],
['GR', 'Hellas', 'Greece']]
#or directly access the names using the short-name
In [53]: dic['NO']
Out[53]: ['Norge', 'Norway']
In [54]: dic['GR']
Out[54]: ['Hellas', 'Greece']
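The setdefault pattern extends naturally to any number of translation lists. A minimal sketch using collections.defaultdict follows; the function name combine and the third (German) list are illustrative additions, and the output order relies on dict insertion order in Python 3.7+.

```python
from collections import defaultdict


def combine(*pair_lists):
    """Merge any number of [code, name] lists into [code, name1, name2, ...] rows."""
    merged = defaultdict(list)
    for pairs in pair_lists:
        for code, name in pairs:
            merged[code].append(name)
    # Codes appear in the order they were first seen.
    return [[code] + names for code, names in merged.items()]


lis1 = [["NO", "Norge"], ["SE", "Sverige"], ["GR", "Hellas"]]
lis2 = [["NO", "Norway"], ["SE", "Sweden"], ["GR", "Greece"]]
lis3 = [["NO", "Norwegen"], ["SE", "Schweden"], ["GR", "Griechenland"]]  # illustrative

print(combine(lis1, lis2, lis3))
```

Unlike the positional approaches, this does not require the lists to be in the same order, but a code missing from one list silently yields a shorter row.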
You can use a list comprehension:
>>> [[s] +
...  [n for (c, n) in ISO3166_CountryCodes_NO if c == s] +
...  [n for (c, n) in ISO3166_CountryCodes_EN if c == s]
...  for s in set([c for (c, n) in ISO3166_CountryCodes_NO] +
...               [c for (c, n) in ISO3166_CountryCodes_EN])]
[['GR', 'Hellas', 'Greece'], ['SE', 'Sverige', 'Sweden'], ['NO', 'Norge', 'Norway']]
Using Python 3.2.
First way:
[[i[0],i[1],v[1]] for i in list1 for v in list2 if i[0]==v[0]]
Second way:
res = []
for i, v in zip(list1, list2):
    tem = [i[0]]
    # Only add the names when both rows carry the same code.
    if i[0] == v[0]:
        tem.extend([i[1], v[1]])
    res.append(tem)
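The second way pairs rows by position, so it only produces complete rows when both lists hold the same codes in the same order; on a mismatch it quietly appends a row containing just the code. Below is a small sketch that makes that assumption explicit by failing loudly instead. The function name combine_aligned is hypothetical.

```python
def combine_aligned(list1, list2):
    """Combine two [code, name] lists that are assumed to be in the same order."""
    if len(list1) != len(list2):
        raise ValueError("lists differ in length")
    res = []
    for (code1, name1), (code2, name2) in zip(list1, list2):
        if code1 != code2:
            raise ValueError("code mismatch: %r vs %r" % (code1, code2))
        res.append([code1, name1, name2])
    return res


list1 = [["NO", "Norge"], ["SE", "Sverige"], ["GR", "Hellas"]]
list2 = [["NO", "Norway"], ["SE", "Sweden"], ["GR", "Greece"]]
print(combine_aligned(list1, list2))
```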