Can I reduce a tuple list by key using Python?
I'm currently putting together some visualizations of how my NER model performs. The data I have right now looks like this:
counter_list = [
('Name', {'p':0.56,'r':0.56,'f':0.56}),
('Designation', {'p':0.10,'r':0.20,'f':0.14}),
('Location', {'p':0.56,'r':0.56,'f':0.56}),
('Name', {'p':0.14,'r':0.14,'f':0.14}),
('Designation', {'p':0.10,'r':0.20,'f':0.14}),
('Location', {'p':0.56,'r':0.56,'f':0.56})
]
I want to eliminate the duplicates and add their values together, so that each key appears only once. The output should look like this:
[
('Name', {'p':0.7,'r':0.7,'f':0.7}),
('Designation', {'p':0.2,'r':0.4,'f':0.28}),
('Location', {'p':1.12,'r':1.12,'f':1.12})
]
I tried using the reduce function, but it only gives me the output for the 'Name' entry.
result = functools.reduce(lambda x, y: (x[0], Counter(x[1])+Counter(y[1])) if x[0]==y[0] else (x[0],x[1]), counter_list)
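What is the right way to do this? I want to use the final result to build some visualizations that show which item has the higher 'f', 'p' or 'r' component.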
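On why the reduce attempt only returns the 'Name' entry: the accumulator x starts out as the first tuple, and the lambda hands x back unchanged whenever the keys differ, so every entry that is not 'Name' is skipped. Below is a rough sketch of a reduce that does work, assuming a dict is used as the accumulator. The helper name merge and the rounding to two decimals are my own choices, not something from the question.

```python
from functools import reduce

counter_list = [
    ('Name', {'p': 0.56, 'r': 0.56, 'f': 0.56}),
    ('Designation', {'p': 0.10, 'r': 0.20, 'f': 0.14}),
    ('Location', {'p': 0.56, 'r': 0.56, 'f': 0.56}),
    ('Name', {'p': 0.14, 'r': 0.14, 'f': 0.14}),
    ('Designation', {'p': 0.10, 'r': 0.20, 'f': 0.14}),
    ('Location', {'p': 0.56, 'r': 0.56, 'f': 0.56}),
]

def merge(acc, item):
    """Fold one (label, scores) tuple into the running totals (hypothetical helper)."""
    label, scores = item
    totals = acc.setdefault(label, {})
    for metric, value in scores.items():
        totals[metric] = totals.get(metric, 0.0) + value
    return acc

# The third argument is the initial accumulator: an empty dict keyed by label.
totals_by_label = reduce(merge, counter_list, {})

# Round only for display, to hide float noise such as 0.7000000000000001.
result = [
    (label, {metric: round(value, 2) for metric, value in totals.items()})
    for label, totals in totals_by_label.items()
]
print(result)
```

Because the accumulator is a fresh dict, the dicts inside counter_list are left untouched, and the labels come out in the order they were first seen.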
>>> import pandas as pd
>>> keys, data = zip(*counter_list)
>>> df = pd.DataFrame(data=data, index=keys).groupby(level=0).sum()
>>> df
                p     r     f
Designation  0.20  0.40  0.28
Location     1.12  1.12  1.12
Name         0.70  0.70  0.70
Then do:
>>> list(df.T.to_dict().items())
[
('Designation', {'p': 0.2, 'r': 0.4, 'f': 0.28}),
('Location', {'p': 1.12, 'r': 1.12, 'f': 1.12}),
('Name', {'p': 0.7, 'r': 0.7, 'f': 0.7})
]
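A somewhat tricky way is to use defaultdict and Counter, although the intent may be a little unclear:
from collections import defaultdict, Counter
result = defaultdict(Counter)
for item, values in counter_list:
    # Counter.update adds the incoming values to the existing ones
    result[item].update(values)
print(result)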
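The result above is a defaultdict of Counter objects rather than the list of tuples the question asks for. Here is a minimal sketch of the conversion back, assuming plain dicts are wanted and that rounding to two decimals is acceptable for display (the name as_list is mine):

```python
from collections import Counter, defaultdict

counter_list = [
    ('Name', {'p': 0.56, 'r': 0.56, 'f': 0.56}),
    ('Designation', {'p': 0.10, 'r': 0.20, 'f': 0.14}),
    ('Location', {'p': 0.56, 'r': 0.56, 'f': 0.56}),
    ('Name', {'p': 0.14, 'r': 0.14, 'f': 0.14}),
    ('Designation', {'p': 0.10, 'r': 0.20, 'f': 0.14}),
    ('Location', {'p': 0.56, 'r': 0.56, 'f': 0.56}),
]

result = defaultdict(Counter)
for label, scores in counter_list:
    # update() adds values per key; it does not replace them like dict.update
    result[label].update(scores)

# Convert each Counter to a plain dict, rounding away float noise.
as_list = [
    (label, {metric: round(value, 2) for metric, value in counts.items()})
    for label, counts in result.items()
]
print(as_list)
```

One thing to keep in mind: update() keeps zero and negative totals, whereas adding two Counters with + drops any entry whose total is not positive. For precision/recall scores that should rarely matter, but it is a difference between the two.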
This can be done with an intermediate/temporary dictionary. Something like this:
counter_list = [
('Name', {'p':0.56,'r':0.56,'f':0.56}),
('Designation', {'p':0.10,'r':0.20,'f':0.14}),
('Location', {'p':0.56,'r':0.56,'f':0.56}),
('Name', {'p':0.14,'r':0.14,'f':0.14}),
('Designation', {'p':0.10,'r':0.20,'f':0.14}),
('Location', {'p':0.56,'r':0.56,'f':0.56})
]
tdict = dict()
for k, v in counter_list:
    if k not in tdict:
        # store a copy so the dicts inside counter_list are not modified in place
        tdict[k] = dict(v)
    else:
        for sk in 'prf':
            tdict[k][sk] += v[sk]
new_list = [(k, v) for k, v in tdict.items()]
print(new_list)
Output:
[('Name', {'p': 0.7000000000000001, 'r': 0.7000000000000001, 'f': 0.7000000000000001}), ('Designation', {'p': 0.2, 'r': 0.4, 'f': 0.28}), ('Location', {'p': 1.12, 'r': 1.12, 'f': 1.12})]
Mine is similar to Lancelot du Lac's above. It's not the cleanest, but it gets you your expected output.
counter_list = [
('Name', {'p':0.56,'r':0.56,'f':0.56}),
('Designation', {'p':0.10,'r':0.20,'f':0.14}),
('Location', {'p':0.56,'r':0.56,'f':0.56}),
('Name', {'p':0.14,'r':0.14,'f':0.14}),
('Designation', {'p':0.10,'r':0.20,'f':0.14}),
('Location', {'p':0.56,'r':0.56,'f':0.56})
]
counter_intermediate = {x: {'p': 0, 'r': 0, 'f': 0} for x in set(tup[0] for tup in counter_list)}
for header, dic in counter_list:
    for letter, value in dic.items():
        counter_intermediate[header][letter] += value
new_list = [(key, value) for key, value in counter_intermediate.items()]
print(new_list)
Output:
[('Designation', {'p': 0.2, 'r': 0.4, 'f': 0.28}), ('Name', {'p': 0.7000000000000001, 'r': 0.7000000000000001, 'f': 0.7000000000000001}), ('Location', {'p': 1.12, 'r': 1.12, 'f': 1.12})]
A more practical approach might look like this:
from typing import List, Tuple
def reduce_counter_list(counter_list: List[Tuple]) -> List[Tuple]:
    tmp_dict = {}
    for key, count_dict in counter_list:
        # First time seeing a key, store a copy of its values
        # (a copy, so the caller's dicts are not modified later on)
        if key not in tmp_dict:
            tmp_dict[key] = dict(count_dict)
        # Otherwise, add the values to existing vals
        else:
            for k, v in count_dict.items():
                tmp_dict[key][k] += v
    # Finally, return tmp dict as a list of tuples
    return list(tmp_dict.items())
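A very compact example, where cl = counter_list to save space:
half = len(cl) // 2
y = []
for i, x in enumerate(cl[:half]):
    # pair each entry with its counterpart in the second half of the list
    y.append((x[0], {k: round(x[1][k] + cl[i + half][1][k], 3) for k in x[1]}))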
[
('Name', {'f': 0.7, 'p': 0.7, 'r': 0.7}),
('Designation', {'f': 0.28, 'p': 0.2, 'r': 0.4}),
('Location', {'f': 1.12, 'p': 1.12, 'r': 1.12})
]
This relies on the tuples being arranged uniformly, in this order:
[
('Name', ...),
('Designation', ...),
('Location', ...),
('Name', ...),
('Designation', ...),
('Location', ...)
]
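If that ordering cannot be guaranteed, one possible variation is to sort by label first and then group with itertools.groupby. This is only a sketch of a swapped-in technique, not the compact version above; it assumes every dict has the same keys, and it returns the labels in alphabetical order rather than input order.

```python
from itertools import groupby
from operator import itemgetter

cl = [
    ('Name', {'p': 0.56, 'r': 0.56, 'f': 0.56}),
    ('Designation', {'p': 0.10, 'r': 0.20, 'f': 0.14}),
    ('Location', {'p': 0.56, 'r': 0.56, 'f': 0.56}),
    ('Name', {'p': 0.14, 'r': 0.14, 'f': 0.14}),
    ('Designation', {'p': 0.10, 'r': 0.20, 'f': 0.14}),
    ('Location', {'p': 0.56, 'r': 0.56, 'f': 0.56}),
]

y = []
# groupby only merges adjacent items, so the list has to be sorted by label first.
for label, group in groupby(sorted(cl, key=itemgetter(0)), key=itemgetter(0)):
    dicts = [scores for _, scores in group]
    # Assumes all dicts in a group share the keys of the first one.
    y.append((label, {k: round(sum(d[k] for d in dicts), 3) for k in dicts[0]}))
print(y)
```

This also copes with a label that appears once or more than twice, which the index-based pairing does not.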
You can build a separate list for each of Name, Designation and Location, and then sum the p, r and f values in each list.
counter_list = [
('Name', {'p':0.56,'r':0.56,'f':0.56}),
('Designation', {'p':0.10,'r':0.20,'f':0.14}),
('Location', {'p':0.56,'r':0.56,'f':0.56}),
('Name', {'p':0.14,'r':0.14,'f':0.14}),
('Designation', {'p':0.10,'r':0.20,'f':0.14}),
('Location', {'p':0.56,'r':0.56,'f':0.56})
]
name_list = [(tuplee) for tuplee in counter_list if tuplee[0]=="Name"]
desi_list = [(tuplee) for tuplee in counter_list if tuplee[0]=="Designation"]
loca_list = [(tuplee) for tuplee in counter_list if tuplee[0]=="Location"]
name_p = sum([name[1]['p'] for name in name_list])
name_r = sum([name[1]['r'] for name in name_list])
name_f = sum([name[1]['f'] for name in name_list])
desi_p = sum([desi[1]['p'] for desi in desi_list])
desi_r = sum([desi[1]['r'] for desi in desi_list])
desi_f = sum([desi[1]['f'] for desi in desi_list])
loca_p = sum([loca[1]['p'] for loca in loca_list])
loca_r = sum([loca[1]['r'] for loca in loca_list])
loca_f = sum([loca[1]['f'] for loca in loca_list])
final = [('Name', {'p': name_p, 'r': name_r, 'f': name_f}), \
('Designation', {'p': desi_p, 'r': desi_r, 'f': desi_f}), \
('Location', {'p': loca_p, 'r': loca_r, 'f': loca_f})]
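Since the stated goal in the question is to find out which item has the higher 'f', 'p' or 'r' component, here is a small sketch of that last step. It assumes the reduced list already has the shape shown in the expected output; the names reduced and best are just for illustration.

```python
# Reduced totals in the shape the question expects (values taken from the expected output).
reduced = [
    ('Name', {'p': 0.7, 'r': 0.7, 'f': 0.7}),
    ('Designation', {'p': 0.2, 'r': 0.4, 'f': 0.28}),
    ('Location', {'p': 1.12, 'r': 1.12, 'f': 1.12}),
]

# For each metric, pick the label whose total is largest.
best = {
    metric: max(reduced, key=lambda item: item[1][metric])[0]
    for metric in ('p', 'r', 'f')
}
print(best)
```

Keep in mind that these are sums of scores, so a label that simply occurs more often will tend to rank higher; whether that is what the visualization should show is a separate question.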