
Can I reduce a tuple list by key using Python?

I am currently putting together some visualizations of how my NER model performs. The data I have looks like this:

counter_list = [
    ('Name', {'p':0.56,'r':0.56,'f':0.56}),
    ('Designation', {'p':0.10,'r':0.20,'f':0.14}),
    ('Location', {'p':0.56,'r':0.56,'f':0.56}),
    ('Name', {'p':0.14,'r':0.14,'f':0.14}),
    ('Designation', {'p':0.10,'r':0.20,'f':0.14}),
    ('Location', {'p':0.56,'r':0.56,'f':0.56})
]

I want to eliminate the duplicates and add their values together, leaving one entry per label. So the output would look like this:

[
    ('Name', {'p':0.7,'r':0.7,'f':0.7}),
    ('Designation', {'p':0.2,'r':0.4,'f':0.28}),
    ('Location', {'p':1.12,'r':1.12,'f':1.12})
]

I tried using the reduce function, but it only gives me output for the 'Name' entry.

result = functools.reduce(lambda x, y: (x[0], Counter(x[1])+Counter(y[1])) if x[0]==y[0] else (x[0],x[1]), counter_list)

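What is the right way to do this? I want to use the final result to create some visualizations that show which item has the higher 'f', 'p', or 'r' component.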
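The reduce call above returns only the 'Name' entry because its accumulator is a single tuple: whenever the labels differ, the lambda returns the accumulator unchanged, so only items matching the first label are ever added. A minimal sketch of a reduce that does group by key, assuming a dict accumulator (the helper name merge is made up for illustration):

```python
from functools import reduce

counter_list = [
    ('Name', {'p': 0.56, 'r': 0.56, 'f': 0.56}),
    ('Designation', {'p': 0.10, 'r': 0.20, 'f': 0.14}),
    ('Location', {'p': 0.56, 'r': 0.56, 'f': 0.56}),
    ('Name', {'p': 0.14, 'r': 0.14, 'f': 0.14}),
    ('Designation', {'p': 0.10, 'r': 0.20, 'f': 0.14}),
    ('Location', {'p': 0.56, 'r': 0.56, 'f': 0.56}),
]

def merge(acc, item):
    """Fold one (label, scores) pair into the running totals (hypothetical helper)."""
    label, scores = item
    totals = acc.setdefault(label, {})  # fresh dict the first time a label is seen
    for metric, value in scores.items():
        totals[metric] = totals.get(metric, 0.0) + value
    return acc

# The third argument is the initial accumulator: an empty dict keyed by label.
totals_by_label = reduce(merge, counter_list, {})
result = list(totals_by_label.items())
```

Because the accumulator holds every label seen so far, no entry is dropped, and the labels keep their first-seen order.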

Why not use pandas and its groupby method?

>>> import pandas as pd
>>> keys, data = zip(*counter_list)
>>> df = pd.DataFrame(data=data, index=keys).groupby(level=0).sum()
>>> df
                p     r     f
Designation  0.20  0.40  0.28
Location     1.12  1.12  1.12
Name         0.70  0.70  0.70

Then do

>>> list(df.T.to_dict().items())
[
    ('Designation', {'p': 0.2, 'r': 0.4, 'f': 0.28}), 
    ('Location', {'p': 1.12, 'r': 1.12, 'f': 1.12}), 
    ('Name', {'p': 0.7, 'r': 0.7, 'f': 0.7})
]

A trickier approach would be to use defaultdict and Counter, although the intent is a little less clear:

from collections import defaultdict, Counter
result = defaultdict(Counter)
for item, values in counter_list:
    result[item].update(values)
print(result)
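The snippet above prints a defaultdict of Counter objects rather than the list of tuples asked for. A minimal sketch of the extra conversion step, assuming plain dicts are wanted in the result:

```python
from collections import Counter, defaultdict

counter_list = [
    ('Name', {'p': 0.56, 'r': 0.56, 'f': 0.56}),
    ('Designation', {'p': 0.10, 'r': 0.20, 'f': 0.14}),
    ('Location', {'p': 0.56, 'r': 0.56, 'f': 0.56}),
    ('Name', {'p': 0.14, 'r': 0.14, 'f': 0.14}),
    ('Designation', {'p': 0.10, 'r': 0.20, 'f': 0.14}),
    ('Location', {'p': 0.56, 'r': 0.56, 'f': 0.56}),
]

totals = defaultdict(Counter)
for label, scores in counter_list:
    # Counter.update adds to existing values instead of overwriting them.
    totals[label].update(scores)

# Convert back to the list-of-tuples shape used in the question.
result = [(label, dict(counts)) for label, counts in totals.items()]
```

Each label gets its own Counter, so the dicts in counter_list are left untouched.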

This can be done with an intermediate/temporary dictionary. Something like this:

counter_list = [
    ('Name', {'p':0.56,'r':0.56,'f':0.56}),
    ('Designation', {'p':0.10,'r':0.20,'f':0.14}),
    ('Location', {'p':0.56,'r':0.56,'f':0.56}),
    ('Name', {'p':0.14,'r':0.14,'f':0.14}),
    ('Designation', {'p':0.10,'r':0.20,'f':0.14}),
    ('Location', {'p':0.56,'r':0.56,'f':0.56})
]

tdict = dict()

for k, v in counter_list:
    if k not in tdict:
        # Copy the dict so the entries in counter_list are not modified in place
        tdict[k] = dict(v)
    else:
        for sk in 'prf':
            tdict[k][sk] += v[sk]

new_list = [(k, v) for k, v in tdict.items()]
print(new_list)

Output:

[('Name', {'p': 0.7000000000000001, 'r': 0.7000000000000001, 'f': 0.7000000000000001}), ('Designation', {'p': 0.2, 'r': 0.4, 'f': 0.28}), ('Location', {'p': 1.12, 'r': 1.12, 'f': 1.12})]
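The 0.7000000000000001 in the output above is ordinary binary floating-point error from adding 0.56 and 0.14. A minimal sketch of rounding the summed values for display, assuming two decimal places are enough for these scores:

```python
# The summed result as printed above.
new_list = [
    ('Name', {'p': 0.7000000000000001, 'r': 0.7000000000000001, 'f': 0.7000000000000001}),
    ('Designation', {'p': 0.2, 'r': 0.4, 'f': 0.28}),
    ('Location', {'p': 1.12, 'r': 1.12, 'f': 1.12}),
]

# Build new dicts with each value rounded; the originals are not modified.
rounded = [
    (label, {metric: round(value, 2) for metric, value in scores.items()})
    for label, scores in new_list
]
```

Rounding only at the end keeps the sums themselves at full precision.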

Mine is similar to Lancelot du Lac's above. Not the cleanest, but it gets you the expected output.

counter_list = [
    ('Name', {'p':0.56,'r':0.56,'f':0.56}),
    ('Designation', {'p':0.10,'r':0.20,'f':0.14}),
    ('Location', {'p':0.56,'r':0.56,'f':0.56}),
    ('Name', {'p':0.14,'r':0.14,'f':0.14}),
    ('Designation', {'p':0.10,'r':0.20,'f':0.14}),
    ('Location', {'p':0.56,'r':0.56,'f':0.56})
]

counter_intermediate = {x: {'p': 0, 'r': 0, 'f': 0} for x in list(set([tup[0] for tup in counter_list]))}

for (header, dic) in counter_list:
    for letter, value in dic.items():
        counter_intermediate[header][letter] += value

new_list = [(key, value) for key, value in counter_intermediate.items()]

print(new_list)

Output:

[('Designation', {'p': 0.2, 'r': 0.4, 'f': 0.28}), ('Name', {'p': 0.7000000000000001, 'r': 0.7000000000000001, 'f': 0.7000000000000001}), ('Location', {'p': 1.12, 'r': 1.12, 'f': 1.12})]

A more practical approach, wrapped in a function, might look like this:

from typing import List, Tuple

def reduce_counter_list(counter_list: List[Tuple]) -> List[Tuple]:
    tmp_dict = {}
    for key, count_dict in counter_list:

        # First time seeing a key, store a copy of its value
        # (a copy keeps the caller's dicts from being modified)
        if key not in tmp_dict:
            tmp_dict[key] = dict(count_dict)

        # Otherwise, add the values to existing vals
        else:
            for k, v in count_dict.items():
                tmp_dict[key][k] += v

    # Finally, return tmp dict as a list of tuples
    return list(tmp_dict.items())

A very compact example, where cl = counter_list to save space:

l = len(cl)
y = []

# Pair each tuple in the first half with its counterpart in the second half
for i, x in enumerate(cl[:l//2]):
    y.append((x[0], {k: round(x[1][k]+cl[i+l//2][1][k], 3) for k in x[1]}))

Output (contents of y):

[
    ('Name', {'f': 0.7, 'p': 0.7, 'r': 0.7}),
    ('Designation', {'f': 0.28, 'p': 0.2, 'r': 0.4}),
    ('Location', {'f': 1.12, 'p': 1.12, 'r': 1.12})
]

This relies on a uniform arrangement of the tuples, in this order:

[
    ('Name', ...),
    ('Designation', ...),
    ('Location', ...),
    ('Name', ...),
    ('Designation', ...),
    ('Location', ...)
]
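If the tuples cannot be guaranteed to arrive in that repeating order, one option is to sort by label and group with itertools.groupby. This is a sketch of a swapped-in technique rather than the compact loop above, and the input below is shuffled on purpose to show that order no longer matters:

```python
from itertools import groupby
from operator import itemgetter

# Same data as the question, deliberately out of order.
cl = [
    ('Location', {'p': 0.56, 'r': 0.56, 'f': 0.56}),
    ('Name', {'p': 0.56, 'r': 0.56, 'f': 0.56}),
    ('Name', {'p': 0.14, 'r': 0.14, 'f': 0.14}),
    ('Designation', {'p': 0.10, 'r': 0.20, 'f': 0.14}),
    ('Location', {'p': 0.56, 'r': 0.56, 'f': 0.56}),
    ('Designation', {'p': 0.10, 'r': 0.20, 'f': 0.14}),
]

y = []
# groupby only merges adjacent items, so sort by label first.
for label, group in groupby(sorted(cl, key=itemgetter(0)), key=itemgetter(0)):
    dicts = [scores for _, scores in group]
    y.append((label, {k: round(sum(d[k] for d in dicts), 3) for k in dicts[0]}))
```

The result comes out in sorted label order (Designation, Location, Name) rather than first-seen order, and any number of duplicates per label is handled.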

You can build separate lists for Name, Designation, and Location, then sum the (p, r, f) values in each.

counter_list = [
    ('Name', {'p':0.56,'r':0.56,'f':0.56}),
    ('Designation', {'p':0.10,'r':0.20,'f':0.14}),
    ('Location', {'p':0.56,'r':0.56,'f':0.56}),
    ('Name', {'p':0.14,'r':0.14,'f':0.14}),
    ('Designation', {'p':0.10,'r':0.20,'f':0.14}),
    ('Location', {'p':0.56,'r':0.56,'f':0.56})
]

name_list = [(tuplee) for tuplee in counter_list if tuplee[0]=="Name"]
desi_list = [(tuplee) for tuplee in counter_list if tuplee[0]=="Designation"]
loca_list = [(tuplee) for tuplee in counter_list if tuplee[0]=="Location"]

name_p = sum([name[1]['p'] for name in name_list])
name_r = sum([name[1]['r'] for name in name_list])
name_f = sum([name[1]['f'] for name in name_list])

desi_p = sum([desi[1]['p'] for desi in desi_list])
desi_r = sum([desi[1]['r'] for desi in desi_list])
desi_f = sum([desi[1]['f'] for desi in desi_list])

loca_p = sum([loca[1]['p'] for loca in loca_list])
loca_r = sum([loca[1]['r'] for loca in loca_list])
loca_f = sum([loca[1]['f'] for loca in loca_list])

final = [('Name',        {'p': name_p, 'r': name_r, 'f': name_f}),
         ('Designation', {'p': desi_p, 'r': desi_r, 'f': desi_f}),
         ('Location',    {'p': loca_p, 'r': loca_r, 'f': loca_f})]
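The same per-label sums can be written without one variable per label and metric. A minimal sketch, assuming the labels and metrics are known up front; the final step picks the label with the highest total for each metric, which is the comparison the question is after (the names labels, metrics, and best are illustrative):

```python
counter_list = [
    ('Name', {'p': 0.56, 'r': 0.56, 'f': 0.56}),
    ('Designation', {'p': 0.10, 'r': 0.20, 'f': 0.14}),
    ('Location', {'p': 0.56, 'r': 0.56, 'f': 0.56}),
    ('Name', {'p': 0.14, 'r': 0.14, 'f': 0.14}),
    ('Designation', {'p': 0.10, 'r': 0.20, 'f': 0.14}),
    ('Location', {'p': 0.56, 'r': 0.56, 'f': 0.56}),
]

labels = ['Name', 'Designation', 'Location']  # assumed known in advance
metrics = ['p', 'r', 'f']

# For each label, sum every metric over the tuples carrying that label.
final = [
    (label, {m: sum(scores[m] for name, scores in counter_list if name == label)
             for m in metrics})
    for label in labels
]

# For each metric, find the label with the largest summed value.
best = {m: max(final, key=lambda item: item[1][m])[0] for m in metrics}
```

With this data, 'Location' has the largest total for all three metrics.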
