How to retrieve a key substring and count by that substring?

Question

I have the following dictionary in Python:

OrderedDict([('data(xxx_a1)_first_type', 0.12),
             ('data(xxx_a2)_first_type', 0.14),
             ('test(xx_b15)_second_type', 0.15)])

Is there any way to count the first_type and second_type keys and calculate the average value per type?

The expected result:

type         avg_val
first_type   0.13
second_type  0.15

Answer 1

import numpy as np
import pandas as pd

# v is the OrderedDict from the question; build one (type, mean, count) tuple per suffix
list_Tuples = [(z, np.mean([y for x,y in v.items() if x.endswith(z)]), len([y for x,y in v.items() if x.endswith(z)])) for z in ['first_type', 'second_type']]
pd.DataFrame(list_Tuples, columns=['type', 'avg_val', 'count'])

Output:

    type         avg_val  count
0   first_type   0.13     2
1   second_type  0.15     1

where v is the data.

Answer 2

Assuming there are only two types (otherwise use a dict to store the lists by type):

from collections import OrderedDict
from statistics import mean

data = OrderedDict([('data(xxx_a1)_first_type', 0.12),
                    ('data(xxx_a2)_first_type', 0.14),
                    ('test(xx_b15)_second_type', 0.15)])


firsts = []
seconds = []
for key, value in data.items():
    if key.endswith("first_type"):
        firsts.append(value)
    else:
        seconds.append(value)

print("type", "avg_value", sep="\t\t")
print("first_type", mean(firsts), sep='\t')
print("second_type", mean(seconds), sep='\t')
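The parenthetical above (a dict holding one list per type) could be sketched roughly as follows. This is only an illustration: KNOWN_TYPES and group_by_suffix are hypothetical names that do not appear in the answer, and keys matching no known suffix are ignored instead of falling into an else branch.

```python
from statistics import mean

# Sample data from the question (a plain dict keeps insertion order in Python 3.7+).
data = {
    'data(xxx_a1)_first_type': 0.12,
    'data(xxx_a2)_first_type': 0.14,
    'test(xx_b15)_second_type': 0.15,
}

# Hypothetical tuple of known type suffixes; extend it as needed.
KNOWN_TYPES = ('first_type', 'second_type')


def group_by_suffix(items, suffixes):
    """Return {suffix: [values]} for the keys that end in one of the suffixes."""
    groups = {suffix: [] for suffix in suffixes}
    for key, value in items.items():
        for suffix in suffixes:
            if key.endswith(suffix):
                groups[suffix].append(value)
                break  # a key is assigned to at most one type
    return groups


groups = group_by_suffix(data, KNOWN_TYPES)
for type_name, values in groups.items():
    # statistics.mean raises StatisticsError on an empty list, so skip empty groups.
    if values:
        print(type_name, round(mean(values), 2), sep='\t')
```

Adding a third type then only means adding its suffix to KNOWN_TYPES.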

Answer 3

Using itertools.groupby, assuming the data is already ordered so that keys of the same type are adjacent.

Example:

from collections import OrderedDict
from itertools import groupby

d = OrderedDict([('data(xxx_a1)_first_type', 0.12),
             ('data(xxx_a2)_first_type', 0.14),
             ('test(xx_b15)_second_type', 0.15)])

for k, v in groupby(d.items(), lambda x: "_".join(x[0].split("_")[-2:])):
    val = [i for _, i in v]
    print("{} {}".format(k, sum(val)/len(val)))

Output:

first_type 0.13
second_type 0.15
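The ordering assumption matters because itertools.groupby only merges adjacent items with the same key. If the entries might be interleaved, one option is to sort by the grouping key first. A minimal sketch, where the interleaved sample dict and the helper name type_of are made up for illustration:

```python
from itertools import groupby

# Same values as in the question, but deliberately interleaved by type.
unordered = {
    'data(xxx_a1)_first_type': 0.12,
    'test(xx_b15)_second_type': 0.15,
    'data(xxx_a2)_first_type': 0.14,
}


def type_of(item):
    """Grouping key: the last two underscore-separated parts of the dict key."""
    key, _ = item
    return "_".join(key.split("_")[-2:])


# groupby only merges adjacent items, so sort by the grouping key first.
averages = {}
for type_name, group in groupby(sorted(unordered.items(), key=type_of), key=type_of):
    values = [value for _, value in group]
    averages[type_name] = sum(values) / len(values)

for type_name, avg in averages.items():
    print("{} {:.2f}".format(type_name, avg))
```

Without the sorted() call, this input would produce two separate first_type groups.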

Or using dict.setdefault, which does not require the data to be ordered:

Example:

result = {}
for k, v in d.items():
    key = "_".join(k.split("_")[-2:])
    result.setdefault(key, []).append(v)

for k, v in result.items():
    print("{} {}".format(k, sum(v)/len(v)))

Answer 4

You can use a collections.defaultdict to group the values, then apply statistics.mean to get the average:

from collections import defaultdict
from collections import OrderedDict
from statistics import mean

data = OrderedDict([('data(xxx_a1)_first_type', 0.12),
                    ('data(xxx_a2)_first_type', 0.14),
                    ('test(xx_b15)_second_type', 0.15)])

d = defaultdict(list)
for k, v in data.items():
    # Split from the right so the key is always the last two parts, e.g. 'first_type'
    key = '_'.join(k.rsplit('_', 2)[-2:])
    d[key].append(v)

for k, v in d.items():
    print('%s %.2f' % (k, mean(v)))

Output:

first_type 0.13
second_type 0.15
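The question also asks for a count per type. Once the values are grouped into lists, the count is simply the length of each list. A small sketch along the same lines, where the names grouped and summary and the column layout are illustrative choices, not part of the answer:

```python
from collections import defaultdict
from statistics import mean

# Sample data from the question.
data = {
    'data(xxx_a1)_first_type': 0.12,
    'data(xxx_a2)_first_type': 0.14,
    'test(xx_b15)_second_type': 0.15,
}

# Group values by the last two underscore-separated parts of each key.
grouped = defaultdict(list)
for k, v in data.items():
    type_name = '_'.join(k.rsplit('_', 2)[-2:])
    grouped[type_name].append(v)

# One (count, average) pair per type.
summary = {t: (len(vals), mean(vals)) for t, vals in grouped.items()}

print('%-12s %5s %8s' % ('type', 'count', 'avg_val'))
for t, (count, avg) in summary.items():
    print('%-12s %5d %8.2f' % (t, count, avg))
```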

4 answers

Answer 1: 2 (accepted), 2019-07-26 12:36:45
Answer 2: 1, 2019-07-26 12:34:37
Answer 3: 1, 2019-07-26 12:38:51
Answer 4: 0, 2019-07-26 12:56:27
