Percentage of unique matching values in two columns, ignoring order, in pandas
I have a dataframe:
agent_id  ts  pred  gt
0         0   0     0
0         1   0     0
0         2   0     1
0         3   1     0
1         0   0     0
1         1   1     0
1         2   2     1
1         3   3     0
agent_id and ts are indices; pred and gt are columns.
Now I want to compute, per agent_id, the percentage of unique values that pred and gt have in common, ignoring order.
I've already implemented a similar metric, where the order matters:

grouped_df.apply(lambda df: df['gt'].eq(df['pred']).mean()).to_dict()

(Bracket access is needed here because `gt` is shadowed by the built-in `DataFrame.gt` greater-than method.)
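For reference, a runnable sketch of that ordered metric on the sample data (a hypothetical reconstruction; `df` and `ordered` are names assumed here):

```python
import pandas as pd

# Sample data from the question, with agent_id and ts as the index.
df = pd.DataFrame({
    'agent_id': [0]*4 + [1]*4,
    'ts':       list(range(4))*2,
    'pred':     [0, 0, 0, 1, 0, 1, 2, 3],
    'gt':       [0, 0, 1, 0, 0, 0, 1, 0],
}).set_index(['agent_id', 'ts'])

# Ordered metric: per agent, the share of timesteps where pred equals gt.
# Use g['gt'] rather than g.gt, since attribute access is shadowed by the
# DataFrame.gt (greater-than) method.
ordered = (df.groupby('agent_id')
             .apply(lambda g: g['gt'].eq(g['pred']).mean())
             .to_dict())
print(ordered)  # {0: 0.5, 1: 0.25}
```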
and I've also implemented the metric I want, assuming pred and gt were plain lists without any grouping:
unordered_matches = len(set(pred) & set(gt)) / len(set(pred) | set(gt))
How can I achieve this with the grouping in pandas (and ideally convert the result to a dict)?
Just for better understanding, here is what the results would be for the sample data above:

agent 0: 1.0
agent 1: 0.5
I would be interested in a problem-specific pandas solution as well as a more generic approach for translating my plain-Python formulas so they work with pandas grouping.
Using set operations and groupby.apply (the walrus operator `:=` requires Python 3.8+):
(df.groupby('agent_id')
.apply(lambda x: len((S1:=set(x['pred'])) & (S2:=set(x['gt'])))/len(S1|S2))
)
Output:
agent_id
0 1.0
1 0.5
dtype: float64
As a dictionary, add .to_dict(): {0: 1.0, 1: 0.5}
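As for the generic translation the question asks about: any plain-Python formula over two lists can be wrapped in a function and applied per group, passing each group's columns as the list arguments. A self-contained sketch (variable names `df`, `unordered_matches`, and `result` are assumptions, not from the original):

```python
import pandas as pd

# Rebuild the sample frame with agent_id and ts as the index.
df = pd.DataFrame({
    'agent_id': [0]*4 + [1]*4,
    'ts':       list(range(4))*2,
    'pred':     [0, 0, 0, 1, 0, 1, 2, 3],
    'gt':       [0, 0, 1, 0, 0, 0, 1, 0],
}).set_index(['agent_id', 'ts'])

def unordered_matches(pred, gt):
    """Jaccard similarity of the unique values in two sequences."""
    s1, s2 = set(pred), set(gt)
    return len(s1 & s2) / len(s1 | s2)

# The plain-Python formula, reused per group via groupby.apply.
result = (df.groupby('agent_id')
            .apply(lambda g: unordered_matches(g['pred'], g['gt']))
            .to_dict())
print(result)  # {0: 1.0, 1: 0.5}
```

The same pattern works for any per-group metric: keep the formula as an ordinary function of sequences, and let the lambda feed it the group's columns.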