Percentage of unique matching values in two columns, ignoring order, in pandas
I have a dataframe:
agent_id  ts  pred  gt
0         0   0     0
0         1   0     0
0         2   0     1
0         3   1     0
1         0   0     0
1         1   1     0
1         2   2     1
1         3   3     0
agent_id and ts are indices; pred and gt are columns.
Now I want to compute, per agent_id, the percentage of unique values that pred and gt have in common, ignoring order.
I've already implemented a similar metric, where the order matters:

grouped_df.apply(lambda df: df['gt'].eq(df['pred']).mean()).to_dict()

(Bracket access is needed here because `gt` is shadowed by the built-in `DataFrame.gt` greater-than method.)
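For reference, a runnable sketch of that ordered metric on the sample data (a hypothetical reconstruction; `df` and `ordered` are names assumed here):

```python
import pandas as pd

# Sample data from the question, with agent_id and ts as the index.
df = pd.DataFrame({
    'agent_id': [0]*4 + [1]*4,
    'ts':       list(range(4))*2,
    'pred':     [0, 0, 0, 1, 0, 1, 2, 3],
    'gt':       [0, 0, 1, 0, 0, 0, 1, 0],
}).set_index(['agent_id', 'ts'])

# Ordered metric: per agent, the share of timesteps where pred equals gt.
# Use g['gt'] rather than g.gt, since attribute access is shadowed by the
# DataFrame.gt (greater-than) method.
ordered = (df.groupby('agent_id')
             .apply(lambda g: g['gt'].eq(g['pred']).mean())
             .to_dict())
print(ordered)  # {0: 0.5, 1: 0.25}
```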
and I've also implemented the metric I want, assuming pred and gt were plain lists without any grouping:
unordered_matches = len(set(pred) & set(gt)) / len(set(pred) | set(gt))
How can I achieve this with the grouping in pandas (and ideally convert the result to a dict)?
Just for better understanding, here is what the results would be for the sample data above:

agent 0: 1.0
agent 1: 0.5
I would be interested in a problem-specific pandas solution as well as a more generic approach for translating my plain-Python formulas so they work with pandas grouping.
Using set operations and groupby.apply (the walrus operator `:=` requires Python 3.8+):
(df.groupby('agent_id')
.apply(lambda x: len((S1:=set(x['pred'])) & (S2:=set(x['gt'])))/len(S1|S2))
)
Output:
agent_id
0 1.0
1 0.5
dtype: float64
As a dictionary, add .to_dict(): {0: 1.0, 1: 0.5}
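As for the generic translation the question asks about: any plain-Python formula over two lists can be wrapped in a function and applied per group, passing each group's columns as the list arguments. A self-contained sketch (variable names `df`, `unordered_matches`, and `result` are assumptions, not from the original):

```python
import pandas as pd

# Rebuild the sample frame with agent_id and ts as the index.
df = pd.DataFrame({
    'agent_id': [0]*4 + [1]*4,
    'ts':       list(range(4))*2,
    'pred':     [0, 0, 0, 1, 0, 1, 2, 3],
    'gt':       [0, 0, 1, 0, 0, 0, 1, 0],
}).set_index(['agent_id', 'ts'])

def unordered_matches(pred, gt):
    """Jaccard similarity of the unique values in two sequences."""
    s1, s2 = set(pred), set(gt)
    return len(s1 & s2) / len(s1 | s2)

# The plain-Python formula, reused per group via groupby.apply.
result = (df.groupby('agent_id')
            .apply(lambda g: unordered_matches(g['pred'], g['gt']))
            .to_dict())
print(result)  # {0: 1.0, 1: 0.5}
```

The same pattern works for any per-group metric: keep the formula as an ordinary function of sequences, and let the lambda feed it the group's columns.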