Pandas: Group by combination of two columns
I have data as follows. The score column is the score of x vs y (which is equivalent to y vs x).
from collections import Counter
import pandas as pd
d = pd.DataFrame([('a','b',1), ('a','c', 2), ('b','a',3), ('b','a',3)],
columns=['x', 'y', 'score'])
   x  y  score
0  a  b      1
1  a  c      2
2  b  a      3
3  b  a      3
I want to count the scores for each combination, so ('a' vs 'b') and ('b' vs 'a') should be grouped together, i.e.:
            score
x y
a b  {1: 1, 3: 2}
  c        {2: 1}
However, if I do d.groupby(['x', 'y']).agg(Counter), ('a', 'b') and ('b', 'a') are not combined together (output below). Is there a way to solve this? Thanks!
      score
x y
a b  {1: 1}
  c  {2: 1}
b a  {3: 2}
If you do not care about the order within a pair, you can sort the two columns row-wise with apply and sorted, then groupby:
import pandas as pd
from collections import Counter
d = pd.DataFrame([('a','b',1), ('a','c', 2), ('b','a',3), ('b','a',3)],
columns=['x', 'y', 'score'])
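# Note: work on a copy (d.copy()) if you do not want to change the original.
# result_type='broadcast' keeps the x/y columns; without it, recent pandas
# versions return a Series of lists, which cannot be assigned to two columns.
d[['x', 'y']] = d[['x', 'y']].apply(sorted, axis=1, result_type='broadcast')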
x = d.groupby(['x', 'y']).agg(Counter)
print(x)
# Result:
#             score
# x y
# a b  {1: 1, 3: 2}
#   c        {2: 1}
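If you would rather not overwrite x and y at all, a possible variation is to build the sorted keys as separate Series and group the score column by them. This is only a sketch: the names swap, lo, hi and counts are made up for illustration, and the result is a plain dict of Counters rather than a DataFrame.

```python
from collections import Counter

import numpy as np
import pandas as pd

d = pd.DataFrame(
    [('a', 'b', 1), ('a', 'c', 2), ('b', 'a', 3), ('b', 'a', 3)],
    columns=['x', 'y', 'score'],
)

# Rows where x sorts after y need their two labels swapped.
swap = d['x'] > d['y']

# Order-independent keys, built without modifying d (hypothetical names).
lo = pd.Series(np.where(swap, d['y'], d['x']), index=d.index, name='x')
hi = pd.Series(np.where(swap, d['x'], d['y']), index=d.index, name='y')

# Group the scores by the unordered pair and count them with Counter.
counts = {
    pair: Counter(group.tolist())
    for pair, group in d['score'].groupby([lo, hi])
}

print(counts)
```

The original frame keeps its ('b', 'a') rows, which may matter if the direction of the comparison is still needed elsewhere.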
You can also groupby using the frozenset of x and y aggregated per row, and then agg using Counter:
from collections import Counter
# d is the DataFrame from the question; axis=1 builds one frozenset per row
d.groupby(d[['x', 'y']].agg(frozenset, axis=1)).score.agg(Counter)
(b, a) {1: 1, 3: 2}
(a, c) {2: 1}
If you want a DataFrame, call .to_frame() on the result:
score
(b, a) {1: 1, 3: 2}
(a, c) {2: 1}
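The reason a frozenset works as a group key is that it is hashable and ignores element order, so ('a', 'b') and ('b', 'a') map to the same key. A small pure-Python sketch of that idea, without pandas (the names rows, counts and self_pair are illustrative only):

```python
from collections import Counter, defaultdict

rows = [('a', 'b', 1), ('a', 'c', 2), ('b', 'a', 3), ('b', 'a', 3)]

# One Counter of scores per unordered pair.
counts = defaultdict(Counter)
for x, y, score in rows:
    # frozenset((x, y)) is identical for (x, y) and (y, x).
    counts[frozenset((x, y))][score] += 1

print(dict(counts))

# Caveat: a row with x == y collapses to a one-element key.
self_pair = frozenset(('a', 'a'))
print(len(self_pair))
```

One thing to keep in mind is that the key no longer records which label was x and which was y, and a pair of identical labels becomes a set with a single element.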
IIUC:
import numpy as np
import pandas as pd
d[['x', 'y']] = np.sort(d[['x', 'y']], axis=1)  # sort each row's pair so (b, a) becomes (a, b)
pd.crosstab([d.x, d.y], d.score)
Out[94]:
score  1  2  3
x y
a b    1  0  2
  c    0  1  0
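The crosstab gives a wide table of counts rather than the per-pair dictionaries asked for in the question. If that shape is wanted, one possible way to get there is to drop the zero cells row by row. This is a sketch under assumptions: pairs, table and nonzero are illustrative names, and it works on a copy so the original d is left unchanged.

```python
import numpy as np
import pandas as pd

d = pd.DataFrame(
    [('a', 'b', 1), ('a', 'c', 2), ('b', 'a', 3), ('b', 'a', 3)],
    columns=['x', 'y', 'score'],
)

# Sort each (x, y) pair row-wise on a copy of the data.
pairs = d.copy()
pairs[['x', 'y']] = np.sort(pairs[['x', 'y']].to_numpy(), axis=1)

# Wide table: one row per unordered pair, one column per score value.
table = pd.crosstab([pairs['x'], pairs['y']], pairs['score'])

# Keep only the scores that actually occur for each pair.
nonzero = {
    pair: {score: int(n) for score, n in row.items() if n > 0}
    for pair, row in table.iterrows()
}

print(nonzero)
```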