Elements in common for each pair of values in the first column of a DataFrame
I am new to pandas. I have the following DataFrame:
Group type
G1    a1
G1    a2
G1    a3
G2    a2
G2    a1
G3    a1
G4    a1
G5    a4
G5    a1
For each pair of groups, I want to get how many "type" values they have in common. Something like this:
Group type count
G1    a1
G1    a2
G1    a3
G2    a2
G2    a1
G3    a1
G4    a1
G5    a4
G5    a1
count: (G1, G2, 2) (Elements in common: a1, a2)
count: (G1, G3, 1) (Elements in common: a1)
count: (G1, G4, 1) (Elements in common: a1)
...
Do you have any idea how I can implement this? Is there any function in the pandas library that could point me in the right direction?
I think you need numpy.intersect1d:
import itertools
import numpy as np
import pandas as pd

# get all pairwise combinations of Group values
c = list(itertools.combinations(list(set(df['Group'])), 2))
df = df.set_index('Group')
# create a list of tuples holding each intersection and its length
L = []
for a, b in c:
    d = np.intersect1d(df.loc[a], df.loc[b]).tolist()
    L.append((a, b, len(d), d))
# new DataFrame
df = pd.DataFrame(L, columns=['a', 'b', 'lens', 'common'])
print(df)
    a   b  lens    common
0  G2  G4     1      [a1]
1  G2  G1     2  [a1, a2]
2  G2  G3     1      [a1]
3  G2  G5     1      [a1]
4  G4  G1     1      [a1]
5  G4  G3     1      [a1]
6  G4  G5     1      [a1]
7  G1  G3     1      [a1]
8  G1  G5     1      [a1]
9  G3  G5     1      [a1]
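The pair order in the output above comes from iterating a Python set, so it can differ between runs. A minimal alternative sketch, assuming the question's sample data and using plain Python sets in place of numpy.intersect1d, sorts the group names so that the result is reproducible. The names types_by_group, rows and result are illustrative and not from the original answer.

```python
from itertools import combinations

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Group": ["G1", "G1", "G1", "G2", "G2", "G3", "G4", "G5", "G5"],
    "type": ["a1", "a2", "a3", "a2", "a1", "a1", "a1", "a4", "a1"],
})

# Map each group to the set of types it contains.
types_by_group = {g: set(s) for g, s in df.groupby("Group")["type"]}

# Intersect the sets for every unordered pair, in sorted group order.
rows = []
for a, b in combinations(sorted(types_by_group), 2):
    common = sorted(types_by_group[a] & types_by_group[b])
    rows.append((a, b, len(common), common))

result = pd.DataFrame(rows, columns=["a", "b", "lens", "common"])
print(result)
```

Pairs with nothing in common would appear with lens equal to 0 and an empty list; filter on lens > 0 if those rows are not wanted.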
Given a DataFrame:
import pandas as pd
df = pd.DataFrame([['G1', 'G1', 'G2', 'G2'], ['a1', 'a2', 'a1', 'a3']]).T
df.columns = ['group', 'type']
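Then there are two options:
df.groupby('type').count()
Or, if you want to see the groups explicitly:
# assumption: df1, which is used below but never defined in the original, is this grouped result
df1 = df.groupby(['type', 'group']).count()
So you can then do, for example:
df1.loc['a1']
Output: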
group
G1
G2
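The grouping above shows which groups contain a given type, but it does not yet give a count per pair of groups. One possible way to get there, sketched here as an assumption and not as part of the original answer, is to merge the frame with itself on type and count the rows for each pair. The names pairs and counts and the suffixes _a and _b are illustrative.

```python
import pandas as pd

# Same small frame as in the answer above.
df = pd.DataFrame([['G1', 'G1', 'G2', 'G2'], ['a1', 'a2', 'a1', 'a3']]).T
df.columns = ['group', 'type']

# Self-merge on 'type': each row pairs two groups that share that type.
pairs = df.merge(df, on='type', suffixes=('_a', '_b'))

# Keep each unordered pair once and drop pairs of a group with itself.
pairs = pairs[pairs['group_a'] < pairs['group_b']]

# Number of shared types per pair of groups.
counts = pairs.groupby(['group_a', 'group_b']).size().reset_index(name='count')
print(counts)
```

Unlike the combinations-based approach, this only lists pairs that share at least one type, and the intermediate merge can grow large when many groups share the same type.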