Elements in common for each pair of groups in the first column of a dataframe

I'm new to pandas. I have the following dataframe:

Group  type
G1     a1
G1     a2
G1     a3
G2     a2
G2     a1
G3     a1
G4     a1
G5     a4
G5     a1

For each pair of groups, I would like to obtain how many "types" they have in common. Something like this:

Group  type  count
G1     a1
G1     a2
G1     a3
G2     a2
G2     a1
G3     a1
G4     a1
G5     a4
G5     a1

count: (G1, G2, 2)   (Elements in common: a1,a2)
count: (G1, G3, 1)   (Elements in common: a1)
count: (G1, G4, 1)   (Elements in common: a1)
...

Do you have any idea how I could implement this? Is there a function in the pandas library that could point me in the right direction?

I think you need numpy.intersect1d:

import itertools

import numpy as np
import pandas as pd

# df is the question's DataFrame with columns 'Group' and 'type'
# get all combinations of Group values (set order is arbitrary, so row order may vary)
c = list(itertools.combinations(list(set(df['Group'])), 2))

df = df.set_index('Group')

# create list of tuples of intersections and lengths
L = []
for a, b in c:
    d = np.intersect1d(df.loc[a], df.loc[b]).tolist()
    L.append((a, b, len(d), d))

# new DataFrame
df = pd.DataFrame(L, columns=['a','b','lens','common'])
print(df)
    a   b  lens    common
0  G2  G4     1      [a1]
1  G2  G1     2  [a1, a2]
2  G2  G3     1      [a1]
3  G2  G5     1      [a1]
4  G4  G1     1      [a1]
5  G4  G3     1      [a1]
6  G4  G5     1      [a1]
7  G1  G3     1      [a1]
8  G1  G5     1      [a1]
9  G3  G5     1      [a1]
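
For reference, here is a self-contained sketch of the same idea on the question's data. It swaps in plain Python set intersection for np.intersect1d and sorts the group labels so the row order is reproducible. The helper name `pairwise_common` is an illustrative assumption, not part of the answer above.

```python
import itertools

import pandas as pd

# Example data copied from the question.
df = pd.DataFrame({
    "Group": ["G1", "G1", "G1", "G2", "G2", "G3", "G4", "G5", "G5"],
    "type":  ["a1", "a2", "a3", "a2", "a1", "a1", "a1", "a4", "a1"],
})


def pairwise_common(frame):
    """Return one row per pair of groups with the types they share.

    Hypothetical helper for illustration; uses set intersection
    instead of np.intersect1d.
    """
    # Map each group label to the set of its types.
    types_by_group = frame.groupby("Group")["type"].apply(set)
    rows = []
    # Sorting the group labels makes the pair order deterministic.
    for a, b in itertools.combinations(sorted(types_by_group.index), 2):
        common = sorted(types_by_group[a] & types_by_group[b])
        rows.append((a, b, len(common), common))
    return pd.DataFrame(rows, columns=["a", "b", "lens", "common"])


result = pairwise_common(df)
print(result)
```

Building the sets once per group avoids repeated index lookups inside the loop, which may matter when there are many groups.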

Given a dataframe:

import pandas as pd
df = pd.DataFrame([['G1', 'G1', 'G2', 'G2'], ['a1', 'a2', 'a1', 'a3']]).T
df.columns = ['group', 'type']

Then there are two options:

df.groupby('type').count()

or, if you want to list the groups explicitly:

df1 = df.groupby(['type', 'group']).count()

Thus you can do, e.g.:

df1.loc['a1']

with output:

group
G1
G2
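
The groupby results above show which groups contain each type, but not the pairwise counts the question asks for. One possible way to get there, sketched under the assumption that each (group, type) pair appears at most once, is to build a group-by-type indicator table with pd.crosstab and multiply it by its transpose. The variable names `indicator` and `shared` are illustrative.

```python
import pandas as pd

# Same toy data as in this answer.
df = pd.DataFrame([['G1', 'G1', 'G2', 'G2'], ['a1', 'a2', 'a1', 'a3']]).T
df.columns = ['group', 'type']

# Rows are groups, columns are types, cells count occurrences (0 or 1 here).
indicator = pd.crosstab(df['group'], df['type'])

# Entry (i, j) is the number of types shared by group i and group j;
# the diagonal holds each group's own number of types.
shared = indicator.dot(indicator.T)
print(shared)
```

If a (group, type) pair can repeat in the data, dropping duplicate rows first keeps the cells at 0 or 1 so the products still count shared types.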
