[英]Number of common elements by index in two dataframes
I have the three following dataframes: 我有以下三个数据帧:
df_A = pd.DataFrame( {'id_A': [1, 1, 1, 1, 2, 2, 3, 3],
'Animal_A': ['cat','dog','fish','bird','cat','fish','bird','cat' ]})
df_B = pd.DataFrame( {'id_B': [1, 2, 2, 3, 4, 4, 5],
'Animal_B': ['dog','cat','fish','dog','fish','cat','cat' ]})
df_P = pd.DataFrame( {'id_A': [1, 1, 2, 3],
'id_B': [2, 3, 4, 5]})
df_A
id_A Animal_A
0 1 cat
1 1 dog
2 1 fish
3 1 bird
4 2 cat
5 2 fish
6 3 bird
7 3 cat
df_B
id_B Animal_B
0 1 dog
1 2 cat
2 2 fish
3 3 dog
4 4 fish
5 4 cat
6 5 cat
df_P
id_A id_B
0 1 2
1 1 3
2 2 4
3 3 5
And I would like to get an additional column to df_P that tells the number of Animals shared between id_A and id_B. 我想获得一个额外的列到df_P,它告诉id_A和id_B之间共享的动物数量。 What I'm doing is:
我在做的是:
df_P["n_common"] = np.nan
for i in df_P.index.tolist():
id_A = df_P["id_A"][i]
id_B = df_P["id_B"][i]
df_P.iloc[i,df_P.columns.get_loc('n_common')] = len(set(df_A['Animal_A'][df_A['id_A']==id_A]).intersection(df_B['Animal_B'][df_B['id_B']==id_B]))
The result being: 结果是:
df_P
id_A id_B n_common
0 1 2 2.0
1 1 3 1.0
2 2 4 2.0
3 3 5 1.0
Is there a faster, more pythonic, way to do this? 是否有更快,更pythonic的方式来做到这一点? Is there a way to avoid the for loop?
有没有办法避免for循环?
Not sure if it is faster or more pythonic, but it avoids the for loop :) 不确定它是否更快或更pythonic,但它避免了for循环:)
import pandas as pd
df_A = pd.DataFrame( {'id_A': [1, 1, 1, 1, 2, 2, 3, 3],
'Animal_A': ['cat','dog','fish','bird','cat','fish','bird','cat' ]})
df_B = pd.DataFrame( {'id_B': [1, 2, 2, 3, 4, 4, 5],
'Animal_B': ['dog','cat','fish','dog','fish','cat','cat' ]})
df_P = pd.DataFrame( {'id_A': [1, 1, 2, 3],
'id_B': [2, 3, 4, 5]})
df = pd.merge(df_A, df_P, on='id_A')
df = pd.merge(df_B, df, on='id_B')
df = df[df['Animal_A'] == df['Animal_B']].groupby(['id_A', 'id_B'])['Animal_A'].count().reset_index()
df.rename({'Animal_A': 'n_common'},inplace=True,axis=1)
您可以尝试以下方法:
df_A.merge(df_B, left_on = ['Animal_A'], right_on = ['Animal_B'] ).groupby(['id_A' ,'id_B']).count().reset_index().merge(df_P).drop('Animal_B', axis = 1).rename(columns = {'Animal_A': 'count'})
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.