简体   繁体   中英

how to check column value of one data frame how many times in another dataframe column pandas?

I'm having two dataframes like below-

dataframe 1-

df1_data = {'sym' :{0:'AAA',1:'BBB',2:'CCC',3:'DDD',4:'EEE',5:'FFF'}}
df = pd.DataFrame(df1_data)

dataframe 2-

df2_data = {'sym1' :{0:'AAA',1:'BB',2:'XXX',3:'A',4:'CCC',5:'D',6:'EEE',7:'EEE',8:'FFF'}}
df2 = pd.DataFrame(df2_data)

I want to check sym column available in dataframe 1 is how many times in sym1 column of dataframe 2?

Expected Result -

   sym,count
0  AAA,1
1  BBB,0
2  CCC,1
3  DDD,0
4  EEE,2
5  FFF,1

Here's one way by join ing on df with sym index on df2 's value_counts() , then fill NaN , and for view's sake, use reset_index()

In [113]: df.set_index('sym').join(df2.sym1.value_counts()).fillna(0).reset_index()
Out[113]:
   sym  sym1
0  AAA   1.0
1  BBB   0.0
2  CCC   1.0
3  DDD   0.0
4  EEE   2.0
5  FFF   1.0

Details

In [114]: df2.sym1.value_counts()
Out[114]:
EEE    2
CCC    1
XXX    1
FFF    1
BB     1
D      1
AAA    1
A      1
Name: sym1, dtype: int64

Or , using map

In [141]: df['count'] = df['sym'].map(df2.sym1.value_counts()).fillna(0)

In [142]: df
Out[142]:
   sym  count
0  AAA    1.0
1  BBB    0.0
2  CCC    1.0
3  DDD    0.0
4  EEE    2.0
5  FFF    1.0

numpy broadcasting

df.assign(count=(df.sym.values[:, None] == df2.sym1.values).sum(1))

   sym  count
0  AAA      1
1  BBB      0
2  CCC      1
3  DDD      0
4  EEE      2
5  FFF      1

Solution with value_counts and reindex , values in df['sym'] has to be unique.

print (df2.sym1.value_counts().reindex(df['sym'], fill_value=0).reset_index())
   sym  sym1
0  AAA     1
1  BBB     0
2  CCC     1
3  DDD     0
4  EEE     2
5  FFF     1

one simple way would be to:

df['count'] = df['sym'].map(lambda x:list(df2['sym1'].values).count(x))

hope it helps

Here is a solution where you need to iterate df and df2 only once:

from collections import defaultdict
vals_dict = defaultdict(list, df2.groupby('sym1').indices)
df['count'] = df['sym'].apply(lambda x: len(vals_dict[x]))
print df

It will add to df a new column with the requested values:

   sym  count
0  AAA      1
1  BBB      0
2  CCC      1
3  DDD      0
4  EEE      2
5  FFF      1

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM