I have datasets similar to this:
df1
company | date | act_call | act_visit | po |
---|---|---|---|---|
A | 2022-10-01 | Yes | No | No |
B | 2022-10-01 | Yes | No | Yes |
C | 2022-10-01 | No | No | No |
B | 2022-10-02 | No | Yes | No |
A | 2022-10-02 | No | Yes | No |
df2
company | date | act_call | act_visit | po |
---|---|---|---|---|
D | 2022-11-01 | Yes | No | No |
B | 2022-11-01 | Yes | No | Yes |
C | 2022-11-01 | Yes | Yes | No |
D | 2022-11-02 | No | Yes | No |
A | 2022-11-02 | No | Yes | Yes |
I want to count the number of companies where the po is 'No' in df1 and which also exist in df2.
I tried using this code:
int_df = len(set(df2['company']).intersection(df1['po'].eq('no').groupby(df1['company'])))
but it raises the following error:
unhashable type: 'Series'
My expected output:
2, (A, C)
*Note: (A, C) doesn't have to be printed, since I only want the number of companies.
What would be the best code to get my expected output? Thank you in advance!
I would first filter the companies based on df2 with isin, then aggregate with groupby.all to identify the companies with only "No", and sum:
(df1.loc[df1['company'].isin(df2['company']), 'po']
.eq('No')
.groupby(df1['company']).all()
.sum()
)
Output: 2
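For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same approach on the sample data from the question. The act_call and act_visit columns are left out because they do not affect the result, and the names subset, only_no, count and names are just illustrative choices, not part of the original code.

```python
import pandas as pd

# Sample data from the question (act_call / act_visit omitted for brevity)
df1 = pd.DataFrame({
    "company": ["A", "B", "C", "B", "A"],
    "date": ["2022-10-01", "2022-10-01", "2022-10-01", "2022-10-02", "2022-10-02"],
    "po": ["No", "Yes", "No", "No", "No"],
})
df2 = pd.DataFrame({
    "company": ["D", "B", "C", "D", "A"],
    "date": ["2022-11-01", "2022-11-01", "2022-11-01", "2022-11-02", "2022-11-02"],
    "po": ["No", "Yes", "No", "No", "Yes"],
})

# Keep only the df1 rows whose company also appears in df2
subset = df1[df1["company"].isin(df2["company"])]

# Per company: True only if every po value in df1 is 'No'
only_no = subset["po"].eq("No").groupby(subset["company"]).all()

count = int(only_no.sum())            # number of matching companies
names = sorted(only_no[only_no].index)  # which companies matched

print(count, names)
```

Note that the comparison is case-sensitive, so eq('No') is needed here; eq('no'), as in the attempt from the question, would match nothing in this data.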