简体   繁体   中英

How do I create a new pandas dataframe after comparing 2 data frame columns based on a condition?

Data frame 1

cust_id1   cust_id2  
23           50     
23           51     
23           52      
24           32     
25           40  
25           41

Dataframe 2

cust_id    
50
51
52
32
40

FOR EACH cust_id1, check if cust_id2 of Dataframe 1 is present in cust_id of Dataframe 2. IF YES, THEN create a new data frame. Also, I further would like to add a new column count for cust_id2

Expected Output in a new data frame 3

cust_id1   cust_id2  count_cust_id2
23           50           1
23           51           1
23           52           1
24           32           1
25           40           1

This is what I have tried

uniq = df1['cust_id1'].dropna().unique()

cust_id_grouped = df1.groupby(['cust_id1'], as_index = False)

for i in uniq:

    all_groups = cust_id_grouped.get_group(i)

    count_cust_id2 = 0
    for index, row in all_groups.iterrows():            
        if (row.cust_id1.isin(df2. cust_id)):
            count_cust_id2 = count_cust_id2 + 1

Part 2 In addition to the above data frames, I have similar data frames.

Dataframe 4

cust_id1   cust_id2  
23           50     
23           51     
23           52      
24           32     
25           40  
25           41
26           50     

Dataframe 5

cust_id    
50
51
52
32
40
41
50

The Expected Output should update data frame 3,

cust_id1   cust_id2  count_cust_id2
23           50           2
23           51           2
23           52           2
24           32           2
25           40           2
25           41           1
26           50           1

I have multiple of these data frames and each of these groups(df1-df2) and (df4-df5) represents a single month. The count is actually for the month.

Use Series.isin for test all values, testing per groups is not necessary. Last get counts by GroupBy.transform and GroupBy.size :

df3 = df1[df1.cust_id2.isin(df2.cust_id)].copy()
df3['count_cust_id2'] = df3.groupby('cust_id2')['cust_id2'].transform('size')
print (df3)
   cust_id1  cust_id2  count_cust_id2
0        23        50               1
1        23        51               1
2        23        52               1
3        24        32               1
4        25        40               1

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM