
Apply multiple conditional level groupby

Question 1:

I have a data frame with two month columns, month1 and month2. If the value in month1 is not NA, sum the amount values of all rows that share that month1 value. If the value in month1 is NA, take the corresponding month2 value, look it up in the month1 column, and use the sum for that month (0 if the value does not appear in month1).

import pandas as pd
df = pd.DataFrame(
    {
        'month1': [1, 2, 'NA', 1, 4, 'NA', 'NA'],
        'month2': ['NA', 5, 1, 2, 'NA', 1, 3],
        'amount': [10, 20, 40, 50, 60, 70, 100]
    }
)

Desired output for question 1:

    month1  month2  sum_amount
0     1.0     NaN      60
1     2.0     5.0      20
2     NaN     1.0      60
3     1.0     2.0      60
4     4.0     NaN      60
5     NaN     1.0      60
6     NaN     3.0      0

Question 2:

I have a data frame with two month columns, month1 and month2. If the value in month1 is not NA, look that value up in the month2 column and sum the amount values of the matching rows. If the value in month1 is NA, take the corresponding month2 value, look it up in the month2 column, and sum the matching amounts. Use 0 when there is no match.

import pandas as pd
df = pd.DataFrame(
    {
        'month1': [1, 2, 'NA', 1, 4, 'NA', 'NA'],
        'month2': ['NA', 5, 1, 2, 'NA', 1, 3],
        'amount': [10, 20, 40, 50, 60, 70, 100]
    }
)

Desired Output for question 2:

    month1  month2  sum_amount
0     1.0     NaN      110
1     2.0     5.0      50
2     NaN     1.0      110
3     1.0     2.0      110
4     4.0     NaN      0
5     NaN     1.0      110
6     NaN     3.0      100

My solution is not the most elegant, but it works.

The part shared by both questions is:

In  [1]: import pandas as pd    
         df = pd.DataFrame(
             {
                 'month1': [1, 2, 'NA', 1, 4, 'NA', 'NA'],
                 'month2': ['NA', 5, 1,  2, 'NA', 1, 3],
                 'amount': [10, 20, 40, 50, 60, 70, 100],
             }
         )

         def make_sum_amount(row, amount_sum):
             if row['month1'] == 'NA':
                 if row['month2'] == 'NA':
                     return 0
                 return amount_sum.get(row['month2'], 0)
             return amount_sum.get(row['month1'], 0)
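
To make the lookup order in make_sum_amount concrete, here is a small sketch that calls it on a few hand-written rows. The amount_sum dictionary below is an assumption written out by hand (the month1 totals for the sample data), and plain dicts stand in for DataFrame rows, which works because the helper only indexes the row by column name.

```python
def make_sum_amount(row, amount_sum):
    # Fall back to month2 only when month1 is the 'NA' placeholder.
    if row['month1'] == 'NA':
        if row['month2'] == 'NA':
            return 0
        return amount_sum.get(row['month2'], 0)
    return amount_sum.get(row['month1'], 0)

# Hand-written totals per month1 value for the sample data (assumption for illustration).
amount_sum = {1: 60, 2: 20, 4: 60}

by_month1 = make_sum_amount({'month1': 1, 'month2': 'NA'}, amount_sum)    # month1 is used
by_month2 = make_sum_amount({'month1': 'NA', 'month2': 1}, amount_sum)    # falls back to month2
no_match = make_sum_amount({'month1': 'NA', 'month2': 3}, amount_sum)     # 3 is not a key
both_na = make_sum_amount({'month1': 'NA', 'month2': 'NA'}, amount_sum)   # nothing to look up

print(by_month1, by_month2, no_match, both_na)
```

Note that month2 is consulted only when month1 is 'NA'. A non-NA month1 that is missing from the dictionary yields 0 without trying month2.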

Solution for the first question:

In  [2]: # Total amount per month1 value, ignoring rows where month1 is 'NA'.
         # Select 'amount' before summing so the mixed-type month2 column is not summed.
         amount_sum = df[df['month1'] != 'NA'].groupby('month1')['amount'].sum().to_dict()
         df['sum_amount'] = df.apply(lambda row: make_sum_amount(row, amount_sum), axis=1)
         df

Out [2]:    month1  month2  amount  sum_amount
         0     1.0      NA      10          60
         1     2.0     5.0      20          20
         2      NA     1.0      40          60
         3     1.0     2.0      50          60
         4     4.0      NA      60          60
         5      NA     1.0      70          60
         6      NA     3.0     100           0

Solution for the second question:

In  [3]: # Total amount per month2 value, ignoring rows where month2 is 'NA'.
         # Select 'amount' before summing so the mixed-type month1 column is not summed.
         amount_sum = df[df['month2'] != 'NA'].groupby('month2')['amount'].sum().to_dict()
         df['sum_amount'] = df.apply(lambda row: make_sum_amount(row, amount_sum), axis=1)
         df

Out [3]:    month1  month2  amount  sum_amount
         0     1.0      NA      10         110
         1     2.0     5.0      20          50
         2      NA     1.0      40         110
         3     1.0     2.0      50         110
         4     4.0      NA      60           0
         5      NA     1.0      70         110
         6      NA     3.0     100         100

First replace the 'NA' strings with missing values, then aggregate amount by month1 with sum to get a Series. Use Series.map on month1 to build the new column; values with no match become NaN, so fill them with Series.fillna using month2 mapped through the same Series. Finally, replace values that matched in neither column with 0:

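import numpy as np

df = df.replace('NA', np.nan)             # 'NA' strings -> missing values
s = df.groupby('month1')['amount'].sum()  # total amount per month1 value
# Look up month1 first, fall back to month2, then use 0 where neither matches
df['sum_amount'] = df['month1'].map(s).fillna(df['month2'].map(s)).fillna(0).astype(int)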
print (df)
   month1  month2  amount  sum_amount
0     1.0     NaN      10          60
1     2.0     5.0      20          20
2     NaN     1.0      40          60
3     1.0     2.0      50          60
4     4.0     NaN      60          60
5     NaN     1.0      70          60
6     NaN     3.0     100           0
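
The chain above can be split into its intermediate steps to see where each NaN comes from. This is a sketch under one assumption that differs from the answer: it converts the month columns with pd.to_numeric(errors='coerce') instead of replace, so that they are explicitly float columns. The names primary and fallback are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'month1': [1, 2, 'NA', 1, 4, 'NA', 'NA'],
    'month2': ['NA', 5, 1, 2, 'NA', 1, 3],
    'amount': [10, 20, 40, 50, 60, 70, 100],
})

# Coerce the month columns to numbers; the 'NA' strings become NaN.
for col in ['month1', 'month2']:
    df[col] = pd.to_numeric(df[col], errors='coerce')

# Total amount per month1 value; rows with a NaN key are dropped by groupby.
s = df.groupby('month1')['amount'].sum()

primary = df['month1'].map(s)    # NaN where month1 is missing
fallback = df['month2'].map(s)   # NaN where month2 is missing or is not a month1 value

# Prefer the month1 lookup, then the month2 lookup, then 0.
sum_amount = primary.fillna(fallback).fillna(0).astype(int)
print(pd.DataFrame({'primary': primary, 'fallback': fallback, 'sum_amount': sum_amount}))
```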

For the second question, only change the grouping column in the same solution:

df = df.replace('NA', np.nan)
s = df.groupby('month2')['amount'].sum()
df['sum_amount'] = df['month1'].map(s).fillna(df['month2'].map(s)).fillna(0).astype(int)
print (df)
   month1  month2  amount  sum_amount
0     1.0     NaN      10         110
1     2.0     5.0      20          50
2     NaN     1.0      40         110
3     1.0     2.0      50         110
4     4.0     NaN      60           0
5     NaN     1.0      70         110
6     NaN     3.0     100         100
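
Since the two solutions differ only in the grouping column, they can be wrapped in one function. The following is a minimal sketch, not part of the original answer: add_sum_amount and its key parameter are hypothetical names, and pd.to_numeric is used to turn the 'NA' strings into NaN.

```python
import pandas as pd

def add_sum_amount(df, key):
    """Return a copy of df with sum_amount looked up in the totals grouped by `key`."""
    out = df.copy()
    # Turn the 'NA' placeholders into NaN so groupby and map treat them as missing.
    for col in ['month1', 'month2']:
        out[col] = pd.to_numeric(out[col], errors='coerce')
    totals = out.groupby(key)['amount'].sum()
    out['sum_amount'] = (
        out['month1'].map(totals)            # look up month1 first
        .fillna(out['month2'].map(totals))   # fall back to month2
        .fillna(0)                           # no match in either column
        .astype(int)
    )
    return out

df = pd.DataFrame({
    'month1': [1, 2, 'NA', 1, 4, 'NA', 'NA'],
    'month2': ['NA', 5, 1, 2, 'NA', 1, 3],
    'amount': [10, 20, 40, 50, 60, 70, 100],
})

q1 = add_sum_amount(df, 'month1')  # question 1: totals grouped by month1
q2 = add_sum_amount(df, 'month2')  # question 2: totals grouped by month2
print(q1)
print(q2)
```

Working on a copy leaves the input frame untouched, so the function can be called once per question on the same df.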
