Question 1:
I have a data frame with two month value columns, month1 and month2. If the value in the month1 column is not NA, sum the amount values of all rows with the same month1 value. If the value in month1 is NA, take the corresponding month2 value, look it up in the month1 column, and sum the matching amount values.
import pandas as pd
df = pd.DataFrame(
{
'month1': [1, 2, 'NA', 1, 4, 'NA', 'NA'],
'month2': ['NA', 5, 1, 2, 'NA', 1, 3],
'amount': [10, 20, 40, 50, 60, 70, 100]
}
)
Desired output for question 1:
   month1  month2  sum_amount
0     1.0     NaN          60
1     2.0     5.0          20
2     NaN     1.0          60
3     1.0     2.0          60
4     4.0     NaN          60
5     NaN     1.0          60
6     NaN     3.0           0
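The numbers in the desired output are per-value totals of amount. A quick way to reproduce the month1 totals, as a sketch that converts the 'NA' strings with pd.to_numeric(errors='coerce') (a choice made for this example, not part of the question):

```python
import pandas as pd

df = pd.DataFrame({
    'month1': [1, 2, 'NA', 1, 4, 'NA', 'NA'],
    'month2': ['NA', 5, 1, 2, 'NA', 1, 3],
    'amount': [10, 20, 40, 50, 60, 70, 100],
})

# 'NA' strings become NaN, so month1 turns into a float column
month1 = pd.to_numeric(df['month1'], errors='coerce')

# groupby drops NaN keys by default, leaving one total per month1 value
totals = df['amount'].groupby(month1).sum()
print(totals)  # 1.0 -> 60, 2.0 -> 20, 4.0 -> 60
```

Every row whose lookup value (month1, or month2 when month1 is NA) is 1, 2 or 4 gets that total; row 6 looks up 3, which has no total, hence the 0.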
Question 2:
I have a data frame with two month value columns, month1 and month2. If the value in the month1 column is not NA, look that value up in the month2 column and sum the amount values of the matching rows. If the value in month1 is NA, take the corresponding month2 value, look it up in the month2 column, and sum the matching amount values.
import pandas as pd
df = pd.DataFrame(
{
'month1': [1, 2, 'NA', 1, 4, 'NA', 'NA'],
'month2': ['NA', 5, 1, 2, 'NA', 1, 3],
'amount': [10, 20, 40, 50, 60, 70, 100]
}
)
Desired output for question 2:
   month1  month2  sum_amount
0     1.0     NaN         110
1     2.0     5.0          50
2     NaN     1.0         110
3     1.0     2.0         110
4     4.0     NaN           0
5     NaN     1.0         110
6     NaN     3.0         100
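For question 2 the totals are taken over month2 instead. The same kind of sketch (again using pd.to_numeric, an assumption of this example) also shows why row 4 gets 0: its month1 value 4 never appears in month2.

```python
import pandas as pd

df = pd.DataFrame({
    'month1': [1, 2, 'NA', 1, 4, 'NA', 'NA'],
    'month2': ['NA', 5, 1, 2, 'NA', 1, 3],
    'amount': [10, 20, 40, 50, 60, 70, 100],
})

# Convert the 'NA' strings to NaN so month2 is numeric
month2 = pd.to_numeric(df['month2'], errors='coerce')

# One total per month2 value: 1 -> 40 + 70, 2 -> 50, 3 -> 100, 5 -> 20
totals = df['amount'].groupby(month2).sum()
print(totals)

# 4 is not a month2 value, so the lookup falls back to the default 0
print(totals.get(4, 0))
```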
My solution is not the most elegant one, but it works.
The part shared by both of your questions is:
In [1]: import pandas as pd
df = pd.DataFrame(
{
'month1': [1, 2, 'NA', 1, 4, 'NA', 'NA'],
'month2': ['NA', 5, 1, 2, 'NA', 1, 3],
'amount': [10, 20, 40, 50, 60, 70, 100],
}
)
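def make_sum_amount(row, amount_sum):
    # month1 is missing: fall back to month2 (0 if that is missing too)
    if row['month1'] == 'NA':
        if row['month2'] == 'NA':
            return 0
        return amount_sum.get(row['month2'], 0)
    # month1 is present: look it up; values without a total give 0
    return amount_sum.get(row['month1'], 0)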
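To see which key the helper looks up, it can be called on hand-built rows. Plain dicts work here because the function only indexes the row by column name; the amount_sum dict below is the question 1 lookup (month1 totals) written out by hand for illustration.

```python
def make_sum_amount(row, amount_sum):
    # month1 is missing: fall back to month2 (0 if that is missing too)
    if row['month1'] == 'NA':
        if row['month2'] == 'NA':
            return 0
        return amount_sum.get(row['month2'], 0)
    # month1 is present: look it up; values without a total give 0
    return amount_sum.get(row['month1'], 0)


# Totals of 'amount' per month1 value for the sample data
amount_sum = {1: 60, 2: 20, 4: 60}

print(make_sum_amount({'month1': 2, 'month2': 5}, amount_sum))        # 20
print(make_sum_amount({'month1': 'NA', 'month2': 1}, amount_sum))     # 60
print(make_sum_amount({'month1': 'NA', 'month2': 3}, amount_sum))     # 0
print(make_sum_amount({'month1': 'NA', 'month2': 'NA'}, amount_sum))  # 0
```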
Solution for the first question:
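In [2]: # select 'amount' before summing; summing the whole frame would also try to add up month2, which mixes 'NA' strings and numbers
grouped_df = df[df['month1'] != 'NA'].groupby('month1')['amount'].sum().reset_index()
amount_sum = {k: v for k, v in zip(grouped_df['month1'], grouped_df['amount'])}
df['sum_amount'] = df.apply(lambda row: make_sum_amount(row, amount_sum), axis=1)
df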
Out[2]:
   month1 month2  amount  sum_amount
0     1.0     NA      10          60
1     2.0    5.0      20          20
2      NA    1.0      40          60
3     1.0    2.0      50          60
4     4.0     NA      60          60
5      NA    1.0      70          60
6      NA    3.0     100           0
Solution for the second question:
In [3]: # same fix: sum only the 'amount' column, this time grouped by month2
grouped_df = df[df['month2'] != 'NA'].groupby('month2')['amount'].sum().reset_index()
amount_sum = {k: v for k, v in zip(grouped_df['month2'], grouped_df['amount'])}
df['sum_amount'] = df.apply(lambda row: make_sum_amount(row, amount_sum), axis=1)
df
Out[3]:
   month1 month2  amount  sum_amount
0     1.0     NA      10         110
1     2.0    5.0      20          50
2      NA    1.0      40         110
3     1.0    2.0      50         110
4     4.0     NA      60           0
5      NA    1.0      70         110
6      NA    3.0     100         100
First replace the 'NA' strings with missing values, then aggregate the sums into a Series. Next, use Series.map to create the new column; values without a match become NaN, so fill them with Series.fillna using the other column mapped the same way. Finally, replace values that matched in neither column with 0:
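import numpy as np
import pandas as pd
df = df.replace('NA', np.nan)             # 'NA' strings -> real missing values
s = df.groupby('month1')['amount'].sum()  # total amount per month1 value
# month1 lookup first, then month2 where that gives NaN, then 0 if neither matches
df['sum_amount'] = df['month1'].map(s).fillna(df['month2'].map(s)).fillna(0).astype(int)
print(df)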
   month1  month2  amount  sum_amount
0     1.0     NaN      10          60
1     2.0     5.0      20          20
2     NaN     1.0      40          60
3     1.0     2.0      50          60
4     4.0     NaN      60          60
5     NaN     1.0      70          60
6     NaN     3.0     100           0
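The map/fillna chain is easier to follow one step at a time. A sketch that keeps each intermediate Series (the step names are only for illustration; pd.to_numeric(errors='coerce') is swapped in for replace('NA', np.nan) so the month columns come out as plain floats):

```python
import pandas as pd

df = pd.DataFrame({
    'month1': [1, 2, 'NA', 1, 4, 'NA', 'NA'],
    'month2': ['NA', 5, 1, 2, 'NA', 1, 3],
    'amount': [10, 20, 40, 50, 60, 70, 100],
})
df['month1'] = pd.to_numeric(df['month1'], errors='coerce')
df['month2'] = pd.to_numeric(df['month2'], errors='coerce')

s = df.groupby('month1')['amount'].sum()

step1 = df['month1'].map(s)                # NaN where month1 is missing
step2 = step1.fillna(df['month2'].map(s))  # fill those rows from month2
step3 = step2.fillna(0).astype(int)        # matched in neither column -> 0

print(pd.DataFrame({'step1': step1, 'step2': step2, 'step3': step3}))
```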
For the second question, only the grouping column changes in the same solution:
df = df.replace('NA', np.nan)
s = df.groupby('month2')['amount'].sum()
df['sum_amount'] = df['month1'].map(s).fillna(df['month2'].map(s)).fillna(0).astype(int)
print(df)
   month1  month2  amount  sum_amount
0     1.0     NaN      10         110
1     2.0     5.0      20          50
2     NaN     1.0      40         110
3     1.0     2.0      50         110
4     4.0     NaN      60           0
5     NaN     1.0      70         110
6     NaN     3.0     100         100
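One subtlety: the map/fillna chain also falls back to month2 when month1 is present but has no total, whereas the questions only ask for the fallback when month1 is NA. A stricter sketch picks the lookup key first (month1, else month2) and maps once. The helper name sum_amount is hypothetical, and pd.to_numeric is again used for the 'NA' conversion. With this sample data both versions give the same numbers; they would differ only for a row whose month1 value has no total while its month2 value does.

```python
import pandas as pd

df = pd.DataFrame({
    'month1': [1, 2, 'NA', 1, 4, 'NA', 'NA'],
    'month2': ['NA', 5, 1, 2, 'NA', 1, 3],
    'amount': [10, 20, 40, 50, 60, 70, 100],
})
df['month1'] = pd.to_numeric(df['month1'], errors='coerce')
df['month2'] = pd.to_numeric(df['month2'], errors='coerce')


def sum_amount(df, key):
    """Hypothetical helper: totals of 'amount' grouped by `key`, looked up
    via month1, or via month2 only where month1 is missing."""
    s = df.groupby(key)['amount'].sum()
    # The lookup value is month1 when present, otherwise month2
    lookup = df['month1'].fillna(df['month2'])
    # No total for the lookup value (or no lookup value at all) -> 0
    return lookup.map(s).fillna(0).astype(int)


df['q1'] = sum_amount(df, 'month1')  # question 1
df['q2'] = sum_amount(df, 'month2')  # question 2
print(df)
```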