Pandas: How to create columns that sum other columns conditionally on the values of other columns?

I have the following pandas DataFrame.

import pandas as pd
df = pd.read_csv('filename.csv')

print(df)

   code1  code2  code3  code4  value1  value2  value3  value4
0    101    101    101    101    1000    1000    1000    1000
1    101    101    101    201    1000    1000    1000    1000
2    101    101    201    201    1000    1000    1000    1000
3    101    201    201    201    1000    1000    1000    1000
4    101    201    201    301    1000    1000    1000    1000
5    101    201    301    301    1000    1000    1000    1000
6    101    301    301    301    1000    1000    1000    1000
7    101    101    101    301    1000    1000    1000    1000
8    101    201    301      0    1000    1000    1000       0
9    101    301      0      0    1000    1000       0       0


....

I need to create columns that sum the value columns (value1, value2, value3, value4) according to the matching code columns (code1, code2, code3, code4), one new column per code, as follows:

   code1  code2  code3  code4  value1  value2  value3  value4  sum_code_101  sum_code_201  sum_code_301
0    101    101    101    101    1000    1000    1000    1000          4000             0             0
1    101    101    101    201    1000    1000    1000    1000          3000          1000             0
2    101    101    201    201    1000    1000    1000    1000          2000          2000             0
3    101    201    201    201    1000    1000    1000    1000          1000          3000             0
4    101    201    201    301    1000    1000    1000    1000          1000          2000          1000
5    101    201    301    301    1000    1000    1000    1000          1000          1000          2000
6    101    301    301    301    1000    1000    1000    1000          1000             0          3000
7    101    101    101    301    1000    1000    1000    1000          3000             0          1000
8    101    201    301      0    1000    1000    1000       0          1000          1000          1000
9    101    301      0      0    1000    1000       0       0          1000             0          1000

I have tried:

df['sum_code_101']=df[df['code1']=='101'],['value1']+df[df['code2']=='101'],['value2']+df[df['code3']=='101'],['value3']+df[df['code4']=='101'],['value4']
df['sum_code_201']=df[df['code1']=='201'],['value1']+df[df['code2']=='201'],['value2']+df[df['code3']=='201'],['value3']+df[df['code4']=='201'],['value4']
df['sum_code_301']=df[df['code1']=='301'],['value1']+df[df['code2']=='301'],['value2']+df[df['code3']=='301'],['value3']+df[df['code4']=='301'],['value4']

However, I got this error message:

TypeError: 'Series' objects are mutable, thus they cannot be hashed

As the real DataFrame has 25 different codes (101, 201, 301, ...), I need to create 25 such columns to sum their values.

Any help would be greatly appreciated. Thank you.

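You can use a combination of pd.wide_to_long and groupby with some DataFrame reshaping: wide_to_long stacks the code/value column pairs into long format, groupby sums the values per row and code, and unstack pivots the codes back into columns.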

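import numpy as np
import pandas as pd

# Turn the row labels into an 'index' column so each original row has an id.
df = df.reset_index()
# Stack code1..code4 / value1..value4 into long 'code' and 'value' columns.
df_long = pd.wide_to_long(df, ['code','value'], 'index', 'No')
# Sum per row and code, drop zero sums (the code 0 entries), then pivot codes into columns.
df_sum = df_long.groupby(['index','code']).sum().replace(0, np.nan).dropna(axis=0)['value'].unstack(fill_value=0)
df_sum.columns = [f'sum_{df_sum.columns.name}_{i}' for i in df_sum.columns]
df_out = df.set_index('index').join(df_sum)
df_out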

Output:

       code1  code2  code3  code4  value1  value2  value3  value4  sum_code_101  sum_code_201  sum_code_301
index                                                                                                      
0        101    101    101    101    1000    1000    1000    1000        4000.0           0.0           0.0
1        101    101    101    201    1000    1000    1000    1000        3000.0        1000.0           0.0
2        101    101    201    201    1000    1000    1000    1000        2000.0        2000.0           0.0
3        101    201    201    201    1000    1000    1000    1000        1000.0        3000.0           0.0
4        101    201    201    301    1000    1000    1000    1000        1000.0        2000.0        1000.0
5        101    201    301    301    1000    1000    1000    1000        1000.0        1000.0        2000.0
6        101    301    301    301    1000    1000    1000    1000        1000.0           0.0        3000.0
7        101    101    101    301    1000    1000    1000    1000        3000.0           0.0        1000.0
8        101    201    301      0    1000    1000    1000       0        1000.0        1000.0        1000.0
9        101    301      0      0    1000    1000       0       0        1000.0           0.0        1000.0
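To make the reshaping step easier to follow, here is a minimal sketch on a two-row frame with only two code/value pairs. The data is made up for illustration and is not from the question; the point is only to show what pd.wide_to_long produces before the groupby.

```python
import pandas as pd

# Illustrative data only: same column naming scheme as the question, fewer columns.
df = pd.DataFrame({
    "code1": [101, 101],
    "code2": [101, 201],
    "value1": [1000, 1000],
    "value2": [1000, 500],
})

# wide_to_long needs a column that identifies each original row.
df = df.reset_index()

# The stub names "code" and "value" match code1, code2, value1, value2;
# the numeric suffix becomes the new index level "No".
df_long = pd.wide_to_long(df, stubnames=["code", "value"], i="index", j="No")

# One row per (original row, suffix) pair, so summing by code is a plain groupby.
df_sum = (
    df_long.groupby(["index", "code"])["value"]
    .sum()
    .unstack(fill_value=0)
)

print(df_long)
print(df_sum)
```

Each original row contributes one long-format row per code/value pair, which is why a single groupby on the row id and the code is enough to get the per-code totals.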

Here is a solution using the pandas apply method. Row-wise apply is generally slower than column operations and not ideal when those are available, but it works.

import pandas as pd
data = {
    'code1': ['101', '101', '101', '101', '101', '101'],
    'code2': ['101', '101', '101', '201', '201', '201'],
    'code3': ['101', '101', '101', '201', '201', '301'],
    'code4': ['101', '201', '201', '201', '301', '301'],
    'value1': [1000, 1000, 1000, 1000, 1000, 1000],
    'value2': [1000, 1000, 1000, 1000, 1000, 1000],
    'value3': [1000, 1000, 1000, 1000, 1000, 1000],
    'value4': [1000, 1000, 1000, 1000, 1000, 1000]
}
df = pd.DataFrame(data)

def apply_to_row(row, value):
    code_cols = ['code1', 'code2', 'code3', 'code4']
    value_cols = ['value1', 'value2', 'value3', 'value4']

    code_value_sum = 0
    for code_col, value_col in zip(code_cols, value_cols):
        if row[code_col] == value:
            code_value_sum += row[value_col]

    return code_value_sum

code_values = ['101', '201', '301'] # in practice, replace with the distinct values found in the code columns
for code_value in code_values:
    df['sum_code_' + str(code_value)] = df.apply(apply_to_row, value=code_value, axis=1)

Here is the result:

  code1 code2 code3 code4  value1  value2  value3  value4  sum_code_101  sum_code_201  sum_code_301
0   101   101   101   101    1000    1000    1000    1000          4000             0             0
1   101   101   101   201    1000    1000    1000    1000          3000          1000             0
2   101   101   101   201    1000    1000    1000    1000          3000          1000             0
3   101   201   201   201    1000    1000    1000    1000          1000          3000             0
4   101   201   201   301    1000    1000    1000    1000          1000          2000          1000
5   101   201   301   301    1000    1000    1000    1000          1000          1000          2000
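For comparison, the same per-row sums can probably be obtained without apply by masking the value columns with a boolean comparison on the code columns. This is a sketch under a few assumptions: the codes are stored as integers (as in the question's CSV, not as strings like in the sample above), the code and value columns line up position by position, and 0 means "no code". The list of codes is derived from the data rather than hard-coded.

```python
import pandas as pd

# Illustrative data only, using integer codes.
df = pd.DataFrame({
    "code1": [101, 101, 101],
    "code2": [101, 201, 301],
    "code3": [201, 301, 0],
    "code4": [201, 301, 0],
    "value1": [1000, 1000, 1000],
    "value2": [1000, 1000, 1000],
    "value3": [1000, 1000, 0],
    "value4": [1000, 1000, 0],
})

code_cols = ["code1", "code2", "code3", "code4"]
value_cols = ["value1", "value2", "value3", "value4"]

# Plain NumPy arrays so the comparison is purely positional (no label alignment).
codes = df[code_cols].to_numpy()
values = df[value_cols].to_numpy()

# Distinct codes found in the data; skipping 0 is an assumption about its meaning.
distinct_codes = sorted(c for c in pd.unique(codes.ravel()) if c != 0)

for code in distinct_codes:
    # Keep a value only where the code in the same position matches, then sum each row.
    df[f"sum_code_{code}"] = (values * (codes == code)).sum(axis=1)

print(df)
```

With 25 distinct codes this loops 25 times over whole columns instead of calling a Python function once per row and code.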

Thanks!
