
Perform a function on columns in pandas dataframe with the same name

I have a dataframe with 111 columns, some of which share the same column name. There are only 27 unique column names.

>>> has_2.head(6)
    Has_MCS_A      Has_MCS_A     Has_MCS_A      Has_MCS_A  \
           0              0              0              3   
           0              1              0              0   
           0              0              0              0   
           1              0              0              0   
           0              0              10             0   
           0              0              0              0   

    Has_MCS_B     Has_MCS_B         Has_MCS_B        Has_MCS_B  \
          0                0                0                6   
          0                0                0                0   
          0                9                0                0   
          10               0                0                0   
          0                0                0                0   
          0                0                7                0   

I want to sum, row by row, the values of the columns that share a name, so the result should be a dataframe with only 27 columns.

Selecting a duplicated column name returns all the columns with that name, so you can construct a new df by iterating over the unique column names and assigning to each one the row-wise sum of its columns:

In [21]:
import io
import pandas as pd
t="""Has_MCS_A      Has_MCS_A     Has_MCS_A      Has_MCS_A 
        0              0              0              3   
           0              1              0              0   
           0              0              0              0   
           1              0              0              0   
           0              0              10             0   
           0              0              0              0   """
df = pd.read_csv(io.StringIO(t), sep=r'\s+')  # raw string avoids an invalid-escape warning
df

Out[21]:
   Has_MCS_A  Has_MCS_A.1  Has_MCS_A.2  Has_MCS_A.3
0          0            0            0            3
1          0            1            0            0
2          0            0            0            0
3          1            0            0            0
4          0            0           10            0
5          0            0            0            0

In [22]:    
# overwrite the columns to force duplicate names
df.columns = ['Has_MCS_A','Has_MCS_A','Has_MCS_A','Has_MCS_A']
df

Out[22]:
   Has_MCS_A  Has_MCS_A  Has_MCS_A  Has_MCS_A
0          0          0          0          3
1          0          1          0          0
2          0          0          0          0
3          1          0          0          0
4          0          0         10          0
5          0          0          0          0
In [23]:
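# construct a new df
new_df = pd.DataFrame()
for col in df.columns.unique():
    # select by boolean mask so a name that occurs only once still yields a
    # DataFrame (df[col] would return a Series, and .sum(axis=1) would fail)
    new_df[col] = df.loc[:, df.columns == col].sum(axis=1)
new_df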

Out[23]:
   Has_MCS_A
0          3
1          1
2          0
3          1
4         10
5          0
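
The same row-wise sum can probably be written without an explicit loop by grouping on the column labels. The sketch below is one possible alternative, not part of the original answer: it transposes the frame, groups the rows by their label, sums, and transposes back. The helper name `sum_duplicate_columns` is made up for illustration, and the sample values are the `Has_MCS_A` / `Has_MCS_B` rows shown in the question.

```python
import pandas as pd

# Sample data copied from the question: four Has_MCS_A columns
# followed by four Has_MCS_B columns.
data = [
    [0, 0, 0, 3, 0, 0, 0, 6],
    [0, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 9, 0, 0],
    [1, 0, 0, 0, 10, 0, 0, 0],
    [0, 0, 10, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 7, 0],
]
columns = ['Has_MCS_A'] * 4 + ['Has_MCS_B'] * 4
has_2 = pd.DataFrame(data, columns=columns)


def sum_duplicate_columns(frame):
    """Hypothetical helper: sum columns that share a name, row by row."""
    # After transposing, the column labels become the row index, so an
    # ordinary groupby on index level 0 collects the duplicated names.
    # sort=False keeps the names in order of first appearance.
    return frame.T.groupby(level=0, sort=False).sum().T


summed = sum_duplicate_columns(has_2)
print(summed)
```

Applied to the full 111-column frame, this should give one column per unique name, and it also handles names that occur only once. Transposing copies the data, so for very wide or mixed-dtype frames the explicit loop may still be the safer choice.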
