简体   繁体   中英

Calculate sum of columns of same name pandas

How can i search for duplicate columns in a dataframe and then create a new column with same name. the new column is result of 'OR' operator of these columns. Then drop old duplicated columns.

Example:

For that, I tried to create a unique column 'job' that is the result of 'OR' operator of the two 'job' columns in the table bellow.

There is my table look like:

name job maried children job
John True True True True
Peter True False True True
Karl False True True True
jack False False False False

the result that I want is:

name job maried children
John True True True
Peter True False True
Karl True True True
jack False False False

I tried to do this (df1 is my table):

df_join = pd.DataFrame()
df1_dulp = pd.DataFrame()
df_tmp = pd.DataFrame()

for column in df1.columns:
    df1_dulp = df1.filter(like=str(column))
    if df1_dulp.shape[1] >= 2:
        for i in range(0, df1_dulp.shape[1]):
            df_tmp += df1_dulp.iloc[:,i]
        if column in df1_dulp.columns:
            df1_dulp.drop(column, axis=1, inplace=True)
        df_join = df_join.join(df1_dulp, how = 'left', lsuffix='left', rsuffix='right')

The result is an empty table (df_join).

You can select the boolean columns with select_dtypes , then aggregate as OR with groupby.any on columns:

out = (df
   .select_dtypes(exclude='bool')
   .join(df.select_dtypes('bool')
           .groupby(level=0, axis=1, sort=False).any()
        )
)

output:

    name    job  maried  children
0   John   True    True      True
1  Peter   True   False      True
2   Karl   True    True      True
3   jack  False   False     False

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM