
add selected columns from two pandas dfs

I have two pandas dataframes, a_df and b_df. Both have the columns ID, atext, and var1 through var25.

I want to add ONLY the corresponding var columns from a_df and b_df and leave ID and atext alone.

The code below adds ALL the corresponding columns. Is there a way to get it to add just the columns of interest?

absum_df=a_df.add(b_df)

What could I do to achieve this?

Use filter:

absum_df = a_df.filter(like='var').add(b_df.filter(like='var'))
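One thing to watch with like='var': it matches any column whose name contains the substring "var", so a column such as variant (a hypothetical name, not one from the question) would be summed too. A minimal sketch, on small made-up frames, that anchors the match with the regex argument instead:

```python
import pandas as pd

# Toy frames standing in for a_df and b_df; the values and the extra
# 'variant' column are made up for illustration.
a_df = pd.DataFrame({'ID': [1, 2], 'atext': ['x', 'y'], 'variant': ['p', 'q'],
                     'var1': [1, 2], 'var2': [3, 4]})
b_df = pd.DataFrame({'ID': [1, 2], 'atext': ['x', 'y'], 'variant': ['p', 'q'],
                     'var1': [10, 20], 'var2': [30, 40]})

# like='var' would also select 'variant'; this regex keeps only var<digits>.
pattern = r'^var\d+$'
var_sum = a_df.filter(regex=pattern).add(b_df.filter(regex=pattern))
print(var_sum)
```

If every column containing "var" really is one of var1 through var25, like='var' is enough and the regex is unnecessary.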

If you want to keep additional columns as-is, use concat after summing:

absum_df = pd.concat([a_df[['ID', 'atext']], absum_df], axis=1)

Alternatively, instead of selecting the columns to keep from a_df by name, you can drop from a_df the columns that were summed, which keeps every other column of a_df:

absum_df = pd.concat([a_df.drop(absum_df.columns, axis=1), absum_df], axis=1)
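As a quick check, here are both variants run on toy data (the values are made up, and var_sum, keep_named and keep_rest are names chosen only for this sketch). Both should leave ID and atext untouched and sum only the var columns:

```python
import pandas as pd

# Toy frames with two var columns instead of twenty-five.
a_df = pd.DataFrame({'ID': [1, 2], 'atext': ['x', 'y'],
                     'var1': [1, 2], 'var2': [3, 4]})
b_df = pd.DataFrame({'ID': [1, 2], 'atext': ['x', 'y'],
                     'var1': [10, 20], 'var2': [30, 40]})

# Sum only the var columns.
var_sum = a_df.filter(like='var').add(b_df.filter(like='var'))

# Variant 1: name the columns to keep.
keep_named = pd.concat([a_df[['ID', 'atext']], var_sum], axis=1)

# Variant 2: keep whatever columns of a_df were not summed.
keep_rest = pd.concat([a_df.drop(var_sum.columns, axis=1), var_sum], axis=1)

print(keep_named)
print(keep_named.equals(keep_rest))
```

The sum is kept in a separate variable here so that the second variant still sees only the summed columns when it calls drop.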

You can subset a dataframe to particular columns:

var_columns = ['var{}'.format(i) for i in range(1, 26)]  # var1 ... var25
absum_df = a_df[var_columns].add(b_df[var_columns])

Note that this results in a dataframe with only the var columns. If you want a dataframe that keeps the non-var columns from a_df and replaces each var column with the sum from a_df and b_df, you can do:

absum_df = a_df.copy()
absum_df[var_columns] = a_df[var_columns].add(b_df[var_columns])
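Keep in mind that add pairs rows by index label, not by the ID column. If the two frames might list the same IDs in a different order, one option is to index both by ID first. A minimal sketch, assuming ID is unique in each frame and using made-up data with a single var column (a_ix and b_ix are names chosen only for this sketch):

```python
import pandas as pd

a_df = pd.DataFrame({'ID': [1, 2], 'atext': ['x', 'y'], 'var1': [1, 2]})
# Same IDs as a_df, but in the opposite row order.
b_df = pd.DataFrame({'ID': [2, 1], 'atext': ['y', 'x'], 'var1': [20, 10]})

var_columns = ['var1']

# Index both frames by ID so that add() pairs rows by ID, not by position.
a_ix = a_df.set_index('ID')
b_ix = b_df.set_index('ID')

absum_df = a_ix.copy()
absum_df[var_columns] = a_ix[var_columns].add(b_ix[var_columns])

# Turn ID back into an ordinary column.
absum_df = absum_df.reset_index()
print(absum_df)
```

If both frames already share the same index in the same order, this extra step is not needed.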
