
Pandas: Pythonic way to create new df by summing columns from different dfs

Suppose we have two dfs

df1

  a b
z 3 4
x 1 3

and

df2

  a b 
c 4 8
v 6 1

I want to create df3 with two new rows, b and n, whose values are the column sums (a and b) of df1 and df2 respectively, like this:

df3

  a  b
b 4  7
n 10 9

I know this can be done simply by using .sum() on both dataframes and simply creating df3 manually, like this:

df3 = pd.DataFrame([[4, 7], [10, 9]], columns=['a', 'b'], index=['b', 'n'])

I just wanted to know if there is a more pythonic way of doing this that uses a single function or iteration to generate it in less time.

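Use concat with keys, then sum by the first index level. Note that sum(level=0) was deprecated in pandas 1.3 and removed in 2.0; groupby(level=0).sum() gives the same result and works across versions (sort=False keeps the keys in the order given):

df3 = pd.concat([df1, df2], keys=['b', 'n']).groupby(level=0, sort=False).sum()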
print (df3)
    a  b
b   4  7
n  10  9
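
For reference, a minimal self-contained sketch of this approach, rebuilding df1 and df2 from the question so it can be run as-is. The variable name stacked is only illustrative and not part of the original answer.

```python
import pandas as pd

# The two frames from the question.
df1 = pd.DataFrame({"a": [3, 1], "b": [4, 3]}, index=["z", "x"])
df2 = pd.DataFrame({"a": [4, 6], "b": [8, 1]}, index=["c", "v"])

# keys adds an outer index level ("b" for df1's rows, "n" for df2's rows).
stacked = pd.concat([df1, df2], keys=["b", "n"])

# Summing within each outer label collapses every source frame to one row.
df3 = stacked.groupby(level=0, sort=False).sum()
print(df3)
```

The intermediate stacked frame has a two-level index, (b, z), (b, x), (n, c), (n, v), which is why grouping on level 0 produces one row per original DataFrame.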

Solution for multiple DataFrames:

dfs = [df1, df2, df3, ...]
df = pd.concat(dfs, keys=range(len(dfs))).groupby(level=0).sum()

EDIT:

If you want to sum only some columns:

df3 = pd.concat([df1[['a','b']], df2[['a','b']]], keys=['b', 'n']).groupby(level=0, sort=False).sum()

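And for all columns common to both DataFrames (Index.intersection is used here instead of a set so the result is a pandas Index rather than an arbitrarily ordered list):

cols = df1.columns.intersection(df2.columns)
df3 = pd.concat([df1[cols], df2[cols]], keys=['b', 'n']).groupby(level=0, sort=False).sum()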
print (df3)
    a  b
b   4  7
n  10  9
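
To make the common-columns case concrete, here is a small sketch in which df2 carries an extra column. The column c and its values are hypothetical and added only for illustration; the list comprehension is one simple way to keep df1's column order.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [3, 1], "b": [4, 3]}, index=["z", "x"])
# Hypothetical extra column "c" that exists only in df2.
df2 = pd.DataFrame({"a": [4, 6], "b": [8, 1], "c": [0, 5]}, index=["c", "v"])

# Columns present in both frames, in df1's order.
common = [col for col in df1.columns if col in df2.columns]

df3 = (
    pd.concat([df1[common], df2[common]], keys=["b", "n"])
    .groupby(level=0, sort=False)
    .sum()
)
print(df3)
```

Without the column selection, concat would align on the union of columns and fill the missing c values in df1 with NaN, so the sum for c would cover df2 only.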

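You can do this with sum and concat. The axis must be passed by keyword on pandas 2.0+, where positional arguments to concat other than the list of objects are no longer accepted:

pd.concat([df1.sum(), df2.sum()], axis=1).T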

    a  b
0   4  7
1  10  9

Or, with a keys argument:

pd.concat([df1.sum(), df2.sum()], axis=1, keys=['b', 'n']).T
    a  b
b   4  7
n  10  9

If you have many such dataframes, and assuming they have the same columns, you can put them into a single list and call sum inside a list comprehension (keys needs one label per DataFrame in the list):

df_list = [df1, df2, ...]
pd.concat([df.sum() for df in df_list], axis=1, keys=['b', 'n']).T

    a  b
b   4  7
n  10  9
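
One way to keep the labels and the DataFrames in step as the list grows is to hold them in a dict. This is a sketch rather than part of the original answer; the name frames is an assumption made for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [3, 1], "b": [4, 3]}, index=["z", "x"])
df2 = pd.DataFrame({"a": [4, 6], "b": [8, 1]}, index=["c", "v"])

# Hypothetical mapping from the desired row label to its source frame.
frames = {"b": df1, "n": df2}

# Each df.sum() is a Series of column totals; concat places them side by
# side as columns named by the keys, and .T turns those columns into rows.
result = pd.concat(
    [df.sum() for df in frames.values()],
    axis=1,
    keys=list(frames),
).T
print(result)
```

Because the keys come from the dict itself, adding another frame only requires one new dict entry.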
