Suppose we have two DataFrames, df1:
   a  b
z  3  4
x  1  3
and df2:
   a  b
c  4  8
v  6  1
I want to create df3 with two new rows, labelled b and n, where each row holds the column sums (a and b) of one of my two DataFrames, like this:
df3
    a  b
b   4  7
n  10  9
I know this can be done by calling .sum() on each DataFrame and then building df3 by hand, like this:
df3 = pd.DataFrame([[4, 7], [10, 9]], columns=['a', 'b'], index=['b', 'n'])
Is there a more Pythonic way to do this, using a single function call or iteration, that generates it in less time?
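Use concat with keys, then sum by the first index level (groupby(level=0).sum(); the older .sum(level=0) spelling was removed in pandas 2.0):
df3 = pd.concat([df1, df2], keys=['b', 'n']).groupby(level=0).sum()
print(df3)
    a  b
b   4  7
n  10  9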
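A self-contained sketch of this approach, with the question's two frames rebuilt by hand. It spells the per-key sum as `groupby(level=0).sum()`, which works on current pandas versions; the variable name `stacked` is only illustrative.

```python
import pandas as pd

# The two frames from the question.
df1 = pd.DataFrame({"a": [3, 1], "b": [4, 3]}, index=["z", "x"])
df2 = pd.DataFrame({"a": [4, 6], "b": [8, 1]}, index=["c", "v"])

# keys= adds an outer index level: "b" for df1's rows, "n" for df2's rows.
stacked = pd.concat([df1, df2], keys=["b", "n"])

# Summing within each outer-level label collapses every frame to one row.
df3 = stacked.groupby(level=0).sum()
print(df3)
```

Note that groupby sorts the group labels by default, so pass `sort=False` to `groupby` if the rows should stay in the order given to `keys`.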
Solution for multiple DataFrames:
dfs = [df1, df2, df3, ...]
df = pd.concat(dfs, keys=range(len(dfs))).groupby(level=0).sum()
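As a rough illustration of the multi-frame case, the sketch below uses the question's two frames plus a third, `df_extra`, which is hypothetical and exists only to show that the pattern scales. Each frame is keyed by its position in the list, so the result is indexed 0, 1, 2.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [3, 1], "b": [4, 3]}, index=["z", "x"])
df2 = pd.DataFrame({"a": [4, 6], "b": [8, 1]}, index=["c", "v"])
# Hypothetical third frame, not part of the original question.
df_extra = pd.DataFrame({"a": [2, 2], "b": [5, 0]}, index=["q", "w"])

dfs = [df1, df2, df_extra]

# Key each frame by its list position, then sum within each key.
totals = pd.concat(dfs, keys=range(len(dfs))).groupby(level=0).sum()
print(totals)
```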
EDIT:
If you want to sum only some of the columns:
cols = ['a', 'b']
df3 = pd.concat([df1[cols], df2[cols]], keys=['b', 'n']).groupby(level=0).sum()
And for all columns common to both DataFrames:
cols = list(set(df1.columns).intersection(df2.columns))
df3 = pd.concat([df1[cols], df2[cols]], keys=['b', 'n']).groupby(level=0).sum()
print(df3)
    a  b
b   4  7
n  10  9
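One caveat with the set-based intersection is that a set does not guarantee column order. A minimal sketch of an order-preserving variant follows; the extra columns `only_in_df1` and `only_in_df2` are made up here purely to show that unshared columns are dropped.

```python
import pandas as pd

# Frames with one unshared column each (hypothetical, for illustration).
df1 = pd.DataFrame({"a": [3, 1], "b": [4, 3], "only_in_df1": [9, 9]}, index=["z", "x"])
df2 = pd.DataFrame({"b": [8, 1], "a": [4, 6], "only_in_df2": [7, 7]}, index=["c", "v"])

# Keep the shared columns in df1's column order.
cols = [c for c in df1.columns if c in df2.columns]

df3 = pd.concat([df1[cols], df2[cols]], keys=["b", "n"]).groupby(level=0).sum()
print(df3)
```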
You can do this with sum and concat. Pass the axis by keyword, since pandas 2.0 no longer accepts it positionally:
pd.concat([df1.sum(), df2.sum()], axis=1).T
    a  b
0   4  7
1  10  9
Or, with a keys argument:
pd.concat([df1.sum(), df2.sum()], axis=1, keys=['b', 'n']).T
    a  b
b   4  7
n  10  9
If you have many such DataFrames, and assuming they have the same columns, you can put them into a single list and call sum inside a list comprehension (keys needs one label per DataFrame in the list):
df_list = [df1, df2, ...]
pd.concat([df.sum() for df in df_list], axis=1, keys=['b', 'n']).T
    a  b
b   4  7
n  10  9
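A runnable sketch of the list-comprehension version, using the question's two frames. The names `labels` and `result` are illustrative, and `labels` is assumed to hold exactly one row label per frame in `df_list`.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [3, 1], "b": [4, 3]}, index=["z", "x"])
df2 = pd.DataFrame({"a": [4, 6], "b": [8, 1]}, index=["c", "v"])

df_list = [df1, df2]
labels = ["b", "n"]  # one label per frame in df_list

# Each df.sum() is a Series of column totals; concat places them side by
# side as columns named by `labels`, and .T turns those columns into rows.
result = pd.concat([df.sum() for df in df_list], axis=1, keys=labels).T
print(result)
```

Compared with the concat-then-groupby route, this sums each frame first and only concatenates the small per-frame totals.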