
Sum of Every Two Columns in Pandas dataframe

I have a problem when using Pandas. My task is like this:

import pandas as pd

df = pd.DataFrame([(1, 2, 3, 4, 5, 6), (1, 2, 3, 4, 5, 6), (1, 2, 3, 4, 5, 6)],
                  columns=['a', 'b', 'c', 'd', 'e', 'f'])
df
Out:
   a  b  c  d  e  f
0  1  2  3  4  5  6
1  1  2  3  4  5  6
2  1  2  3  4  5  6

What I want is an output dataframe that looks like this:

Out:
   s1  s2  s3
0   3   7  11
1   3   7  11
2   3   7  11

That is to say, sum the column pairs (a,b), (c,d), (e,f) separately and name the result columns (s1, s2, s3). Could anyone help solve this in Pandas? Thank you so much.
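
To make it concrete, here is a brute-force version that spells out what I mean; I would like a generic way instead of hard-coding the pairs:

# naive version: sum each pair of columns explicitly and name the results
out = pd.DataFrame({'s1': df['a'] + df['b'],
                    's2': df['c'] + df['d'],
                    's3': df['e'] + df['f']})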

1) Perform groupby on the columns by supplying axis=1 . Per @Boud's comment, you get exactly what you want with a minor tweak in the grouping array:

import numpy as np
df.groupby((np.arange(len(df.columns)) // 2) + 1, axis=1).sum().add_prefix('s')

(Output: the desired DataFrame with columns s1, s2, s3 shown in the question.)

Grouping is performed according to this array:

np.arange(len(df.columns)) // 2
# array([0, 0, 1, 1, 2, 2], dtype=int32)
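
The +1 tweak shifts the group keys from (0, 1, 2) to (1, 2, 3), so that add_prefix('s') yields the column names s1, s2, s3. A small illustration with the 6-column df from the question:

keys = (np.arange(len(df.columns)) // 2) + 1
keys
# array([1, 1, 2, 2, 3, 3])
# i.e. columns a,b -> group 1, c,d -> group 2, e,f -> group 3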

2) Use np.add.reduceat , which is a faster alternative:

df = pd.DataFrame(np.add.reduceat(df.values, np.arange(len(df.columns))[::2], axis=1))
df.columns = df.columns + 1
df.add_prefix('s')

(Output: the same DataFrame with columns s1, s2, s3.)
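
To see how reduceat does the pairwise summation, here is a toy sketch: the even indices 0, 2, 4 mark the start of each pair, and each slice up to the next index (or the end) is summed along axis 1:

arr = np.array([[1, 2, 3, 4, 5, 6]])
# sums the slices [0:2], [2:4] and [4:] of each row
np.add.reduceat(arr, [0, 2, 4], axis=1)
# array([[ 3,  7, 11]])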

Timings:

For a DF of 1 million rows spanning 20 columns:

from string import ascii_lowercase
np.random.seed(42)
df = pd.DataFrame(np.random.randint(0, 10, (10**6,20)), columns=list(ascii_lowercase[:20]))
df.shape
(1000000, 20)

def with_groupby(df):
    return df.groupby((np.arange(len(df.columns)) // 2) + 1, axis=1).sum().add_prefix('s')

def with_reduceat(df):
    df = pd.DataFrame(np.add.reduceat(df.values, np.arange(len(df.columns))[::2], axis=1))
    df.columns = df.columns + 1
    return df.add_prefix('s')

# test whether both approaches give the same output
with_groupby(df).equals(with_reduceat(df))
True

%timeit with_groupby(df.copy())
1 loop, best of 3: 1.11 s per loop

%timeit with_reduceat(df.copy())   # <--- (>3X faster)
1 loop, best of 3: 345 ms per loop
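
If you are on a newer pandas where groupby(..., axis=1) is deprecated, an equivalent transpose-based sketch of the same grouping (assuming the keys array from above) would be:

keys = (np.arange(len(df.columns)) // 2) + 1
# group the transposed frame's rows, sum, and transpose back
df.T.groupby(keys).sum().T.add_prefix('s')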
