
Sum of Every Two Columns in Pandas dataframe

I have a problem when using Pandas. My task is like this:

import pandas as pd

df = pd.DataFrame([(1, 2, 3, 4, 5, 6), (1, 2, 3, 4, 5, 6), (1, 2, 3, 4, 5, 6)],
                  columns=['a', 'b', 'c', 'd', 'e', 'f'])
df
Out:
   a  b  c  d  e  f
0  1  2  3  4  5  6
1  1  2  3  4  5  6
2  1  2  3  4  5  6

What I want is an output dataframe that looks like this:

Out:
   s1  s2  s3
0   3   7  11
1   3   7  11
2   3   7  11

That is to say, sum the column pairs (a,b), (c,d), (e,f) separately and name the result columns (s1, s2, s3). Could anyone help solve this in Pandas? Thank you so much.
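
To make it concrete, here is a brute-force version that spells out what I mean; I would like a generic way instead of hard-coding the pairs:

# naive version: sum each pair of columns explicitly and name the results
out = pd.DataFrame({'s1': df['a'] + df['b'],
                    's2': df['c'] + df['d'],
                    's3': df['e'] + df['f']})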

1) Perform groupby on the columns by supplying axis=1 . Per @Boud's comment, you get exactly what you want with a minor tweak in the grouping array:

import numpy as np
df.groupby((np.arange(len(df.columns)) // 2) + 1, axis=1).sum().add_prefix('s')

(Output: the desired DataFrame with columns s1, s2, s3 shown in the question.)

Grouping is performed according to this array:

np.arange(len(df.columns)) // 2
# array([0, 0, 1, 1, 2, 2], dtype=int32)
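
The +1 tweak shifts the group keys from (0, 1, 2) to (1, 2, 3), so that add_prefix('s') yields the column names s1, s2, s3. A small illustration with the 6-column df from the question:

keys = (np.arange(len(df.columns)) // 2) + 1
keys
# array([1, 1, 2, 2, 3, 3])
# i.e. columns a,b -> group 1, c,d -> group 2, e,f -> group 3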

2) Use np.add.reduceat , which is a faster alternative:

df = pd.DataFrame(np.add.reduceat(df.values, np.arange(len(df.columns))[::2], axis=1))
df.columns = df.columns + 1
df.add_prefix('s')

(Output: the same DataFrame with columns s1, s2, s3.)
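
To see how reduceat does the pairwise summation, here is a toy sketch: the even indices 0, 2, 4 mark the start of each pair, and each slice up to the next index (or the end) is summed along axis 1:

arr = np.array([[1, 2, 3, 4, 5, 6]])
# sums the slices [0:2], [2:4] and [4:] of each row
np.add.reduceat(arr, [0, 2, 4], axis=1)
# array([[ 3,  7, 11]])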

Timings:

For a DF of 1 million rows spanning 20 columns:

from string import ascii_lowercase
np.random.seed(42)
df = pd.DataFrame(np.random.randint(0, 10, (10**6,20)), columns=list(ascii_lowercase[:20]))
df.shape
(1000000, 20)

def with_groupby(df):
    return df.groupby((np.arange(len(df.columns)) // 2) + 1, axis=1).sum().add_prefix('s')

def with_reduceat(df):
    df = pd.DataFrame(np.add.reduceat(df.values, np.arange(len(df.columns))[::2], axis=1))
    df.columns = df.columns + 1
    return df.add_prefix('s')

# test whether both approaches give the same output
with_groupby(df).equals(with_reduceat(df))
True

%timeit with_groupby(df.copy())
1 loop, best of 3: 1.11 s per loop

%timeit with_reduceat(df.copy())   # <--- (>3X faster)
1 loop, best of 3: 345 ms per loop
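
If you are on a newer pandas where groupby(..., axis=1) is deprecated, an equivalent transpose-based sketch of the same grouping (assuming the keys array from above) would be:

keys = (np.arange(len(df.columns)) // 2) + 1
# group the transposed frame's rows, sum, and transpose back
df.T.groupby(keys).sum().T.add_prefix('s')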
