Compute row sums into a new column in Pandas

Question

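I have a DataFrame like the one below, where the number of columns can grow considerably. I would like to create a new column holding the sum of each row.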

|---------------------|------------------|------------------|------------------|
|        A            |         B        |         C        |      Total       |
|---------------------|------------------|------------------|------------------|
|        x            |         34       |         8        |        42        |
|---------------------|------------------|------------------|------------------|
|        y            |         43       |        12        |        55        |
|---------------------|------------------|------------------|------------------|
|        z            |         6        |         321      |        327       |
|---------------------|------------------|------------------|------------------|

I know I can simply write df['Total'] = df['B'] + df['C'], but I am looking for a better technique, since the number of columns may be very large.

Answer 1

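You can use apply along the rows, provided every column is numeric:

df['Total'] = df.apply(np.sum, axis=1)

To skip the first column, select by position with .iloc (.loc[:, 1:] would slice by label and fail on string column names such as A, B, C):

df['Total'] = df.iloc[:, 1:].apply(np.sum, axis=1)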
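As a minimal sketch of this approach, the snippet below rebuilds the three-row table from the question and sums every column except the first. The frame contents are copied from the question; selecting by position with .iloc is an assumption about what "skip the first column" means here.

```python
import numpy as np
import pandas as pd

# Sample data taken from the table in the question.
df = pd.DataFrame({"A": ["x", "y", "z"], "B": [34, 43, 6], "C": [8, 12, 321]})

# Drop the first column by position, then sum each remaining row.
df["Total"] = df.iloc[:, 1:].apply(np.sum, axis=1)
print(df)
```

The label column A is excluded before the sum, so the strings never reach np.sum.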

Answer 2

For DataFrames with many rows, apply can be very slow. Avoid it whenever possible. Here is an alternative.

cols_to_sum = [<columns to sum over>]
df['Total'] = df[cols_to_sum].sum(axis = 1)
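When the list of columns is long, typing the names by hand may be impractical. One possible way to build cols_to_sum, assuming that every numeric column should be included, is select_dtypes. The sample frame below is copied from the question.

```python
import pandas as pd

# Sample data taken from the table in the question.
df = pd.DataFrame({"A": ["x", "y", "z"], "B": [34, 43, 6], "C": [8, 12, 321]})

# Assumption: every numeric column belongs in the total.
cols_to_sum = df.select_dtypes(include="number").columns

# Vectorized row-wise sum over the selected columns.
df["Total"] = df[cols_to_sum].sum(axis=1)
print(df)
```

If only some numeric columns should be summed, an explicit list of names remains the safer choice.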

Below is a performance comparison of the two approaches:

import numpy as np
import pandas as pd

# Five columns of 100,000 random floats each
df = pd.DataFrame({"a": np.random.randn(100000),
                   "b": np.random.randn(100000),
                   "c": np.random.randn(100000),
                   "d": np.random.randn(100000),
                   "e": np.random.randn(100000)})
cols_to_sum = list('abcde')

%%time
result1 = df[cols_to_sum].apply(np.sum, axis = 1)
>> CPU times: user 7.88 s, sys: 39.7 ms, total: 7.92 s
>> Wall time: 7.89 s

%%time
result2 = df[cols_to_sum].sum(axis = 1)
>> CPU times: user 9.51 ms, sys: 0 ns, total: 9.51 ms
>> Wall time: 17.5 ms

print((result1 == result2).all())
>> True

That is a speedup of more than 400x in wall time.
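The %%time magic only works inside IPython or Jupyter. As a rough sketch for a plain Python script, the comparison could be repeated with time.perf_counter. The smaller frame size, the fixed seed, and the tolerance-based np.allclose check are assumptions made for this illustration, and the timings will differ by machine and pandas version.

```python
import time

import numpy as np
import pandas as pd

# Fixed seed and a smaller frame, chosen only to keep the run short and repeatable.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10_000, 5)), columns=list("abcde"))

# Row-wise apply: calls np.sum once per row.
start = time.perf_counter()
slow = df.apply(np.sum, axis=1)
apply_seconds = time.perf_counter() - start

# Vectorized sum across columns.
start = time.perf_counter()
fast = df.sum(axis=1)
sum_seconds = time.perf_counter() - start

# Compare with a tolerance, since the two paths may add floats in a different order.
same = np.allclose(slow, fast)
print(f"apply: {apply_seconds:.4f} s, sum: {sum_seconds:.4f} s, equal: {same}")
```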

Answer 1
1 2020-04-02 18:54:46

Answer 2
1 accepted 2020-04-02 19:10:05
