Pandas: creating columns based on multiple different columns

Question

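OK, I thought this would be easy and would already have a proper answer in an earlier thread, but apparently I can't work it out myself or find it in any thread. Here is what I have: a dataframe in which different samples belong to groups.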

import pandas as pd
df = pd.DataFrame({'sample1': [1,2,3], 'sample2':[2,4,6], 'sample3':[4,4,4], 'sample4':[6,6,6], 'divisor':[1,2,1]})
groups=[["sample1","sample2"],["sample3","sample4"]]

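I want the code to create a new column for each sample, based on the sum of the group that sample belongs to. If the quotient of the group sum and the divisor is below a threshold (the code below uses 4), the result should be 0; otherwise it should be the original value. The first part, the summation, works perfectly: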

for i in range(len(groups)):
    df["groupsum"+str(i)]=df[groups[i]].sum(axis=1)

    for sample in groups[i]:
        df[sample+"_corr"]=""
        df[sample+"_corr"]= df[sample].apply(lambda x: 0 if (df["groupsum"+str(i)]/df["divisor"])<4 else df[sample])

I get the error:

 ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

So what is the correct way to handle this? Many thanks in advance.

Answer 1

Just use np.where instead of looping over the dataframe with apply:

df[sample+"_corr"]= np.where((df["groupsum"+str(i)]/df["divisor"])<4 , 0 , df[sample])

Output:

   sample1  sample2  sample3  sample4  divisor  groupsum0  sample1_corr  sample2_corr  groupsum1  sample3_corr  sample4_corr
0        1        2        4        6        1          3             0             0         10             4             6
1        2        4        4        6        2          6             0             0         10             4             6
2        3        6        4        6        1          9             3             6         10             4             6
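
For completeness, here is a sketch of the whole loop with the np.where line swapped in. It assumes the data and groups from the question; the names threshold, sum_col and below are only illustrative and do not appear in the original code.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "sample1": [1, 2, 3],
    "sample2": [2, 4, 6],
    "sample3": [4, 4, 4],
    "sample4": [6, 6, 6],
    "divisor": [1, 2, 1],
})
groups = [["sample1", "sample2"], ["sample3", "sample4"]]
threshold = 4  # illustrative name; the value 4 comes from the question's code

for i, group in enumerate(groups):
    sum_col = f"groupsum{i}"
    # Row-wise sum over the columns of this group.
    df[sum_col] = df[group].sum(axis=1)
    # Boolean Series: True where group sum / divisor is below the threshold.
    below = (df[sum_col] / df["divisor"]) < threshold
    for sample in group:
        # Elementwise choice: 0 where below is True, the original value otherwise.
        df[f"{sample}_corr"] = np.where(below, 0, df[sample])

print(df)
```

Computing the condition once per group, rather than once per sample, is just a small tidy-up; the result should be the same as with the one-line version above.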

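This also performs better: apply calls a Python function for every element, which is very slow, so it should be avoided wherever possible.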
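
As for why the original line raises the error: inside the lambda, the condition compares a whole Series with 4, so the conditional expression receives a boolean Series instead of a single True or False. The sketch below reproduces that on the first group's values and shows Series.mask as a pandas-native alternative to np.where. The variable names quotient, sample1, error_message and corrected are made up for the illustration.

```python
import pandas as pd

# groupsum0 / divisor for the question's data, and the sample1 column.
quotient = pd.Series([3.0, 3.0, 9.0])
sample1 = pd.Series([1, 2, 3])

# A conditional expression needs one bool, but quotient < 4 holds three,
# so pandas refuses to pick a truth value and raises ValueError.
error_message = None
try:
    result = 0 if quotient < 4 else sample1
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Series.mask replaces values with 0 wherever the condition is True,
# evaluated row by row, and keeps the original value elsewhere.
corrected = sample1.mask(quotient < 4, 0)
print(corrected.tolist())
```

Both np.where and Series.mask evaluate the condition for all rows at once, so either avoids the per-element apply.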

Solution 1
1 2021-01-11 15:53:36
