Group By Sum Multiple Columns in Pandas (Ignoring duplicates)

I have the following code. My DataFrame contains the three columns I want to count, plus someColumn, which the code does not use:

  toBeSummed toBeSummed2 toBesummed3  someColumn
0          X           X           Y         NaN
1          X           Y           Z         NaN
2          Y           Y           Z         NaN
3          Z           Z           Z         NaN

import pandas as pd

oneframe = pd.concat([df['toBeSummed'], df['toBeSummed2'], df['toBesummed3']], axis=1).reset_index()

# Count how often each value occurs in each column separately
temp = oneframe.groupby(['toBeSummed']).size().reset_index()
temp2 = oneframe.groupby(['toBeSummed2']).size().reset_index()
temp3 = oneframe.groupby(['toBesummed3']).size().reset_index()

# Give the value column a common name so the three tables can be combined
temp.columns = ["SameName", "Count"]
temp2.columns = ["SameName", "Count"]
temp3.columns = ["SameName", "Count"]

# Add up the per-column counts for each value
final = pd.concat([temp, temp2, temp3]).groupby(['SameName']).sum().reset_index()
final.columns = ["Letter", "Sum"]

The problem is that my code counts every instance of each value, so final comes out as:

  Letter  Sum
0      X    3
1      Y    4
2      Z    5
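
For reference, the concat/groupby pipeline above amounts to counting every cell of the three columns. A more compact sketch of that same count, assuming the DataFrame is rebuilt from the table in the question, stacks the columns with melt() and calls value_counts(); this is an equivalent rewrite for illustration, not the asker's code.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the table in the question
df = pd.DataFrame({
    "toBeSummed":  ["X", "X", "Y", "Z"],
    "toBeSummed2": ["X", "Y", "Y", "Z"],
    "toBesummed3": ["Y", "Z", "Z", "Z"],
    "someColumn":  [np.nan] * 4,
})
sum_cols = ["toBeSummed", "toBeSummed2", "toBesummed3"]

# melt() stacks the three columns into a single "value" column (one row per cell),
# so value_counts() counts every occurrence, including repeats within a row
all_counts = df[sum_cols].melt()["value"].value_counts().sort_index()
print(all_counts.rename_axis("Letter").reset_index(name="Sum"))
```

This reproduces the X 3 / Y 4 / Z 5 totals, which makes it clear that nothing in the original approach looks at which row a value came from.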

However, if the same value appears more than once in a row, I want it counted only once (i.e. the first row contains two X's, so only one X should be counted). The desired output is therefore:

  Letter  Sum
0      X    2
1      Y    3
2      Z    3
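
The "count once per row" rule can also be expressed directly on a long-format table. The sketch below is one possible approach, not the asker's code or the answer's method: it rebuilds the example DataFrame from the table above, melts the three columns while keeping each cell's original row label (melt(ignore_index=False), which assumes pandas 1.1 or later), drops repeated (row, value) pairs, and counts what remains. The column names "index" and "value" are the pandas defaults for an unnamed index and for melt output.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the table in the question
df = pd.DataFrame({
    "toBeSummed":  ["X", "X", "Y", "Z"],
    "toBeSummed2": ["X", "Y", "Y", "Z"],
    "toBesummed3": ["Y", "Z", "Z", "Z"],
    "someColumn":  [np.nan] * 4,
})
sum_cols = ["toBeSummed", "toBeSummed2", "toBesummed3"]

# Long format: one row per cell, keeping the original row label, which
# reset_index() turns into a column named "index"
long = df[sum_cols].melt(ignore_index=False).reset_index()

# A value repeated within the same original row is kept only once
per_row_unique = long.drop_duplicates(subset=["index", "value"])

result = (
    per_row_unique["value"]
    .value_counts()
    .sort_index()
    .rename_axis("Letter")
    .reset_index(name="Sum")
)
print(result)
```

Dropping duplicates on the (row label, value) pair is what enforces the rule: row 0 contributes X and Y once each, even though X appears twice there.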

I can update the question or add more detail if this is confusing.

Given df:

  toBeSummed toBeSummed2 toBesummed3  someColumn
0          X           X           Y         NaN
1          X           Y           Z         NaN
2          Y           Y           Z         NaN
3          Z           Z           Z         NaN

Doing:

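import pandas as pd

sum_cols = ['toBeSummed', 'toBeSummed2', 'toBesummed3']

# axis=1 applies the lambda to each row, so a value repeated within a row is kept once;
# explode() then flattens the per-row arrays into one Series for value_counts()
out = df[sum_cols].apply(lambda row: row.unique(), axis=1).explode().value_counts()
print(out.to_frame('Sum'))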

Output:

   Sum
Y    3
Z    3
X    2
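
If the number of distinct values is small, a different technique (swapped in here for illustration, not the answer's method) avoids calling a Python lambda once per row: for each distinct value, test whether it appears anywhere in each row with eq(...).any(axis=1), and sum the resulting booleans. The sketch assumes the DataFrame rebuilt from the question's table and that the values are mutually sortable (strings here); missing values are dropped before counting.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the table in the question
df = pd.DataFrame({
    "toBeSummed":  ["X", "X", "Y", "Z"],
    "toBeSummed2": ["X", "Y", "Y", "Z"],
    "toBesummed3": ["Y", "Z", "Z", "Z"],
    "someColumn":  [np.nan] * 4,
})
sum_cols = ["toBeSummed", "toBeSummed2", "toBesummed3"]
block = df[sum_cols]

# Distinct non-missing values across the three columns, in sorted order
values = sorted(pd.Series(block.to_numpy().ravel()).dropna().unique())

# For each value: mark the rows that contain it at least once, then count those rows
counts = pd.Series(
    {v: int(block.eq(v).any(axis=1).sum()) for v in values},
    name="Sum",
)
print(counts.to_frame())
```

Each comparison scans the whole block, so the cost grows with the number of distinct values; for many distinct values, the per-row unique/explode approach above is likely the better fit.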

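Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.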

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM