Python add / merge rows of a dataframe together based on multiple conditions
Good afternoon, I hope that you are well.
I have an xlsx file in the following format, which is output from a Python function I have been using to parse data:
I have loaded this xlsx file into a pandas df in an attempt to achieve the following output:
The requirements that I am trying to satisfy are: for each row in the dataframe, if the "Application ID" and "Test Phase" column values match those of another row, then I would like to add the values of the matching rows together and replace the original matched rows with one row containing the summed values.
Where there is no match in the column values, the original row should remain in place.
If there are any pointers on how to achieve this, it would be much appreciated. I attempted to do this in the function itself, before writing the values to the xlsx output file, but I assumed it would be easier to achieve by working with pandas / numpy.
Many thanks in advance, Jimmy
Use groupby followed by sum:
out = df.groupby(['Application ID', 'Test Phase'], as_index=False).sum()
print(out)
# Output
   Application ID Test Phase  Total Tests   A
0               9        SIT           36  36
1              11        UAT            5   5
Setup:
import pandas as pd

# Sample data: two rows share the same (Application ID, Test Phase) pair
data = {'Application ID': [9, 9, 11],
        'Test Phase': ['SIT', 'SIT', 'UAT'],
        'Total Tests': [9, 27, 5],
        'A': [9, 27, 5]}
df = pd.DataFrame(data)
print(df)
# Output
   Application ID Test Phase  Total Tests   A
0               9        SIT            9   9
1               9        SIT           27  27
2              11        UAT            5   5
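Note that groupby sorts the result by the group keys by default. If the unmatched rows should stay in their original relative position, as the question asks, passing sort=False may be closer to the intent. The following is a minimal sketch, not the answerer's code. It uses the same sample values as the setup above, but the rows are deliberately reordered here (an assumption for illustration) so that the effect of sort=False is visible:

```python
import pandas as pd

# Same values as the setup above, with the UAT row moved first
# (reordering is an assumption made only to show the effect of sort=False).
df = pd.DataFrame({
    'Application ID': [11, 9, 9],
    'Test Phase': ['UAT', 'SIT', 'SIT'],
    'Total Tests': [5, 9, 27],
    'A': [5, 9, 27],
})

keys = ['Application ID', 'Test Phase']

# sort=False keeps the groups in order of first appearance
# instead of sorting them by the key columns.
out = df.groupby(keys, as_index=False, sort=False).sum()
print(out)
```

With the default sort=True, the row for Application ID 9 would come first. With sort=False, the row for Application ID 11 stays ahead of it, because that pair appears first in the input.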