Merge rows in Pandas Dataframe with specific rules on how to handle each column

I have a dataframe with a meaningless index, containing rows in which values in one of the columns can be repeated. Something like this:

import pandas as pd

df = pd.DataFrame({
    'file': ['file1.txt', 'file2.txt', 'file3.txt', 'file1.txt'],
    'size': ['52', '41', '32', '55'],
    'attempts': [4, 4, 3, 1]
})

        file size  attempts
0  file1.txt   52         4
1  file2.txt   41         4
2  file3.txt   32         3
3  file1.txt   55         1

I want to get rid of the duplicates, but not by simply deleting them. I'd like to keep just one row per distinct 'file', where the 'size' column becomes the maximum of all the repeated elements and the 'attempts' column becomes the sum of the repeated 'attempts'. In other words, I'd like to get:

        file size  attempts
0  file1.txt   55         5
1  file2.txt   41         4
2  file3.txt   32         3

I know how to do this by explicitly looping through the Dataframe, but I'd like to make it more efficient.
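For reference, the explicit loop alluded to above might look something like this (a sketch with illustrative names like merged and result, not code from the original post):

# Merge duplicates by hand: track the max 'size' and summed 'attempts' per file.
merged = {}
for _, row in df.iterrows():
    f = row['file']
    if f in merged:
        merged[f]['size'] = max(merged[f]['size'], row['size'])
        merged[f]['attempts'] += row['attempts']
    else:
        merged[f] = {'size': row['size'], 'attempts': row['attempts']}

result = pd.DataFrame([{'file': f, **vals} for f, vals in merged.items()])

Iterating row by row like this does Python-level work for every row, which is exactly the overhead .groupby() avoids.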

You can use .groupby() and .agg() to aggregate, taking the max of size and the sum of attempts.

df.groupby('file', as_index=False).agg({'size': 'max', 'attempts': 'sum'})

Result:

        file size  attempts
0  file1.txt   55         5
1  file2.txt   41         4
2  file3.txt   32         3
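One caveat: size is stored as strings in the example, so 'max' compares lexicographically. That happens to give the right answer for '52' vs. '55', but would not in general (e.g. '9' > '32' as strings). A minimal variant, assuming the sizes should really be integers, which also shows pandas' named-aggregation syntax (available since pandas 0.25):

(df.assign(size=df['size'].astype(int))
   .groupby('file', as_index=False)
   .agg(size=('size', 'max'), attempts=('attempts', 'sum')))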

This is a very dirty solution, but it should nonetheless work:

# Per-file max of 'size', then join it with the per-file sum of 'attempts'.
df_ = df.groupby('file')['size'].max()
df_ = pd.concat([df_, df.groupby('file')['attempts'].sum()], axis=1)

df_

           size  attempts
file
file1.txt    55         5
file2.txt    41         4
file3.txt    32         3
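If you need 'file' back as a regular column, matching the first answer's layout, a reset_index() (not part of the original answer) restores it:

df_ = df_.reset_index()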
