Deleting rows based on the values of a column using pandas

I'm new to pandas and need some help deleting rows that fulfil certain conditions from my original table below.

Table1 (original table):

    ID      SerialNo  calls
1   171723  Blue      2
2   171723  Green     3
3   171723  Blue      4
4   171723  Yellow    5
5   171723  Blue      1
6   171724  Green     1
7   171724  Yellow    2
8   171724  Green     3
9   171724  Green     4
10  171724  Green     5
11  171724  Yellow    6

Table1_mod (filtered table):

    ID      SerialNo  calls
1   171723  Blue      2
2   171723  Green     3
3   171723  Blue      4
4   171723  Yellow    5
5   171724  Green     1
6   171724  Yellow    2
7   171724  Green     3
8   171724  Green     4
9   171724  Green     5
10  171724  Yellow    6

I want to obtain the modified table based on the following:

The ID column contains identifier numbers, each shared by a group of rows. For example, among all rows with ID '171723', I'm interested in 'Blue' in the 'SerialNo' column. I want the last row of '171723' deleted because that Blue row has a calls value of '1', which is smaller than '2' (the calls value of the first occurrence of Blue for '171723').

How can I write pandas code to remove the rows that fulfil these conditions?

Thanks

Just to clarify: you want to look at each group of ID numbers, find the first occurrence of 'Blue' in the SerialNo column, and remove any other rows in that group where SerialNo is also 'Blue' and the calls value is less than the calls value in the first 'Blue' row?

I would first group your dataframe by ID:

id_groups = Table1.groupby('ID')

Then define a function to do your filtering:

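def blue_filter(group):
    # Rows of this ID group whose SerialNo is 'Blue'
    blues = group[group['SerialNo'] == 'Blue']
    try:
        first_blue_call = blues['calls'].iloc[0]
    except IndexError:
        # No 'Blue' rows in this group: keep it unchanged
        return group
    # Keep non-Blue rows, plus Blue rows whose calls are not below the first Blue's
    return group[(group['SerialNo'] != 'Blue') | (group['calls'] >= first_blue_call)]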

Now apply that function to your groups, recombine the results into a new dataframe, and reset the index:

Table1_mod = id_groups.apply(blue_filter)
# Discard the index produced by apply and renumber the rows from 1
Table1_mod = Table1_mod.reset_index(drop=True)
Table1_mod.index += 1
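
As a self-contained check of the steps above, here is a minimal sketch that rebuilds Table1 from the question and applies the same Blue rule with a boolean mask instead of apply. The use of groupby(...).transform('first') and the names is_blue, first_blue and keep are my own choices for illustration, not part of the answer above, and the sketch assumes the value of interest is 'Blue' for every ID.

```python
import pandas as pd

# Table1 as given in the question (index 1..11)
Table1 = pd.DataFrame(
    {
        "ID": [171723] * 5 + [171724] * 6,
        "SerialNo": ["Blue", "Green", "Blue", "Yellow", "Blue",
                     "Green", "Yellow", "Green", "Green", "Green", "Yellow"],
        "calls": [2, 3, 4, 5, 1, 1, 2, 3, 4, 5, 6],
    },
    index=range(1, 12),
)

is_blue = Table1["SerialNo"] == "Blue"

# calls value of the first Blue row in each ID group, broadcast to every
# row of that group (NaN for groups that contain no Blue row at all)
first_blue = (
    Table1["calls"].where(is_blue).groupby(Table1["ID"]).transform("first")
)

# Keep every non-Blue row, and Blue rows not below the first Blue's calls
keep = ~is_blue | (Table1["calls"] >= first_blue)

Table1_mod = Table1[keep].reset_index(drop=True)
Table1_mod.index += 1  # renumber from 1, as in the expected table
print(Table1_mod)
```

Groups without any Blue row get NaN for first_blue, so their rows are kept only through the ~is_blue part of the mask, which mirrors the early return in blue_filter.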

I think you can do this all in one go using apply and a separate dict that keeps track of the maximum calls value seen so far. This also handles what I think you actually want, which is to throw out any row whose calls value is no higher than that of an earlier row with the same (ID, SerialNo) combination.

max_dict = {}

def keep_row(row):
    if row.calls > max_dict.get((row.ID, row.SerialNo), 0):
        max_dict[(row.ID, row.SerialNo)] = row.calls
        return True
    else:
        return False

Table1_mod = Table1[Table1.apply(keep_row, axis=1)]
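
The same running-maximum idea can also be written without a module-level dict, which avoids having to reset max_dict before every rerun. The following is only a sketch under my own assumptions, not the code from the answer above: it uses groupby(...).cummax() and a grouped shift() to compare each row with the largest earlier calls value in its (ID, SerialNo) group. Unlike keep_row, it always keeps the first row of each group, even if its calls value is 0 or negative.

```python
import pandas as pd

# Table1 as given in the question (index 1..11)
Table1 = pd.DataFrame(
    {
        "ID": [171723] * 5 + [171724] * 6,
        "SerialNo": ["Blue", "Green", "Blue", "Yellow", "Blue",
                     "Green", "Yellow", "Green", "Green", "Green", "Yellow"],
        "calls": [2, 3, 4, 5, 1, 1, 2, 3, 4, 5, 6],
    },
    index=range(1, 12),
)

keys = [Table1["ID"], Table1["SerialNo"]]

# Running maximum of calls within each (ID, SerialNo) group, in row order
running_max = Table1["calls"].groupby(keys).cummax()

# Largest calls value among the *earlier* rows of the same group
# (NaN for the first row of each group)
prev_max = running_max.groupby(keys).shift()

# Keep the first row of each group and any row that exceeds all earlier ones
keep = prev_max.isna() | (Table1["calls"] > prev_max)

Table1_mod = Table1[keep]
print(Table1_mod)
```

On the sample data this drops only the row with index 5 (the third Blue row of 171723), the same row that the dict-based version removes.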
