
How to delete DF rows based on multiple column conditions?

Here's an example DataFrame:

        EC1     EC2     CDC      L1      L2      L3      L4      L5      L6      VNF
0    [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]   [1, 0]
1    [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]   [0, 1]
2    [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [-1, 0]
3    [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, -1]
4    [0, 0]  [0, 0]  [0, 1]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 1]  [0, 1]   [1, 0]
5    [0, 0]  [0, 0]  [0, 1]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 1]  [0, 1]   [0, 1]
6    [1, 0]  [0, 0]  [0, 1]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 1]  [0, 1]  [-1, 0]

How can I delete the rows where df['VNF'] is [-1, 0] or [0, -1] and df['EC1'], df['EC2'] and df['CDC'] all have a 0 at the same index position as the -1 in df['VNF']?

The expected result would be:

        EC1     EC2     CDC      L1      L2      L3      L4      L5      L6      VNF
0    [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]   [1, 0]
1    [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]   [0, 1]
2    [0, 0]  [0, 0]  [0, 1]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 1]  [0, 1]   [1, 0]
3    [0, 0]  [0, 0]  [0, 1]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 1]  [0, 1]   [0, 1]
4    [1, 0]  [0, 0]  [0, 1]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 1]  [0, 1]  [-1, 0]

Here's the constructor for the DataFrame:

import pandas as pd

data = {'EC1': [[0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [1, 0]],
 'EC2': [[0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0]],
 'CDC': [[0, 0], [0, 0], [0, 0], [0, 0], [0, 1], [0, 1], [0, 1]],
 'L1': [[0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0]],
 'L2': [[0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0]],
 'L3': [[0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0]],
 'L4': [[0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0]],
 'L5': [[0, 0], [0, 0], [0, 0], [0, 0], [0, 1], [0, 1], [0, 1]],
 'L6': [[0, 0], [0, 0], [0, 0], [0, 0], [0, 1], [0, 1], [0, 1]],
 'VNF': [[1, 0], [0, 1], [-1, 0], [0, -1], [1, 0], [0, 1], [-1, 0]]}

df = pd.DataFrame(data)

A list comprehension that finds which rows to drop may make the conditions easier to see directly:

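# Only these four columns take part in the condition.
columns = df.EC1, df.EC2, df.CDC, df.VNF

# Collect the positional index of every row to drop.
inds_to_drop = [iloc
                for iloc, (ec1, ec2, cdc, vnf) in enumerate(zip(*columns))
                # VNF must be exactly [-1, 0] or [0, -1] ...
                if vnf == [-1, 0] or vnf == [0, -1]
                # ... and EC1, EC2 and CDC must all hold 0 where VNF holds -1.
                if all(val[idx] == 0
                       for idx in (vnf.index(-1),) for val in (ec1, ec2, cdc))]

# Translate positions into index labels before dropping.
new_df = df.drop(df.index[inds_to_drop])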

to get

>>> new_df

      EC1     EC2     CDC      L1      L2      L3      L4      L5      L6      VNF
0  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]   [1, 0]
1  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]   [0, 1]
4  [0, 0]  [0, 0]  [0, 1]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 1]  [0, 1]   [1, 0]
5  [0, 0]  [0, 0]  [0, 1]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 1]  [0, 1]   [0, 1]
6  [1, 0]  [0, 0]  [0, 1]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 1]  [0, 1]  [-1, 0]
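
The condition in the comprehension above can also be spelled out as a named predicate and applied row by row, which may be easier to read and test. This is only a minimal sketch: `should_drop` and the small `demo` frame are hypothetical names made up for illustration, and it assumes every cell is a plain Python list of two integers, as in the question.

```python
import pandas as pd


def should_drop(row):
    """True when VNF is [-1, 0] or [0, -1] and EC1, EC2 and CDC
    all hold 0 at the position of the -1 (hypothetical helper)."""
    vnf = row["VNF"]
    if vnf not in ([-1, 0], [0, -1]):
        return False
    pos = vnf.index(-1)  # position of the -1 inside the VNF cell
    return all(row[col][pos] == 0 for col in ("EC1", "EC2", "CDC"))


# Small made-up frame with the same cell layout as the question.
demo = pd.DataFrame({
    "EC1": [[0, 0], [0, 0], [1, 0]],
    "EC2": [[0, 0], [0, 0], [0, 0]],
    "CDC": [[0, 0], [0, 0], [0, 1]],
    "VNF": [[1, 0], [0, -1], [-1, 0]],
})

mask = demo.apply(should_drop, axis=1)       # one boolean per row
kept = demo[~mask].reset_index(drop=True)    # renumber like the expected result
print(kept)
```

Here only the middle row is dropped: the last row has a -1 in VNF, but EC1 holds 1 at that position, so it stays. The `reset_index(drop=True)` call is optional and only renumbers the remaining rows from 0.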

You can explode every column of df, then identify the elements that satisfy the first condition (the sum of the "VNF" values in a row must be -1) and the second condition (the "VNF" element is -1 while the "EC1", "EC2" and "CDC" elements at the same position are 0), and filter out the elements that satisfy both to create temp.

Since each cell must hold two elements, a row that lost an element is one that has to be dropped. Count the elements left per index with transform('count'), keep the rows that still have two, then group by the index and aggregate back to lists:

exploded = df.explode(df.columns.tolist())
first_cond = exploded.groupby(level=0)['VNF'].transform('sum').eq(-1)
second_cond = exploded['VNF'].eq(-1) & exploded['EC1'].eq(0) & exploded['EC2'].eq(0) & exploded['CDC'].eq(0)

temp = exploded[~(first_cond & second_cond)]
out = temp[temp.groupby(level=0)['VNF'].transform('count').gt(1)].groupby(level=0).agg(list).reset_index(drop=True)

Output:

      EC1     EC2     CDC      L1      L2      L3      L4      L5      L6      VNF
0  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]   [1, 0]
1  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]   [0, 1]
2  [0, 0]  [0, 0]  [0, 1]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 1]  [0, 1]   [1, 0]
3  [0, 0]  [0, 0]  [0, 1]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 1]  [0, 1]   [0, 1]
4  [1, 0]  [0, 0]  [0, 1]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 1]  [0, 1]  [-1, 0]
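
To see why counting elements per index works, it may help to run the explode approach on a tiny frame. The sketch below uses a made-up two-column `demo` frame (only "EC1" and "VNF", so the second condition is shortened accordingly) and casts the exploded values to int so the group sum is plainly numeric; both are assumptions for illustration, not part of the answer above. Exploding several columns in one call needs pandas 1.3 or later.

```python
import pandas as pd

# Made-up frame: row 0 should be dropped, row 1 should survive.
demo = pd.DataFrame({
    "EC1": [[0, 0], [1, 0]],
    "VNF": [[0, -1], [-1, 0]],
})

# Each list element becomes its own row; the original index is repeated.
exploded = demo.explode(["EC1", "VNF"]).astype(int)

# First condition: the VNF values of the original row sum to -1.
first_cond = exploded.groupby(level=0)["VNF"].transform("sum").eq(-1)
# Second condition: this element is the -1 and EC1 is 0 at the same position.
second_cond = exploded["VNF"].eq(-1) & exploded["EC1"].eq(0)

# Remove the offending elements; row 0 now has one element left.
temp = exploded[~(first_cond & second_cond)]

# Keep only the rows that still have both elements, then rebuild the lists.
complete = temp.groupby(level=0)["VNF"].transform("count").eq(2)
out = temp[complete].groupby(level=0).agg(list)
print(out)
```

Only the row with index 1 comes back, with its lists intact. The index is left unreset here so the surviving label is visible; adding `.reset_index(drop=True)` renumbers it from 0.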
