Pandas: subtract rows from a DataFrame based on several columns

Question

I have the following dataframe:

data = [
 {'col1': 11, 'col2': 111, 'col3': 1111},
 {'col1': 22, 'col2': 222, 'col3': 2222},
 {'col1': 33, 'col2': 333, 'col3': 3333},
 {'col1': 44, 'col2': 444, 'col3': 4444}
]

and the following list:

lst = [(11, 111), (22, 222), (99, 999)]

I would like to keep only the rows of my data whose col1 and col2 values do not appear in lst, i.e. remove the rows that match lst.

The result for the above example would be:

[
 {'col1': 33, 'col2': 333, 'col3': 3333},
 {'col1': 44, 'col2': 444, 'col3': 4444}
]

How can I achieve that?

import pandas as pd

df = pd.DataFrame(data)

list_df = pd.DataFrame(lst)

# is there a command like this?
# df.subtract(list_df)

Answer 1

You can extract the values for each column using zip, then slice using a mask generated with isin:

a, b = zip(*lst)
df[~(df['col1'].isin(a) | df['col2'].isin(b))]

Output:

   col1  col2  col3
2    33   333  3333
3    44   444  4444

Or, if both conditions must be true for a row to be dropped:

df[~(df['col1'].isin(a) & df['col2'].isin(b))]
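Both masks above test each column on its own, not the (col1, col2) pair. A minimal sketch of the difference, assuming a hypothetical extra row (11, 222) whose values each occur in lst but never together in one tuple:

```python
import pandas as pd

lst = [(11, 111), (22, 222), (99, 999)]

# Hypothetical data: the last row combines col1 from one tuple with col2 from another
df = pd.DataFrame({'col1': [11, 33, 11],
                   'col2': [111, 333, 222],
                   'col3': [1111, 3333, 1222]})

a, b = zip(*lst)

# Per-column test: 11 is in a and 222 is in b, so (11, 222) is dropped too
per_column = df[~(df['col1'].isin(a) & df['col2'].isin(b))]

# Pair test: (11, 222) is not a tuple in lst, so that row is kept
by_pair = df[~df.set_index(['col1', 'col2']).index.isin(lst)]

print(per_column)
print(by_pair)
```

If the question means "drop the row only when the exact pair is in lst", a pair-based test is the one to use.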

NB. If you have many columns, you can automate the process:

mask = sum(df[col].isin(v) for col, v in zip(df.columns, zip(*lst))).eq(0)
df[mask]
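The automated mask is the many-column form of the `|` variant: it counts, per row, how many columns matched, and keeps the row only when that count is zero. A small check on the question's sample data (the names `hits` and `explicit` are just for illustration):

```python
import pandas as pd

data = [
    {'col1': 11, 'col2': 111, 'col3': 1111},
    {'col1': 22, 'col2': 222, 'col3': 2222},
    {'col1': 33, 'col2': 333, 'col3': 3333},
    {'col1': 44, 'col2': 444, 'col3': 4444},
]
lst = [(11, 111), (22, 222), (99, 999)]
df = pd.DataFrame(data)

# zip pairs 'col1' with (11, 22, 99) and 'col2' with (111, 222, 999);
# 'col3' gets no partner because each tuple in lst has only two values
hits = sum(df[col].isin(v) for col, v in zip(df.columns, zip(*lst)))

# hits counts matching columns per row; keep rows where nothing matched
mask = hits.eq(0)

# The explicit two-column version of the same rule
a, b = zip(*lst)
explicit = ~(df['col1'].isin(a) | df['col2'].isin(b))

print(df[mask])
```

For the `&` behaviour instead, one could keep rows where `hits` is less than the number of key columns.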

Answer 2

If you need to test by pairs, you can build a MultiIndex from both columns, compare it with Index.isin, and invert the mask with ~ in boolean indexing:

df = df[~df.set_index(['col1','col2']).index.isin(lst)]
print (df)
   col1  col2  col3
2    33   333  3333
3    44   444  4444
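The same MultiIndex idea generalizes to any number of key columns. A minimal sketch, using a hypothetical helper `drop_matching` and assuming at least two key columns (with a single column, set_index yields a plain Index and tuple membership would not match):

```python
import pandas as pd

def drop_matching(df, cols, keys):
    """Hypothetical helper: drop rows whose values in `cols` form a tuple in `keys`.

    Assumes len(cols) >= 2, so that set_index builds a MultiIndex.
    """
    # One MultiIndex entry per row, e.g. (11, 111); isin compares whole tuples
    in_keys = df.set_index(cols).index.isin(keys)
    return df[~in_keys]

data = [
    {'col1': 11, 'col2': 111, 'col3': 1111},
    {'col1': 22, 'col2': 222, 'col3': 2222},
    {'col1': 33, 'col2': 333, 'col3': 3333},
    {'col1': 44, 'col2': 444, 'col3': 4444},
]
lst = [(11, 111), (22, 222), (99, 999)]
df = pd.DataFrame(data)

print(drop_matching(df, ['col1', 'col2'], lst))
# The same helper with three key columns
print(drop_matching(df, ['col1', 'col2', 'col3'], [(33, 333, 3333)]))
```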

Or with a left join, using merge with the indicator parameter:

m = df.merge(list_df,
             left_on=['col1','col2'],
             right_on=[0,1],
             indicator=True,
             how='left')['_merge'].eq('left_only')
# merge returns a new RangeIndex, so apply the mask as a plain array
df = df[m.to_numpy()]
print (df)
   col1  col2  col3
2    33   333  3333
3    44   444  4444
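The merge approach can be wrapped as a reusable anti-join. A sketch under two assumptions not in the answer: the key list is deduplicated first (otherwise a left join repeats a row once per duplicate key), and `list_df` gets named columns so `on=` can be used. `anti_join` is a hypothetical name:

```python
import pandas as pd

def anti_join(left, right, cols):
    """Hypothetical helper: rows of `left` with no match in `right` on `cols`."""
    # Deduplicate the keys so the left join cannot repeat rows of `left`
    keys = right[cols].drop_duplicates()
    merged = left.merge(keys, on=cols, how='left', indicator=True)
    # merge builds a new RangeIndex, so apply the mask by position
    keep = merged['_merge'].eq('left_only').to_numpy()
    return left[keep]

data = [
    {'col1': 11, 'col2': 111, 'col3': 1111},
    {'col1': 22, 'col2': 222, 'col3': 2222},
    {'col1': 33, 'col2': 333, 'col3': 3333},
    {'col1': 44, 'col2': 444, 'col3': 4444},
]
lst = [(11, 111), (22, 222), (99, 999)]

# Non-default index on purpose, to show the original labels are preserved
df = pd.DataFrame(data, index=['a', 'b', 'c', 'd'])
# Named key columns, plus a duplicate key to exercise drop_duplicates
list_df = pd.DataFrame(lst + [(11, 111)], columns=['col1', 'col2'])

result = anti_join(df, list_df, ['col1', 'col2'])
print(result)
```

A left join keeps the row order of the left frame, which is what makes the positional mask line up with `left`.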

Answer 3

You can create a tuple out of your col1 and col2 columns and then check whether those tuples are in the lst list. Then drop the rows with True values.

df.drop(df.apply(lambda x: (x['col1'], x['col2']), axis=1)
          .isin(lst)
          .loc[lambda s: s]
          .index)

With this solution you don't even have to turn the second list into a dataframe.
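A variant of the same idea, swapping the row-wise apply for zip (an alternative, not part of the original answer). It avoids building a Series for every row as apply(axis=1) does, and still needs no second dataframe:

```python
import pandas as pd

data = [
    {'col1': 11, 'col2': 111, 'col3': 1111},
    {'col1': 22, 'col2': 222, 'col3': 2222},
    {'col1': 33, 'col2': 333, 'col3': 3333},
    {'col1': 44, 'col2': 444, 'col3': 4444},
]
lst = [(11, 111), (22, 222), (99, 999)]
df = pd.DataFrame(data)

# Build the (col1, col2) tuples column-wise with zip, keeping df's index
pairs = pd.Series(list(zip(df['col1'], df['col2'])), index=df.index)

# As in the answer: drop the rows whose tuple is in lst
result = df.drop(pairs[pairs.isin(lst)].index)
print(result)
```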

Answer 4

You can create the tuples of col1 and col2 with .apply() and tuple. Then test whether these tuples are in lst with .isin() (add ~ for the negation, i.e. the opposite condition).

Finally, select the rows with .loc, as follows:

df.loc[~df[['col1', 'col2']].apply(tuple, axis=1).isin(lst)]

Result:

   col1  col2  col3
2    33   333  3333
3    44   444  4444
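As a cross-check, the approaches in this thread can be run side by side on the question's data and compared with the expected output. A minimal sketch (the dictionary keys are illustrative labels); note the per-column version from Answer 1 agrees here only because the sample has no mixed pairs such as (11, 222):

```python
import pandas as pd

data = [
    {'col1': 11, 'col2': 111, 'col3': 1111},
    {'col1': 22, 'col2': 222, 'col3': 2222},
    {'col1': 33, 'col2': 333, 'col3': 3333},
    {'col1': 44, 'col2': 444, 'col3': 4444},
]
lst = [(11, 111), (22, 222), (99, 999)]
expected = [
    {'col1': 33, 'col2': 333, 'col3': 3333},
    {'col1': 44, 'col2': 444, 'col3': 4444},
]

df = pd.DataFrame(data)
list_df = pd.DataFrame(lst)  # columns are named 0 and 1
a, b = zip(*lst)

results = {
    # Answer 1: per-column test with |
    'per_column': df[~(df['col1'].isin(a) | df['col2'].isin(b))],
    # Answer 2: MultiIndex membership
    'multiindex': df[~df.set_index(['col1', 'col2']).index.isin(lst)],
    # Answer 2: left merge with indicator (mask applied by position)
    'merge': df[df.merge(list_df, left_on=['col1', 'col2'], right_on=[0, 1],
                         indicator=True, how='left')['_merge']
                  .eq('left_only').to_numpy()],
    # Answer 3: drop the rows whose tuple is in lst
    'drop': df.drop(df.apply(lambda x: (x['col1'], x['col2']), axis=1)
                      .isin(lst).loc[lambda s: s].index),
    # Answer 4: tuples via apply(tuple), then .loc with the negated mask
    'loc': df.loc[~df[['col1', 'col2']].apply(tuple, axis=1).isin(lst)],
}

all_agree = all(r.to_dict('records') == expected for r in results.values())
print(all_agree)
```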

Question description

4 answers

Answer 1
1 2021-11-08 09:58:51

Answer 2
1 2021-11-08 09:59:31

Answer 3
1 2021-11-08 10:01:19

Answer 4
1 2021-11-08 10:01:41
