
pandas subtract rows in dataframe according to a few columns

I have the following data, which I load into a dataframe:

data = [
 {'col1': 11, 'col2': 111, 'col3': 1111},
 {'col1': 22, 'col2': 222, 'col3': 2222},
 {'col1': 33, 'col2': 333, 'col3': 3333},
 {'col1': 44, 'col2': 444, 'col3': 4444}
]

and the following list:

lst = [(11, 111), (22, 222), (99, 999)]

I would like to keep only the rows of my data whose (col1, col2) pair does not appear in lst.

The result for the above example would be:

[
 {'col1': 33, 'col2': 333, 'col3': 3333},
 {'col1': 44, 'col2': 444, 'col3': 4444}
]

How can I achieve that?

import pandas as pd

df = pd.DataFrame(data)

list_df = pd.DataFrame(lst)

# command like ??
# df.subtract(list_df) 

You can extract the per-column values using zip and slice using a mask generated with isin:

a, b = zip(*lst)
df[~(df['col1'].isin(a) | df['col2'].isin(b))]

Output:

   col1  col2  col3
2    33   333  3333
3    44   444  4444

Or if you need both conditions to be true to drop:

df[~(df['col1'].isin(a) & df['col2'].isin(b))]
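
One point worth checking with the `&` variant: it still tests each column independently, so a row whose col1 and col2 values come from two different tuples in lst is also dropped. The sketch below illustrates this with an extra row (11, 222) that is not in the question's data; that row and the names `columnwise`, `pairs`, and `pairwise` are assumptions made for the illustration.

```python
import pandas as pd

# Illustrative data: the last row mixes values from two different tuples in lst.
df = pd.DataFrame({
    'col1': [11, 22, 33, 11],
    'col2': [111, 222, 333, 222],
    'col3': [1111, 2222, 3333, 9999],
})
lst = [(11, 111), (22, 222), (99, 999)]

a, b = zip(*lst)

# Column-wise test: each column is checked against its own set of values.
columnwise = df[~(df['col1'].isin(a) & df['col2'].isin(b))]

# Pair-wise test: (col1, col2) has to match one tuple in lst as a whole.
pairs = pd.MultiIndex.from_frame(df[['col1', 'col2']])
pairwise = df[~pairs.isin(lst)]

print(columnwise)  # only the (33, 333) row survives
print(pairwise)    # the (33, 333) and (11, 222) rows survive
```

If the requirement is that the exact pair must be listed, a pair-wise test is probably the safer choice.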

NB. If you have many columns, you can automate the process:

mask = sum(df[col].isin(v) for col, v in zip(df, zip(*lst))).eq(0)
df[mask]
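
To see what the automated mask does, here is a small sketch on the question's data that exposes the intermediate count. The name `hits` is introduced only for this illustration. Note that the approach relies on the leading columns of the dataframe being in the same order as the tuple positions in lst.

```python
import pandas as pd

data = [
    {'col1': 11, 'col2': 111, 'col3': 1111},
    {'col1': 22, 'col2': 222, 'col3': 2222},
    {'col1': 33, 'col2': 333, 'col3': 3333},
    {'col1': 44, 'col2': 444, 'col3': 4444},
]
lst = [(11, 111), (22, 222), (99, 999)]
df = pd.DataFrame(data)

# Iterating a DataFrame yields its column names, so zip(df, zip(*lst)) pairs
# 'col1' with (11, 22, 99) and 'col2' with (111, 222, 999); 'col3' is left out
# because zip stops at the shorter input.
hits = sum(df[col].isin(vals) for col, vals in zip(df, zip(*lst)))

# hits counts, per row, how many columns matched; keep rows with no match at all.
mask = hits.eq(0)
print(hits.tolist())
print(df[mask])
```

This is the column-wise "any match" behaviour generalized to several columns, not a pair-wise test.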

If you need to test by pairs, you can compare a MultiIndex created from both columns against the list with Index.isin, and invert the mask with ~ in boolean indexing:

df = df[~df.set_index(['col1','col2']).index.isin(lst)]
print (df)
   col1  col2  col3
2    33   333  3333
3    44   444  4444

Or use a left join via merge with the indicator parameter:

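# Left-join df against the pairs; rows without a match are flagged 'left_only'
mask = df.merge(list_df,
                left_on=['col1','col2'],
                right_on=[0,1],
                indicator=True,
                how='left')['_merge'].eq('left_only')
# mask is indexed 0..n-1 (the merge result), so this assumes df has a default index
df = df[mask]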
print (df)
   col1  col2  col3
2    33   333  3333
3    44   444  4444
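
A variation on the merge approach, as a sketch under a few assumptions: the lookup frame is given the same column names as df (so `on=` can be used), duplicate pairs are dropped so the left join cannot multiply rows, and the mask is applied positionally so it does not depend on df's index. The names `merged`, `keep`, and `result` are illustrative.

```python
import pandas as pd

data = [
    {'col1': 11, 'col2': 111, 'col3': 1111},
    {'col1': 22, 'col2': 222, 'col3': 2222},
    {'col1': 33, 'col2': 333, 'col3': 3333},
    {'col1': 44, 'col2': 444, 'col3': 4444},
]
lst = [(11, 111), (22, 222), (99, 999)]
df = pd.DataFrame(data)

# Name the lookup columns like the key columns in df and drop duplicate pairs,
# so every row of df appears exactly once in the left join.
list_df = pd.DataFrame(lst, columns=['col1', 'col2']).drop_duplicates()

merged = df.merge(list_df, on=['col1', 'col2'], how='left', indicator=True)

# The merged frame has a fresh 0..n-1 index; converting the mask to a plain
# array filters df by position instead of by index label.
keep = merged['_merge'].eq('left_only').to_numpy()
result = df[keep]
print(result)
```

Filtering by position relies on the left join keeping df's rows in their original order and count, which is why the duplicate pairs are removed first.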

You can create a tuple out of your col1 and col2 columns and then check whether those tuples are in the lst list. Then drop the rows with True values.

df.drop(df.apply(lambda x: (x['col1'], x['col2']), axis =1)
          .isin(lst)
          .loc[lambda x: x==True]
          .index)

With this solution you don't even have to turn the second list into a dataframe.

You can create the tuples of col1 and col2 with .apply() and tuple. Then test whether these tuples are in lst with .isin() (add ~ for the negated condition).

Finally, locate the rows with .loc, as follows:

df.loc[~df[['col1', 'col2']].apply(tuple, axis=1).isin(lst)]

Result:

   col1  col2  col3
2    33   333  3333
3    44   444  4444
