pandas subtract rows in dataframe according to a few columns
I have the following dataframe:
data = [
{'col1': 11, 'col2': 111, 'col3': 1111},
{'col1': 22, 'col2': 222, 'col3': 2222},
{'col1': 33, 'col2': 333, 'col3': 3333},
{'col1': 44, 'col2': 444, 'col3': 4444}
]
and the following list:
lst = [(11, 111), (22, 222), (99, 999)]
I would like to keep only the rows of my data whose (col1, col2) values do not appear in lst.
The result for the above example would be:
[
{'col1': 33, 'col2': 333, 'col3': 3333},
{'col1': 44, 'col2': 444, 'col3': 4444}
]
How can I achieve that?
import pandas as pd
df = pd.DataFrame(data)
list_df = pd.DataFrame(lst)
# command like ??
# df.subtract(list_df)
You can extract the list of values for each column using zip, and slice using a mask generated with isin:
a,b = zip(*lst)
df[~(df['col1'].isin(a) | df['col2'].isin(b))]
Output:
col1 col2 col3
2 33 333 3333
3 44 444 4444
Or, if both conditions must be true for a row to be dropped:
df[~(df['col1'].isin(a) & df['col2'].isin(b))]
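Both variants above test each column independently, which is not quite the same as matching (col1, col2) pairs. A minimal sketch of the difference, assuming one hypothetical extra row (11, 222) that is not in the original question: each of its values occurs in lst, but the pair itself does not.

```python
import pandas as pd

# Sample data from the question plus one hypothetical row (11, 222):
# each value occurs somewhere in lst, but the pair (11, 222) does not.
data = [
    {'col1': 11, 'col2': 111, 'col3': 1111},
    {'col1': 22, 'col2': 222, 'col3': 2222},
    {'col1': 33, 'col2': 333, 'col3': 3333},
    {'col1': 44, 'col2': 444, 'col3': 4444},
    {'col1': 11, 'col2': 222, 'col3': 5555},  # assumption, for illustration only
]
lst = [(11, 111), (22, 222), (99, 999)]
df = pd.DataFrame(data)

a, b = zip(*lst)

# Column-wise test: drops a row when col1 is in a AND col2 is in b.
columnwise = df[~(df['col1'].isin(a) & df['col2'].isin(b))]

# Pair-wise test: drops a row only when the (col1, col2) tuple is in lst.
pairwise = df[~df.set_index(['col1', 'col2']).index.isin(lst)]

print(columnwise['col3'].tolist())
print(pairwise['col3'].tolist())
```

The column-wise mask also removes the hypothetical (11, 222) row, while the pair-wise mask keeps it. On the question's original four rows the two approaches give the same result.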
NB: if you have many columns, you can automate the process:
mask = sum(df[col].isin(v) for col, v in zip(df, zip(*lst))).eq(0)
df[mask]
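As a possible alternative to summing the per-column masks, DataFrame.isin accepts a dict that maps column names to allowed values. The sketch below assumes the key columns are col1 and col2, in the same order as the tuple positions in lst. Like the approach above, it tests columns independently and drops a row when any key column matches.

```python
import pandas as pd

data = [
    {'col1': 11, 'col2': 111, 'col3': 1111},
    {'col1': 22, 'col2': 222, 'col3': 2222},
    {'col1': 33, 'col2': 333, 'col3': 3333},
    {'col1': 44, 'col2': 444, 'col3': 4444},
]
lst = [(11, 111), (22, 222), (99, 999)]
df = pd.DataFrame(data)

# Assumed key columns, ordered like the positions inside each tuple of lst.
keys = ['col1', 'col2']

# Map each key column to the values found at its position in lst.
values_by_column = dict(zip(keys, zip(*lst)))

# Columns missing from the dict are reported as False everywhere,
# so only the key columns can trigger a drop.
mask = ~df.isin(values_by_column).any(axis=1)
result = df[mask]
print(result)
```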
If you need to test by pairs, you can build a MultiIndex from both columns, compare it against lst with Index.isin, and invert the mask with ~ for boolean indexing:
df = df[~df.set_index(['col1','col2']).index.isin(lst)]
print (df)
col1 col2 col3
2 33 333 3333
3 44 444 4444
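The same idea can be wrapped in a small function. This is only a sketch: drop_matching_rows is a hypothetical helper name, and it assumes at least two key columns so that set_index produces a MultiIndex.

```python
import pandas as pd

def drop_matching_rows(df, keys, pairs):
    """Return the rows of df whose values in `keys` do not form a tuple in `pairs`.

    Hypothetical helper; `keys` should name at least two columns so that
    set_index builds a MultiIndex.
    """
    key_index = df.set_index(list(keys)).index
    # isin returns a NumPy boolean array aligned with the row order of df.
    return df[~key_index.isin(list(pairs))]

data = [
    {'col1': 11, 'col2': 111, 'col3': 1111},
    {'col1': 22, 'col2': 222, 'col3': 2222},
    {'col1': 33, 'col2': 333, 'col3': 3333},
    {'col1': 44, 'col2': 444, 'col3': 4444},
]
lst = [(11, 111), (22, 222), (99, 999)]
df = pd.DataFrame(data)

result = drop_matching_rows(df, ['col1', 'col2'], lst)
print(result)
```

Because the mask is a plain boolean array, the original index labels of df are preserved in the result.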
Or use a left join via merge with the indicator parameter:
# Rows found only in df are marked 'left_only' in the _merge column
mask = df.merge(list_df,
                left_on=['col1','col2'],
                right_on=[0,1],
                indicator=True,
                how='left')['_merge'].eq('left_only')
df = df[mask]
print (df)
col1 col2 col3
2 33 333 3333
3 44 444 4444
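One thing to watch with the merge approach: merge returns a new frame with a fresh integer index, so the boolean Series only lines up with df when df still has its default index. A sketch of a slightly more defensive variant, assuming named columns on list_df, a hypothetical string index on df, and duplicates removed from the list so the left join cannot multiply rows:

```python
import pandas as pd

data = [
    {'col1': 11, 'col2': 111, 'col3': 1111},
    {'col1': 22, 'col2': 222, 'col3': 2222},
    {'col1': 33, 'col2': 333, 'col3': 3333},
    {'col1': 44, 'col2': 444, 'col3': 4444},
]
lst = [(11, 111), (22, 222), (99, 999)]

# Hypothetical non-default index, to show the mask does not rely on labels.
df = pd.DataFrame(data, index=['a', 'b', 'c', 'd'])

# Name the key columns and drop duplicate pairs so that each row of df
# appears exactly once in the merged frame.
list_df = pd.DataFrame(lst, columns=['col1', 'col2']).drop_duplicates()

merged = df.merge(list_df, on=['col1', 'col2'], how='left', indicator=True)

# Convert to a NumPy array so selection is positional, not label-aligned.
mask = merged['_merge'].eq('left_only').to_numpy()
result = df[mask]
print(result)
```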
You can create a tuple out of your col1 and col2 columns and then check whether those tuples are in the lst list. Then drop the rows with True values.
df.drop(df.apply(lambda x: (x['col1'], x['col2']), axis =1)
.isin(lst)
.loc[lambda x: x==True]
.index)
With this solution you don't even have to make the second list a dataframe.
You can create the tuples of col1 and col2 by .apply() with tuple. Then test whether these tuples are in lst by .isin() (add ~ for the negation/opposite condition). Finally, locate the rows with .loc, as follows:
df.loc[~df[['col1', 'col2']].apply(tuple, axis=1).isin(lst)]
Result:
col1 col2 col3
2 33 333 3333
3 44 444 4444
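Since apply with axis=1 calls a Python function once per row, it may be slow on large frames. A sketch of one possible alternative, assuming a plain set lookup over zipped columns is acceptable (excluded is a name introduced here for illustration):

```python
import pandas as pd

data = [
    {'col1': 11, 'col2': 111, 'col3': 1111},
    {'col1': 22, 'col2': 222, 'col3': 2222},
    {'col1': 33, 'col2': 333, 'col3': 3333},
    {'col1': 44, 'col2': 444, 'col3': 4444},
]
lst = [(11, 111), (22, 222), (99, 999)]
df = pd.DataFrame(data)

# A set gives constant-time membership tests for each pair.
excluded = set(lst)

# Build the boolean mask row by row from the two key columns.
mask = [pair not in excluded for pair in zip(df['col1'], df['col2'])]
result = df[mask]
print(result)
```

This still loops in Python, but it avoids building an intermediate Series of tuples through apply.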