Pandas: dropping rows based on multiple conditions
I have two DataFrames:
import pandas as pd

A = pd.DataFrame(
[["abc@gmail.com","4311","3000","STR_1","1384"],
["abc@gmail.com","4311","3000","STR_2","1440" ],
["xyz@gmail.com","4311","3000","STR_3","1300" ],
["pqr@gmail.com","4311","3000","STR_3","1300" ]],
columns=["EMAIL", "PRODUCT_ID", "POST_CODE", "STORE_NAME", "STORE_ID"],
)
B = pd.DataFrame(
[["abc@gmail.com","4311","3000","STR_1","1384"],
["xyz@gmail.com","4311","3000","STR_3","1300" ],],
columns=["EMAIL", "PRODUCT_ID", "POST_CODE", "STORE_NAME", "STORE_ID"],
)
Now I need to remove from DataFrame A the records that have the same EMAIL, PRODUCT_ID and POST_CODE as a record in DataFrame B. So the expected output contains only the pqr@gmail.com row of A.
I tried dropping duplicates, for example:
pd.concat([A, B]).drop_duplicates(keep=False)
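But this compares entire rows, so it cannot drop rows based on a custom set of columns (in this case EMAIL, PRODUCT_ID and POST_CODE).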
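To see the problem concretely, here is a minimal sketch that rebuilds A and B from the question and runs the full-row version. The names cols and full_row are illustrative only. Because STORE_NAME and STORE_ID also take part in the comparison, the abc@gmail.com / STR_2 row of A survives even though its EMAIL, PRODUCT_ID and POST_CODE match a row of B.

```python
import pandas as pd

cols = ["EMAIL", "PRODUCT_ID", "POST_CODE", "STORE_NAME", "STORE_ID"]
A = pd.DataFrame(
    [["abc@gmail.com", "4311", "3000", "STR_1", "1384"],
     ["abc@gmail.com", "4311", "3000", "STR_2", "1440"],
     ["xyz@gmail.com", "4311", "3000", "STR_3", "1300"],
     ["pqr@gmail.com", "4311", "3000", "STR_3", "1300"]],
    columns=cols,
)
B = pd.DataFrame(
    [["abc@gmail.com", "4311", "3000", "STR_1", "1384"],
     ["xyz@gmail.com", "4311", "3000", "STR_3", "1300"]],
    columns=cols,
)

# Without `subset`, drop_duplicates compares every column, so a row only
# counts as a duplicate when STORE_NAME and STORE_ID match as well.
full_row = pd.concat([A, B]).drop_duplicates(keep=False)
print(full_row)
```

The result keeps A's rows with index labels 1 (STR_2) and 3 (pqr@gmail.com), instead of only the pqr@gmail.com row.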
Use the subset parameter to select the columns to compare:
pd.concat([A, B]).drop_duplicates(subset=["EMAIL", "PRODUCT_ID", "POST_CODE"], keep=False)
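This works for the data in the question, but keep=False drops every row whose key appears more than once in the concatenated frame and keeps everything else, including rows that exist only in B. The sketch below uses a hypothetical B with an extra key (new@gmail.com) that A does not have, reduced to the three key columns for brevity; that row leaks into the result. Likewise, two rows of A that share a key not present in B would both be dropped.

```python
import pandas as pd

keys = ["EMAIL", "PRODUCT_ID", "POST_CODE"]
A = pd.DataFrame(
    [["abc@gmail.com", "4311", "3000"],
     ["pqr@gmail.com", "4311", "3000"]],
    columns=keys,
)
# Hypothetical B: its second row has no counterpart in A.
B = pd.DataFrame(
    [["abc@gmail.com", "4311", "3000"],
     ["new@gmail.com", "4311", "3000"]],
    columns=keys,
)

# The shared abc key appears twice and is dropped; the unique keys from
# both A and B are kept, so B's new@gmail.com row ends up in the result.
result = pd.concat([A, B]).drop_duplicates(subset=keys, keep=False)
print(result)
```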
The solution consists of the following steps:
First, set the index of both DataFrames to "EMAIL", "PRODUCT_ID" and "POST_CODE"; then use these indexes with isin to filter DataFrame A.
Code:
import pandas as pd
A = pd.DataFrame(
[["abc@gmail.com","4311","3000","STR_1","1384"],
["abc@gmail.com","4311","3000","STR_2","1440" ],
["xyz@gmail.com","4311","3000","STR_3","1300" ],
["pqr@gmail.com","4311","3000","STR_3","1300" ]],
columns=["EMAIL", "PRODUCT_ID", "POST_CODE", "STORE_NAME", "STORE_ID"],
)
B = pd.DataFrame(
[["abc@gmail.com","4311","3000","STR_1","1384"],
["xyz@gmail.com","4311","3000","STR_3","1300" ],],
columns=["EMAIL", "PRODUCT_ID", "POST_CODE", "STORE_NAME", "STORE_ID"],
)
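# Build a MultiIndex from the key columns of each DataFrame
i1 = A.set_index(["EMAIL", "PRODUCT_ID", "POST_CODE"]).index
i2 = B.set_index(["EMAIL", "PRODUCT_ID", "POST_CODE"]).index
# Keep only the rows of A whose key combination does not appear in B
result = A[~i1.isin(i2)]
print(result)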
Output:
           EMAIL PRODUCT_ID POST_CODE STORE_NAME STORE_ID
3  pqr@gmail.com       4311      3000      STR_3     1300
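A different technique, not taken from either answer above, is a left anti-join using DataFrame.merge with indicator=True. The minimal sketch below assumes the same A and B as in the question; the names keys, cols, b_keys and merged are illustrative only. Only B's key columns are merged, after de-duplication, so no row of A can be repeated.

```python
import pandas as pd

keys = ["EMAIL", "PRODUCT_ID", "POST_CODE"]
cols = keys + ["STORE_NAME", "STORE_ID"]

A = pd.DataFrame(
    [["abc@gmail.com", "4311", "3000", "STR_1", "1384"],
     ["abc@gmail.com", "4311", "3000", "STR_2", "1440"],
     ["xyz@gmail.com", "4311", "3000", "STR_3", "1300"],
     ["pqr@gmail.com", "4311", "3000", "STR_3", "1300"]],
    columns=cols,
)
B = pd.DataFrame(
    [["abc@gmail.com", "4311", "3000", "STR_1", "1384"],
     ["xyz@gmail.com", "4311", "3000", "STR_3", "1300"]],
    columns=cols,
)

# Use only B's key columns, de-duplicated, so the left merge cannot
# produce more than one output row per row of A.
b_keys = B[keys].drop_duplicates()

# indicator=True adds a "_merge" column; "left_only" marks rows of A
# whose key combination has no match in B.
merged = A.merge(b_keys, on=keys, how="left", indicator=True)
result = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
print(result)
```

Unlike the isin version, merge returns a new default integer index, so A's original row labels are not preserved in general.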