Pandas: dropping rows based on multiple conditions

I have two dataframes:

A = pd.DataFrame(
    [["abc@gmail.com", "4311", "3000", "STR_1", "1384"],
     ["abc@gmail.com", "4311", "3000", "STR_2", "1440"],
     ["xyz@gmail.com", "4311", "3000", "STR_3", "1300"],
     ["pqr@gmail.com", "4311", "3000", "STR_3", "1300"]],
    columns=["EMAIL", "PRODUCT_ID", "POST_CODE", "STORE_NAME", "STORE_ID"],
)

B = pd.DataFrame(
    [["abc@gmail.com", "4311", "3000", "STR_1", "1384"],
     ["xyz@gmail.com", "4311", "3000", "STR_3", "1300"]],
    columns=["EMAIL", "PRODUCT_ID", "POST_CODE", "STORE_NAME", "STORE_ID"],
)

Now I need to drop from dataframe A every record that has the same EMAIL, PRODUCT_ID and POST_CODE as a record in dataframe B. So the expected output keeps only the pqr@gmail.com row of A.

I tried using drop_duplicates, like this:

pd.concat([A, B]).drop_duplicates(keep=False)

But this cannot drop rows based on custom columns (POST_CODE, in this case).

Use subset to select the columns you want to filter on:

pd.concat([A, B]).drop_duplicates(subset=["EMAIL", "PRODUCT_ID", "POST_CODE"], keep=False)
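
One caveat with this approach, shown as a minimal sketch: `keep=False` drops every row whose key combination occurs more than once in the concatenated frame. As a result, rows that exist only in B survive, and rows of A that merely duplicate each other on those columns are dropped as well. The frames `A2` and `B2` and their `dup@`/`new@` rows are made up for illustration and are not part of the question's data.

```python
import pandas as pd

cols = ["EMAIL", "PRODUCT_ID", "POST_CODE", "STORE_NAME", "STORE_ID"]
keys = ["EMAIL", "PRODUCT_ID", "POST_CODE"]

# Hypothetical data: "dup@..." appears twice in A2 only, "new@..." appears in B2 only.
A2 = pd.DataFrame(
    [["abc@gmail.com", "4311", "3000", "STR_1", "1384"],
     ["pqr@gmail.com", "4311", "3000", "STR_3", "1300"],
     ["dup@gmail.com", "4311", "3000", "STR_4", "1500"],
     ["dup@gmail.com", "4311", "3000", "STR_5", "1600"]],
    columns=cols,
)
B2 = pd.DataFrame(
    [["abc@gmail.com", "4311", "3000", "STR_1", "1384"],
     ["new@gmail.com", "4311", "3000", "STR_9", "1900"]],
    columns=cols,
)

# Every key combination that occurs more than once across A2 and B2 is removed.
out = pd.concat([A2, B2]).drop_duplicates(subset=keys, keep=False)
print(out["EMAIL"].tolist())  # ['pqr@gmail.com', 'new@gmail.com']
```

For the question's data neither case arises, so the one-liner gives the expected single row there.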

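The solution uses the following elements:

  1. The pandas set_index() function.
  2. The pandas isin() function.

First, we set the index of both dataframes to "EMAIL", "PRODUCT_ID", "POST_CODE"; then we can use those indexes with isin to filter dataframe A.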

Code:

import pandas as pd

A = pd.DataFrame(
    [["abc@gmail.com", "4311", "3000", "STR_1", "1384"],
     ["abc@gmail.com", "4311", "3000", "STR_2", "1440"],
     ["xyz@gmail.com", "4311", "3000", "STR_3", "1300"],
     ["pqr@gmail.com", "4311", "3000", "STR_3", "1300"]],
    columns=["EMAIL", "PRODUCT_ID", "POST_CODE", "STORE_NAME", "STORE_ID"],
)

B = pd.DataFrame(
    [["abc@gmail.com", "4311", "3000", "STR_1", "1384"],
     ["xyz@gmail.com", "4311", "3000", "STR_3", "1300"]],
    columns=["EMAIL", "PRODUCT_ID", "POST_CODE", "STORE_NAME", "STORE_ID"],
)

# Build a MultiIndex of (EMAIL, PRODUCT_ID, POST_CODE) for each dataframe
i1 = A.set_index(["EMAIL", "PRODUCT_ID", "POST_CODE"]).index
i2 = B.set_index(["EMAIL", "PRODUCT_ID", "POST_CODE"]).index
# Keep only the rows of A whose key combination does not occur in B
result = A[~i1.isin(i2)]

Output:

     EMAIL         PRODUCT_ID    POST_CODE  STORE_NAME  STORE_ID
3   pqr@gmail.com    4311          3000       STR_3       1300
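
If the same filter is needed for different column sets, it could be wrapped in a small helper. The sketch below assumes nothing beyond the set_index()/isin() steps above; the name `drop_matching` is hypothetical and not a pandas function.

```python
import pandas as pd


def drop_matching(left, right, keys):
    """Return the rows of `left` whose `keys` combination does not occur in `right`."""
    left_keys = left.set_index(keys).index
    right_keys = right.set_index(keys).index
    return left[~left_keys.isin(right_keys)]


cols = ["EMAIL", "PRODUCT_ID", "POST_CODE", "STORE_NAME", "STORE_ID"]
A = pd.DataFrame(
    [["abc@gmail.com", "4311", "3000", "STR_1", "1384"],
     ["abc@gmail.com", "4311", "3000", "STR_2", "1440"],
     ["xyz@gmail.com", "4311", "3000", "STR_3", "1300"],
     ["pqr@gmail.com", "4311", "3000", "STR_3", "1300"]],
    columns=cols,
)
B = pd.DataFrame(
    [["abc@gmail.com", "4311", "3000", "STR_1", "1384"],
     ["xyz@gmail.com", "4311", "3000", "STR_3", "1300"]],
    columns=cols,
)

# Same keys as in the question: only the pqr@gmail.com row (index 3) remains.
result = drop_matching(A, B, ["EMAIL", "PRODUCT_ID", "POST_CODE"])
print(result)

# Adding STORE_NAME to the keys makes the match stricter, so index 1 also survives.
stricter = drop_matching(A, B, ["EMAIL", "PRODUCT_ID", "POST_CODE", "STORE_NAME"])
print(stricter.index.tolist())  # [1, 3]
```

Unlike the concat/drop_duplicates route, this only ever returns rows of the left dataframe and keeps their original index labels.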
