Removing Rows from Pandas DataFrame based on Multiple Column Values
I am trying to remove rows from a large data frame based on whether each row has certain values in either of two different columns.
I will have a Series called "finalists". It will be a series of names that is imported from a different part of the code and changes each time it's run.
ex)
finalists = ["Company A", "Company F", "Product S"... etc]
The dataframe will be about 1,000 rows long and 200 columns wide.
Simplifying it, the dataframe would look something like this:
category | score | description | company_name | product_name | comments |
---|---|---|---|---|---|
"----" | 2.8 | "----" | Company A | Product A | "----" |
"----" | 1.2 | "----" | Company B | Product B | "----" |
"----" | 2.4 | "----" | Company C | Product C | "----" |
I need to keep the rows where either the company_name column or the product_name column is one of the values in the finalists Series (or remove the rows where neither is).
I tried doing something like this:
results = finalists.isin(app_data["company_name"]) or finalists.isin(app_data["product_name"])
but got an error that the answer was ambiguous.
You want something like:
mask = app_data["company_name"].isin(finalists) | app_data["product_name"].isin(finalists)
filtered_app_data = app_data[mask]
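To see why the original attempt fails and why the mask above works, here is a minimal, self-contained sketch. The three-row frame and the contents of finalists are made-up stand-ins for the real data, and the or_raised flag is only there for illustration.

```python
import pandas as pd

# Hypothetical sample data standing in for the real ~1,000 x 200 frame.
app_data = pd.DataFrame(
    {
        "score": [2.8, 1.2, 2.4],
        "company_name": ["Company A", "Company B", "Company C"],
        "product_name": ["Product A", "Product B", "Product C"],
    }
)

# Hypothetical finalists; isin accepts a list, set, or Series of names.
finalists = pd.Series(["Company A", "Company F", "Product C"])

# Python's `or` needs a single True/False from its left operand, which a
# multi-element boolean Series cannot supply, so pandas raises ValueError.
try:
    app_data["company_name"].isin(finalists) or app_data["product_name"].isin(finalists)
    or_raised = False
except ValueError:
    or_raised = True

# `|` combines the two boolean Series element by element instead.
mask = app_data["company_name"].isin(finalists) | app_data["product_name"].isin(finalists)
filtered_app_data = app_data[mask]

print(filtered_app_data)
```

Note also the direction of the call: app_data["company_name"].isin(finalists) yields one boolean per row of app_data, which is what row filtering needs, whereas finalists.isin(app_data["company_name"]) yields one boolean per finalist. To drop the matching rows instead of keeping them, invert the mask with app_data[~mask].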