Build an expression for pandas loc to filter data on a specific column according to the length of a list

Question

I am pretty new to Python programming and I want to use pandas .loc to filter data on a certain column according to the length of a list of products.

I have the (condensed) list:

products_selected = ["Apple", "Banana", "Strawberry"]

I produced the following very poor code (condensed) to reach my goal for the column PRODUCT in the pandas DataFrame:

if len(products_selected) == 1:
    data = data.loc[(data.PRODUCT == products_selected[0])]
   
elif len(products_selected) == 2:
    data = data.loc[(data.PRODUCT == products_selected[0]) | (data.PRODUCT == products_selected[1])]

elif len(products_selected) == 3:
    data = data.loc[(data.PRODUCT == products_selected[0]) | (data.PRODUCT == products_selected[1]) | (data.PRODUCT == products_selected[2])]

How can I do this the Pythonic way?

And what's more, how can I make it independent of the length of the list, so that I don't have to extend my poor code by hand?

I can't use reduce() or anything like that; it should be done without additional functions beyond pandas or numpy.

It looks so easy, but with my limited coding experience I couldn't manage it.

Answer 1

Slicing may help.

https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#indexing-slicing-with-labels

[in]:
import pandas
data = pandas.DataFrame({'PRODUCT': ["Apple", "Banana", "Strawberry"]})

products_selected = ["Apple"]
print(data[:len(products_selected)])

products_selected = ["Apple", "Banana"]
print(data[:len(products_selected)])

products_selected = ["Apple", "Banana", "Strawberry"]
print(data[:len(products_selected)])

[out]:
  PRODUCT
0   Apple
  PRODUCT
0   Apple
1  Banana
      PRODUCT
0       Apple
1      Banana
2  Strawberry

... but slicing only returns the first len(products_selected) rows of data; it does not guarantee that those rows hold the same values that are in products_selected.

For example:

[in]:
import pandas
data = pandas.DataFrame({'PRODUCT': ["Apple", "Banana", "Strawberry"]})

products_selected = ["Apple", "Strawberry"]
data[:len(products_selected)]

[out]:
  PRODUCT
0   Apple
1  Banana
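
To make the difference concrete, here is a minimal sketch that runs a positional slice and a value-based filter side by side on the same frame. The names by_position and by_value are illustrative only, and the value-based filter uses pandas' Series.isin, which is one possible way to match on values rather than something taken from this answer.

```python
import pandas as pd

data = pd.DataFrame({"PRODUCT": ["Apple", "Banana", "Strawberry"]})
products_selected = ["Apple", "Strawberry"]

# Positional slice: the first len(products_selected) rows, whatever they contain.
by_position = data[:len(products_selected)]

# Value-based filter: rows whose PRODUCT value appears in the list.
by_value = data.loc[data["PRODUCT"].isin(products_selected)]

print(by_position["PRODUCT"].tolist())
print(by_value["PRODUCT"].tolist())
```

The slice keeps Apple and Banana because they are the first two rows, while the value-based filter keeps Apple and Strawberry.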

There is also a reference in your example to countries_selected. I assumed that it was a typo and should also have been products_selected.

Answer 2

After a lot of trial and error, I finally found another solution that works for me, too. I want to post it in case other beginners are interested.

It's as easy as this:

data = data.loc[data["PRODUCT"].isin(products_selected)]

Maybe pandas .isin can help others, too.
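
As a rough check that .isin really replaces the hand-written chain from the question, the sketch below builds both boolean masks and compares them. The sample rows (including the extra "Cherry" and the repeated "Apple") are made up for illustration; only the PRODUCT column name and the products_selected list come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column name PRODUCT is from the question.
data = pd.DataFrame({"PRODUCT": ["Apple", "Banana", "Cherry", "Strawberry", "Apple"]})
products_selected = ["Apple", "Banana", "Strawberry"]

# Hand-written chain from the question, valid only for exactly three products.
chained = (
    (data.PRODUCT == products_selected[0])
    | (data.PRODUCT == products_selected[1])
    | (data.PRODUCT == products_selected[2])
)

# The same boolean mask, built for a list of any length.
mask = data["PRODUCT"].isin(products_selected)

print(mask.equals(chained))
filtered = data.loc[mask]
print(filtered["PRODUCT"].tolist())
```

Because the mask is built from the list itself, no branch on len(products_selected) is needed. An empty list gives an all-False mask, so the result is an empty DataFrame.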


Answer 1
0 Accepted 2022-12-17 19:15:35

Answer 2
0 2022-12-17 21:49:51
