
Build expression for pandas loc to filter data using a certain column according to length of a list

I am pretty new to Python programming, and I want to use pandas .loc to filter data on a certain column according to the length of a list of products.

I have the (condensed) list:

products_selected = ["Apple", "Banana", "Strawberry"]

I produced the following very poor (condensed) code to reach my goal for the column PRODUCT in the pandas DataFrame:

if len(products_selected) == 1:
    data = data.loc[(data.PRODUCT == products_selected[0])]
   
elif len(products_selected) == 2:
    data = data.loc[(data.PRODUCT == products_selected[0]) | (data.PRODUCT == products_selected[1])]

elif len(products_selected) == 3:
    data = data.loc[(data.PRODUCT == products_selected[0]) | (data.PRODUCT == products_selected[1]) | (data.PRODUCT == products_selected[2])]

How can I do this the Pythonic way?

And, what's more, how can I do it independently of the length of the list, without having to expand my poor code manually?

I can't use reduce() or anything like that; it should be done without additional functions beyond pandas or numpy.

It looks so easy, but due to my limited coding experience I couldn't manage it.

Slicing may help.

https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#indexing-slicing-with-labels

[in]:
import pandas
data = pandas.DataFrame({'PRODUCT': ["Apple", "Banana", "Strawberry"]})

products_selected = ["Apple"]
print(data[:len(products_selected)])

products_selected = ["Apple", "Banana"]
print(data[:len(products_selected)])

products_selected = ["Apple", "Banana", "Strawberry"]
print(data[:len(products_selected)])

[out]:
  PRODUCT
0   Apple
  PRODUCT
0   Apple
1  Banana
      PRODUCT
0       Apple
1      Banana
2  Strawberry

... but slicing only takes the first len(products_selected) rows; it does not guarantee that the rows returned from data match the values in products_selected.

For example:

[in]:
import pandas
data = pandas.DataFrame({'PRODUCT': ["Apple", "Banana", "Strawberry"]})

products_selected = ["Apple", "Strawberry"]
data[:len(products_selected)]

[out]:
    PRODUCT
0   Apple
1   Banana
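
To make the mismatch explicit, here is a small check on the same data. It is only a sketch; the names sliced_products, unexpected and missing are mine and are not part of the original example.

```python
import pandas

data = pandas.DataFrame({"PRODUCT": ["Apple", "Banana", "Strawberry"]})
products_selected = ["Apple", "Strawberry"]

# Positional slice: takes the first len(products_selected) rows,
# regardless of what those rows contain.
sliced = data[:len(products_selected)]
sliced_products = sliced["PRODUCT"].tolist()

# Rows that were returned although they were not selected ...
unexpected = [p for p in sliced_products if p not in products_selected]
# ... and selected products that the slice did not return.
missing = [p for p in products_selected if p not in sliced_products]

print("returned:", sliced_products)
print("unexpected:", unexpected)
print("missing:", missing)
```

So the slice happens to match only when the selected products are exactly the first rows of data, in order.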

There is also a reference in your example to countries_selected. I assumed that it was a typo and should also have been products_selected.

After a long stretch of trial and error, I finally found another solution that works for me as well. I want to post it in case other beginners are interested.

It's as easy as this:

data = data.loc[data["PRODUCT"].isin(products_selected)]

Maybe pandas .isin can help others, too.
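
For anyone who wants to try it, here is a minimal, self-contained sketch of the .isin approach. The sample rows, the QTY column and the helper name filter_products are made up for illustration; only the PRODUCT column and Series.isin come from the posts above.

```python
import pandas as pd

# Hypothetical sample data; only the PRODUCT column matters here.
data = pd.DataFrame({
    "PRODUCT": ["Apple", "Banana", "Strawberry", "Cherry", "Apple"],
    "QTY": [3, 5, 2, 7, 1],
})

def filter_products(df, selected):
    """Keep the rows whose PRODUCT value appears in `selected` (any length)."""
    return df.loc[df["PRODUCT"].isin(selected)]

one = filter_products(data, ["Apple"])
two = filter_products(data, ["Apple", "Strawberry"])
none = filter_products(data, [])  # an empty list selects no rows

# The hand-written OR chain from the question, for comparison.
manual = data.loc[(data.PRODUCT == "Apple") | (data.PRODUCT == "Strawberry")]
same = two.equals(manual)

print(two)
print("same as manual OR chain:", same)
```

The same line works whether the list has one, two or twenty entries, so the if/elif ladder is no longer needed. To keep everything except the selected products, the mask can be negated with ~, as in df.loc[~df["PRODUCT"].isin(selected)].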
