Build expression for pandas loc to filter data using a certain column according to length of a list
I am pretty new to Python programming and I want to use pandas' .loc to filter data on a certain column, depending on the length of a list of products.
I have the (condensed) list:
products_selected = ["Apple", "Banana", "Strawberry"]
I wrote the following very poor (condensed) code to reach my goal for the column PRODUCT in the pandas DataFrame:
if len(products_selected) == 1:
    data = data.loc[(data.PRODUCT == products_selected[0])]
elif len(products_selected) == 2:
    data = data.loc[(data.PRODUCT == products_selected[0]) | (data.PRODUCT == products_selected[1])]
elif len(products_selected) == 3:
    data = data.loc[(data.PRODUCT == products_selected[0]) | (data.PRODUCT == products_selected[1]) | (data.PRODUCT == products_selected[2])]
How can I do this the Pythonic way?
And, more importantly, how can I make it independent of the length of the list, so that I don't have to extend my poor code by hand?
I can't use reduce() or anything similar; it should be done without additional functions, using only pandas or numpy.
It looks so easy, but with my limited coding experience I couldn't manage it.
Slicing may help.
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#indexing-slicing-with-labels
[in]:
import pandas
data = pandas.DataFrame({'PRODUCT': ["Apple", "Banana", "Strawberry"]})
products_selected = ["Apple"]
print(data[:len(products_selected)])
products_selected = ["Apple", "Banana"]
print(data[:len(products_selected)])
products_selected = ["Apple", "Banana", "Strawberry"]
print(data[:len(products_selected)])
[out]:
PRODUCT
0 Apple
PRODUCT
0 Apple
1 Banana
PRODUCT
0 Apple
1 Banana
2 Strawberry
... but note that slicing selects rows by position only. It does not guarantee that the rows returned from data hold the same values that are in products_selected.
For example:
[in]:
import pandas
data = pandas.DataFrame({'PRODUCT': ["Apple", "Banana", "Strawberry"]})
products_selected = ["Apple", "Strawberry"]
data[:len(products_selected)]
[out]:
PRODUCT
0 Apple
1 Banana
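To make the mismatch explicit, here is a small sketch that compares what the slice returns with what was actually selected. It reuses the data and products_selected from the example above; the variable names sliced and returned are just illustrative.

```python
import pandas as pd

data = pd.DataFrame({"PRODUCT": ["Apple", "Banana", "Strawberry"]})
products_selected = ["Apple", "Strawberry"]

# Slicing takes the first len(products_selected) rows by position,
# regardless of what values those rows contain.
sliced = data[:len(products_selected)]
returned = sliced["PRODUCT"].tolist()

print(returned)                                 # ['Apple', 'Banana']
print(set(returned) == set(products_selected))  # False
```

So slicing only gives the right rows when the selected products happen to sit at the top of the DataFrame in the same order.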
There is also a reference in your example to countries_selected. I assumed that it was a typo and should also have been products_selected.
After a lot of trial and error, I finally found another solution that works for me, too. I'm posting it in case other beginners are interested.
It's as easy as this:
data = data.loc[data["PRODUCT"].isin(products_selected)]
Maybe pandas' .isin can help others, too.
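For anyone who wants to try it, here is a minimal sketch of how .isin behaves for lists of different lengths. The sample data (including the QTY column) is made up for illustration and is not from the question.

```python
import pandas as pd

# Hypothetical sample data; the QTY column is invented for this sketch.
data = pd.DataFrame({
    "PRODUCT": ["Apple", "Banana", "Strawberry", "Apple", "Cherry"],
    "QTY": [3, 5, 2, 7, 1],
})

# The same expression works for a list of any length.
one = data.loc[data["PRODUCT"].isin(["Apple"])]
two = data.loc[data["PRODUCT"].isin(["Apple", "Strawberry"])]

# An empty list matches nothing, so the result is an empty DataFrame.
none_selected = data.loc[data["PRODUCT"].isin([])]

# ~ inverts the mask: keep rows whose PRODUCT is NOT in the list.
others = data.loc[~data["PRODUCT"].isin(["Apple", "Strawberry"])]

print(one.index.tolist())          # [0, 3]
print(two.index.tolist())          # [0, 2, 3]
print(len(none_selected))          # 0
print(others["PRODUCT"].tolist())  # ['Banana', 'Cherry']
```

Unlike the chained == comparisons, this matches by value rather than by list position, so the length and order of products_selected don't matter.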