How to index a pandas DataFrame using boolean indexing?

Question

I am starting a new practice module in pandas that deals with indexing and filtering data. I have come across a form of method chaining that was not explained in the course, and I was wondering if anyone could help me make sense of it. The dataset is the Fortune 500 company listing.

import pandas as pd
df = pd.read_csv('f500.csv', index_col=0)

The issue is that we have been taught to use boolean indexing by passing the boolean condition to the DataFrame, like so:

motor_bool = df["industry"] == "Motor Vehicles and Parts"
motor_countries = df.loc[motor_bool, "country"]
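For anyone without the CSV at hand, here is a minimal sketch of the same pattern. The three rows and the company labels are made up for illustration and are not taken from f500.csv.

```python
import pandas as pd

# Hypothetical rows standing in for f500.csv; all values are invented.
df = pd.DataFrame(
    {
        "industry": ["Motor Vehicles and Parts", "Banks", "Motor Vehicles and Parts"],
        "country": ["Japan", "USA", "Germany"],
    },
    index=["Company A", "Company B", "Company C"],
)

# Comparing a column to a value yields a boolean Series aligned with df's index.
motor_bool = df["industry"] == "Motor Vehicles and Parts"

# .loc takes the row mask first and the column label second.
motor_countries = df.loc[motor_bool, "country"]

print(motor_bool.tolist())
print(motor_countries.tolist())
```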

The above code finds the countries of the companies whose industry is "Motor Vehicles and Parts". The last exercise in the module asks us to:

"Create a series, industry_usa, containing counts of the two most common values in the industry column for companies headquartered in the USA."

And the answer code is (the DataFrame is called f500 here rather than df):

industry_usa = f500["industry"][f500["country"] == "USA"].value_counts().head(2)

I don't understand how we can suddenly use two sets of square brackets, df[col][...], back to back. Am I not supposed to pass the boolean condition first and then specify which column I want using .loc? The method chaining used here is very different from what we have practiced.

Please help. I am truly confused.

As always, thank you, Stack community.

Answer 1

I think the last solution is not recommended. The chained version works because f500["industry"] returns a Series, which is then filtered with the boolean mask. It is better to use DataFrame.loc, as in your second snippet, to select the industry column by mask and then get the counts:

industry_usa = f500.loc[f500["country"] == "USA", "industry"].value_counts().head(2)
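To check that the chained form and the .loc form select the same values, here is a minimal sketch on a small made-up frame. The rows and the names usa_mask, chained and single are illustrative assumptions, not part of the course material.

```python
import pandas as pd

# Hypothetical stand-in for f500; all values are invented.
f500 = pd.DataFrame(
    {
        "industry": ["Banks", "Banks", "Banks", "Retail", "Retail",
                     "Insurance", "Motor Vehicles and Parts", "Banks"],
        "country": ["USA", "USA", "USA", "USA", "USA", "USA", "Japan", "China"],
    }
)

usa_mask = f500["country"] == "USA"

# Chained form: select the column first (a Series), then filter that Series by the mask.
industry_col = f500["industry"]
chained = industry_col[usa_mask]

# Single-step form: rows and column are selected in one .loc call.
single = f500.loc[usa_mask, "industry"]

print(chained.equals(single))

# Counts of the two most common industries among the USA rows.
industry_usa = single.value_counts().head(2)
print(industry_usa)
```

Both forms read the same data. The .loc form does the selection in a single indexing operation, which also avoids the chained-assignment pitfalls that appear once you try to write values through such a selection.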

Another solution applies Series.nlargest to the counts (nlargest needs numeric values, so it has to come after value_counts):

industry_usa = f500.loc[f500["country"] == "USA", "industry"].value_counts().nlargest(2)
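As a rough illustration of how nlargest fits in: it works on the numeric counts produced by value_counts, whereas calling it directly on a column of strings raises a TypeError. The series below is made up for the example.

```python
import pandas as pd

# Hypothetical industry values for USA companies; invented for illustration.
industries = pd.Series(
    ["Banks", "Banks", "Banks", "Retail", "Retail", "Insurance"],
    name="industry",
)

counts = industries.value_counts()

# value_counts sorts in descending order, so head(2) and nlargest(2) agree here.
top_by_head = counts.head(2)
top_by_nlargest = counts.nlargest(2)
print(top_by_head.equals(top_by_nlargest))

# nlargest is defined for numeric data; on strings pandas raises TypeError.
try:
    industries.nlargest(2)
    raised = False
except TypeError:
    raised = True
print(raised)
```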

Answer 1 (2, accepted, 2020-05-04 10:13:20)
