How do I index a pandas DataFrame using boolean indexing?
I am starting a new practice module in pandas where we deal with indexing and filtering of data. I have come across a form of chained indexing that was not explained in the course, and I was wondering if anyone could help me make sense of it. The dataset is from the Fortune 500 company listings.
df = pd.read_csv('f500.csv', index_col = 0)
The issue is that we have been taught to use boolean indexing by passing the boolean condition to the dataframe, like so:
motor_bool = df["industry"] == "Motor Vehicles and Parts"
motor_countries = df.loc[motor_bool, "country"]
The code above finds the countries of the companies whose industry is "Motor Vehicles and Parts". The last exercise in the module asks us to:
"Create a series, industry_usa, containing counts of the two most common values in the industry column for companies headquartered in the USA."
And the answer code is:
industry_usa = f500["industry"][f500["country"] == "USA"].value_counts().head(2)
I don't understand how we can suddenly use two sets of square brackets back to back, as in df[col][df[col] == value]. Am I not supposed to pass the boolean condition first and then specify which column I want using .loc? The chaining used here is very different from what we have practiced.
Please help. I am truly confused.
As always, thank you, Stack community.
I think the last solution is not recommended. It is better to use DataFrame.loc, as in your second snippet, to select the industry column by the mask and then get the counts:
industry_usa = f500.loc[f500["country"] == "USA", "industry"].value_counts().head(2)
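As for why the course's answer works at all: f500["industry"] returns a Series, and a Series accepts a boolean mask inside [] just as a DataFrame does, so the second pair of brackets filters that Series. Here is a minimal sketch comparing the two forms. It assumes a small made-up frame in place of f500.csv (the rows and index labels are invented for illustration):

```python
import pandas as pd

# Hypothetical stand-in for f500.csv; the rows are invented for illustration.
f500 = pd.DataFrame(
    {
        "industry": ["Banking", "Retail", "Banking",
                     "Motor Vehicles and Parts", "Retail", "Banking"],
        "country": ["USA", "USA", "USA", "Japan", "China", "USA"],
    },
    index=["A", "B", "C", "D", "E", "F"],
)

# Boolean Series that shares the frame's index.
usa_mask = f500["country"] == "USA"

# Chained form: pick the column (a Series), then filter that Series by the mask.
chained = f500["industry"][usa_mask].value_counts().head(2)

# .loc form: select rows and column in a single indexing call.
with_loc = f500.loc[usa_mask, "industry"].value_counts().head(2)

print(chained.equals(with_loc))
```

Both forms return the same counts when reading. The .loc form is generally preferred because it is one indexing operation, which also avoids the chained-assignment pitfalls that show up when you write values back.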
Another solution uses Series.nlargest on the counts:
industry_usa = f500.loc[f500["country"] == "USA", "industry"].value_counts().nlargest(2)
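Series.nlargest works on numeric values, so in the sketch below it is applied to the counts returned by value_counts() rather than to the string column itself. The data is again invented for illustration, and the intermediate names are hypothetical:

```python
import pandas as pd

# Hypothetical data standing in for f500.csv.
f500 = pd.DataFrame({
    "industry": ["Banking", "Retail", "Banking", "Energy",
                 "Retail", "Banking", "Energy"],
    "country": ["USA", "USA", "USA", "USA", "USA", "USA", "Japan"],
})

# Counts of each industry among USA companies, sorted in descending order.
counts = f500.loc[f500["country"] == "USA", "industry"].value_counts()

# head(2) relies on value_counts() already being sorted by count.
top_two_head = counts.head(2)

# nlargest(2) selects the two largest counts explicitly.
top_two_nlargest = counts.nlargest(2)

print(top_two_head)
print(top_two_nlargest)
```

With this data the two approaches pick the same two industries. head(2) is enough here because value_counts() sorts by count by default, while nlargest(2) states the intent more explicitly.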