How do I index a pandas DataFrame using boolean indexing?

I am starting a new practice module in pandas that deals with indexing and filtering data. I have come across a form of method chaining that was not explained in the course, and I was wondering if anyone could help me make sense of it. The dataset is the Fortune 500 company listings.

df = pd.read_csv('f500.csv', index_col = 0)

The issue is that we have been taught to use boolean indexing by passing the boolean condition to the DataFrame like so:

motor_bool = df["industry"] == "Motor Vehicles and Parts"
motor_countries = df.loc[motor_bool, "country"]
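To make the two steps concrete, here is a minimal sketch on a tiny made-up DataFrame (the company names and rows are hypothetical stand-ins, not taken from f500.csv). It shows that the comparison produces a boolean Series, and that .loc then uses it to pick rows while the second argument picks the column.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset; only the two columns used here.
df = pd.DataFrame(
    {
        "industry": ["Motor Vehicles and Parts", "General Merchandisers", "Motor Vehicles and Parts"],
        "country": ["Japan", "USA", "Germany"],
    },
    index=["Company A", "Company B", "Company C"],
)

# Comparing a column to a value gives one True/False per row.
motor_bool = df["industry"] == "Motor Vehicles and Parts"

# .loc[rows, column]: keep the rows where the mask is True, return the "country" column.
motor_countries = df.loc[motor_bool, "country"]

print(motor_bool)
print(motor_countries)
```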

The code above finds the countries of the companies whose industry is "Motor Vehicles and Parts". The last exercise in the module asks us to:

"Create a series, industry_usa, containing counts of the two most common values in the industry column for companies headquartered in the USA."

And the answer code is:

industry_usa = f500["industry"][f500["country"] == "USA"].value_counts().head(2)

I don't understand how we can suddenly use df[col][df[col] == value] back to back. Am I not supposed to pass the boolean condition first and then specify which column I want using .loc? The method chaining used here is very different from what we have practiced.

Please help. I am truly confused.

As always, thank you, Stack community.

I think the last solution is not recommended. It is better to use DataFrame.loc, as in your second snippet, to select the industry column by mask and then get the counts:

industry_usa = f500.loc[f500["country"] == "USA", "industry"].value_counts().head(2)
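As a rough illustration of why both spellings return the same thing: f500["industry"] is itself a Series, and indexing a Series with a boolean Series keeps the rows where the mask is True, which is what .loc does in a single step. The sketch below uses a small made-up frame in place of f500.csv; the rows and the name usa_mask are assumptions for the example only.

```python
import pandas as pd

# Hypothetical stand-in for f500.csv with just the two relevant columns.
f500 = pd.DataFrame(
    {
        "country": ["USA", "USA", "USA", "China", "USA", "USA", "USA", "Japan"],
        "industry": ["Banks", "Retail", "Banks", "Banks", "Tech", "Retail", "Banks", "Retail"],
    }
)

usa_mask = f500["country"] == "USA"

# Chained form from the exercise answer: select the column, then filter it by the mask.
chained = f500["industry"][usa_mask].value_counts().head(2)

# Single .loc call: filter rows and select the column in one step.
with_loc = f500.loc[usa_mask, "industry"].value_counts().head(2)

print(chained)
print(chained.equals(with_loc))
```

For reading values the two forms agree. The single .loc call is usually preferred because it is one indexing operation, and it avoids the chained-assignment pitfalls that appear when the same pattern is used to set values.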

Another solution uses Series.nlargest on the counts:

industry_usa = f500.loc[f500["country"] == "USA", "industry"].value_counts().nlargest(2)
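A small sketch of the nlargest variant, using a made-up Series of industry labels (the values are hypothetical). Since nlargest ranks numeric values, it is applied here to the result of value_counts rather than to the string column itself.

```python
import pandas as pd

# Hypothetical industry labels for USA companies.
industries = pd.Series(
    ["Banks", "Retail", "Banks", "Tech", "Retail", "Banks"], name="industry"
)

# Count each label, then keep the two largest counts.
counts = industries.value_counts()
top_two = counts.nlargest(2)

print(top_two)
```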
