Please explain the following line of code for me, i.e. creating a pandas Series from 2 columns of a DataFrame:
industry_usa = f500["industry"][f500["country"] == "USA"].value_counts().head(2)
f500 is a DataFrame, and two of its columns are industry and country. So why do we need to put the 2 columns side by side when creating the industry_usa Series? Please explain.
I will break it down for you:
f500["industry"]: This selects the column named "industry" as a Series. The Series keeps the DataFrame's row index.
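To make the steps concrete, here is a minimal sketch using a small hypothetical stand-in for f500 (the rows and the "company" column are made up for illustration, not real Fortune 500 data). It shows that selecting a column gives a Series that shares the DataFrame's index.

```python
import pandas as pd

# Hypothetical stand-in for f500: made-up rows, not real Fortune 500 data
f500 = pd.DataFrame({
    "company":  ["A", "B", "C", "D", "E", "F", "G"],
    "industry": ["Banks", "Retail", "Banks", "Energy", "Retail", "Banks", "Banks"],
    "country":  ["USA", "USA", "China", "USA", "USA", "USA", "USA"],
})

# Selecting one column by name returns a Series
industry = f500["industry"]
print(type(industry).__name__)            # Series

# The Series carries the same row labels (0..6) as the DataFrame
print(industry.index.equals(f500.index))  # True
```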
f500["country"] == "USA": This returns a boolean Series (a mask) that is True for every row whose country column is "USA" and False for all other rows.
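A sketch of the comparison step, again on hypothetical made-up data: comparing a column to a scalar produces one True/False per row, labelled with the same index as f500.

```python
import pandas as pd

# Hypothetical stand-in for f500: made-up rows, not real Fortune 500 data
f500 = pd.DataFrame({
    "industry": ["Banks", "Retail", "Banks", "Energy", "Retail", "Banks", "Banks"],
    "country":  ["USA", "USA", "China", "USA", "USA", "USA", "USA"],
})

# Element-wise comparison: one boolean per row
mask = f500["country"] == "USA"
print(mask.dtype)     # bool
print(mask.tolist())  # [True, True, False, True, True, True, True]
```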
f500["industry"][f500["country"] == "USA"]: As you might have guessed, this is just ordinary boolean indexing in pandas. The mask is aligned with f500["industry"] on the row index, so it selects the "industry" value of every row whose country is "USA". The two columns are not really placed side by side: the country column only serves as a filter for the industry column.
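The filtering step can be sketched as follows (hypothetical data as before). Note that the mask can just as well be stored in a separate variable first; only its index has to match that of the Series being filtered.

```python
import pandas as pd

# Hypothetical stand-in for f500: made-up rows, not real Fortune 500 data
f500 = pd.DataFrame({
    "industry": ["Banks", "Retail", "Banks", "Energy", "Retail", "Banks", "Banks"],
    "country":  ["USA", "USA", "China", "USA", "USA", "USA", "USA"],
})

# Build the mask from one column...
mask = f500["country"] == "USA"

# ...and use it to filter another column; rows are matched by index label
usa_industry = f500["industry"][mask]
print(usa_industry.index.tolist())  # [0, 1, 3, 4, 5, 6]  (row 2, China, is dropped)
print(usa_industry.tolist())        # ['Banks', 'Retail', 'Energy', 'Retail', 'Banks', 'Banks']
```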
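.value_counts(): This counts how many times each unique value occurs, much like the Counter class in Python's collections module.
.head(2): This keeps the first two entries. Because value_counts() sorts the counts in descending order by default, these are the two most common industries among US companies.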
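A sketch of the counting step on the same hypothetical data, with collections.Counter alongside for comparison. The made-up rows were chosen so the top two counts are not tied, which keeps the order of head(2) unambiguous.

```python
from collections import Counter

import pandas as pd

# Hypothetical stand-in for f500: made-up rows, not real Fortune 500 data
f500 = pd.DataFrame({
    "industry": ["Banks", "Retail", "Banks", "Energy", "Retail", "Banks", "Banks"],
    "country":  ["USA", "USA", "China", "USA", "USA", "USA", "USA"],
})

usa_industry = f500["industry"][f500["country"] == "USA"]

# value_counts() returns counts sorted from most to least frequent
counts = usa_industry.value_counts()
top2 = counts.head(2)
print(top2.to_dict())  # {'Banks': 3, 'Retail': 2}

# The plain-Python equivalent with Counter
print(Counter(usa_industry).most_common(2))  # [('Banks', 3), ('Retail', 2)]
```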
NOTE: Interestingly, you could change the order to f500[f500["country"] == "USA"]["industry"] and still get the same result!
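A quick check of that claim on hypothetical data: filtering the column first or filtering the rows of the whole DataFrame first gives the same result. The .loc version is a third, commonly used spelling that does the row filter and column selection in one step; it is included here as an extra, not something the answer above mentions.

```python
import pandas as pd

# Hypothetical stand-in for f500: made-up rows, not real Fortune 500 data
f500 = pd.DataFrame({
    "industry": ["Banks", "Retail", "Banks", "Energy", "Retail", "Banks", "Banks"],
    "country":  ["USA", "USA", "China", "USA", "USA", "USA", "USA"],
})

usa = f500["country"] == "USA"

# Column first, then filter rows of that Series
a = f500["industry"][usa].value_counts().head(2)
# Filter rows of the DataFrame first, then take the column
b = f500[usa]["industry"].value_counts().head(2)
# Rows and column in a single .loc call
c = f500.loc[usa, "industry"].value_counts().head(2)

print(a.equals(b), a.equals(c))  # True True
```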