Pandas - count groupBy results
I am trying to get a group of the most popular names by country using pandas. I have gotten halfway, as seen in the snippet, but I am unclear how to convert groupedByCountry into a sorted table.
import pandas
# Load the gzipped CSV and keep only the two columns of interest
csv = pandas.read_csv("./name_country.csv.gz", compression="gzip")
data = csv[["name", "country"]]
# Drop rows with no country before grouping
filtered = data[data.country.notnull()]
groupedByCountry = filtered.groupby("country")
You can use groupby size and then use nlargest:
In [11]: df = pd.DataFrame([["andy", "GB"], ["bob", "US"], ["chris", "GB"]], columns=["name", "country"])
In [12]: df.groupby("country").size().nlargest(1)
Out[12]:
country
GB    2
dtype: int64
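Since the question asks for a sorted table rather than only the top entry, the same idea can be extended with sort_values. The following is a minimal sketch, assuming the same three sample rows as above; the column name `count` is chosen here for illustration and is not from the original post.

```python
import pandas as pd

# Sample data mirroring the session above
df = pd.DataFrame(
    [["andy", "GB"], ["bob", "US"], ["chris", "GB"]],
    columns=["name", "country"],
)

# size() counts rows per country; sort_values orders the counts descending;
# reset_index turns the result into a two-column table ("count" is an assumed name)
ranked = (
    df.groupby("country")
    .size()
    .sort_values(ascending=False)
    .reset_index(name="count")
)
print(ranked)
```

Here `ranked` is an ordinary DataFrame, so it can be sliced with head(n) or written out with to_csv.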
However, it is probably more efficient to call value_counts directly on the column and then take the head (head(n) will get the top n most popular countries):
In [21]: df["country"].value_counts().head(1)
Out[21]:
GB    2
Name: country, dtype: int64
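The examples above rank countries. If the goal is the most popular names within each country, as the question describes, one possible approach is to group on both columns. This is a sketch under stated assumptions: the sample rows, the `count` column name, and the `top_n` variable are hypothetical and stand in for the real name_country.csv.gz data.

```python
import pandas as pd

# Hypothetical sample rows standing in for name_country.csv.gz
df = pd.DataFrame(
    [
        ["andy", "GB"],
        ["andy", "GB"],
        ["chris", "GB"],
        ["bob", "US"],
        ["bob", "US"],
        ["dave", "US"],
        ["eve", None],
    ],
    columns=["name", "country"],
)

# Drop rows with no country, as in the question's filter step
filtered = df[df["country"].notnull()]

# Count each (country, name) pair, then sort by country and descending count
counts = (
    filtered.groupby(["country", "name"])
    .size()
    .reset_index(name="count")
    .sort_values(["country", "count"], ascending=[True, False])
)

# Keep the first top_n rows of each country from the sorted table
top_n = 1  # assumed value for illustration
top_names = counts.groupby("country").head(top_n).reset_index(drop=True)
print(top_names)
```

Because the table is sorted before the second groupby, head(top_n) picks the highest counts per country. Ties are broken by whatever order the sort leaves them in, so a tie-breaking rule would need to be added if that matters.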