How can I calculate percentage of a groupby column and sort it by descending order?

Question: How can I calculate the percentage of a groupby column and sort it in descending order?

Desired output:

country            count     percentage
United States      2555        45%
India               923        12%
United Kingdom      397        4%
Japan               226        3%
South Korea         183        2% 

I did some research, read the pandas documentation, and looked at other questions here on Stack Overflow, without luck.

I tried the following:

Attempt #1:

df2 = df.groupby('country')['show_id'].count().nlargest()
df3 = df2.groupby(level=0).apply(lambda x: x/x.sum() * 100)

Output:

director
A. L. Vijay            100.0
A. Raajdheep           100.0
A. Salaam              100.0
A.R. Murugadoss        100.0
Aadish Keluskar        100.0
...
Çagan Irmak            100.0
Ísold Uggadóttir       100.0
Óskar Thór Axelsson    100.0
Ömer Faruk Sorak       100.0
Şenol Sönmez           100.0

Name: show_id, Length: 4049, dtype: float64

Attempt #2:

df2 = df.groupby('country')['show_id'].count()
df2['percentage'] = df2['show_id']/6000

Output:

KeyError: 'show_id'

Sample of the dataset:

import pandas as pd
df = pd.DataFrame({
'show_id':['81145628','80117401','70234439'],
'type':['Movie','Movie','TV Show'],
'title':['Norm of the North: King Sized Adventure',
'Jandino: Whatever it Takes',
'Transformers Prime'],
'director':['Richard Finn, Tim Maltby',float('nan'),float('nan')],
'cast':['Alan Marriott, Andrew Toth, Brian Dobson',
'Jandino Asporaat','Peter Cullen, Sumalee Montano, Frank Welker'], 
'country':['United States, India, South Korea, China',
'United Kingdom','United States'], 
'date_added':['September 9, 2019',
'September 9, 2016',
'September 8, 2018'],
'release_year':['2019','2016','2013'],
'rating':['TV-PG','TV-MA','TV-Y7-FV'],
'duration':['90 min','94 min','1 Season'],
'listed_in':['Children & Family Movies, Comedies',
'Stand-Up Comedy','Kids TV'],
'description':['Before planning an awesome wedding for his',
'Jandino Asporaat riffs on the challenges of ra',
'With the help of three human allies, the Autob']})

This doesn't address rows where there are multiple countries in the "country" field, but the lines below should work for the other parts of the question:

Create the initial dataframe:

df = pd.DataFrame({
'show_id':['81145628','80117401','70234439'],
'type':['Movie','Movie','TV Show'],
'title':['Norm of the North: King Sized Adventure',
'Jandino: Whatever it Takes',
'Transformers Prime'],
'director':['Richard Finn, Tim Maltby',0,0],
'cast':['Alan Marriott, Andrew Toth, Brian Dobson',
'Jandino Asporaat','Peter Cullen, Sumalee Montano, Frank Welker'], 
'country':['United States, India, South Korea, China',
'United Kingdom','United States'], 
'date_added':['September 9, 2019',
'September 9, 2016',
'September 8, 2018'],
'release_year':['2019','2016','2013'],
'rating':['TV-PG','TV-MA','TV-Y7-FV'],
'duration':['90 min','94 min','1 Season'],
'listed_in':['Children & Family Movies, Comedies',
'Stand-Up Comedy','Kids TV'],
'description':['Before planning an awesome wedding for his',
'Jandino Asporaat riffs on the challenges of ra',
'With the help of three human allies, the Autob']})

Group by country:

df2 = df.groupby(by="country", as_index=False)['show_id']\
    .agg('count')

Rename the aggregated column:

df2 = df2.rename(columns={'show_id':'count'})

Create the percentage column:

df2['percent'] = (df2['count']*100)/df2['count'].sum()

Sort descending:

df2 = df2.sort_values(by='percent', ascending=False)
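For the rows that list several countries, one possible approach (a sketch, not part of the steps above) is to split the comma-separated string and give each country its own row with `explode` before grouping. The reduced two-column frame below is a hypothetical stand-in for the sample data, and it assumes the separator is always a comma followed by a space. Note that the percentage then becomes a share of all (show, country) pairs rather than a share of shows.

```python
import pandas as pd

# Hypothetical miniature of the question's data: only the columns needed here.
df = pd.DataFrame({
    'show_id': ['81145628', '80117401', '70234439'],
    'country': ['United States, India, South Korea, China',
                'United Kingdom', 'United States'],
})

# Split the comma-separated string into a list, then give each country its own row.
exploded = df.assign(country=df['country'].str.split(', ')).explode('country')

# Count shows per country and express each count as a share of all pairs.
counts = (exploded.groupby('country', as_index=False)['show_id']
          .count()
          .rename(columns={'show_id': 'count'}))
counts['percent'] = counts['count'] * 100 / counts['count'].sum()
counts = counts.sort_values(by='percent', ascending=False).reset_index(drop=True)
print(counts)
```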

Part of the issue in your Attempt #1 may have been that you didn't include the "by" parameter in your groupby function.
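Another way to look at the two attempts, sketched on hypothetical toy data: `df.groupby('country')` already passes `by` positionally, and `groupby(...)['show_id'].count()` returns a Series indexed by the group key. Regrouping that Series by its own index puts every key in a one-element group, which would explain the column of 100.0 values, and a Series has no `'show_id'` column, which would explain the `KeyError`. The sketch uses `transform` in place of `apply` to keep the index flat; the arithmetic is the same.

```python
import pandas as pd

# Hypothetical toy data standing in for the question's dataset.
df = pd.DataFrame({
    'show_id': ['a', 'b', 'c', 'd'],
    'country': ['United States', 'United States', 'India', 'Japan'],
})

# groupby(...)[col].count() returns a Series indexed by country, not a DataFrame.
counts = df.groupby('country')['show_id'].count()
print(type(counts).__name__)

# Attempt #1: regrouping by the index puts each country in its own group,
# so x / x.sum() divides every count by itself and always yields 100.
per_group = counts.groupby(level=0).transform(lambda x: x / x.sum() * 100)

# Dividing by the grand total instead gives each country's share.
share = (counts / counts.sum() * 100).sort_values(ascending=False)

# Attempt #2: a Series has no 'show_id' column; to_frame() creates a
# DataFrame that a percentage column can be added to.
table = counts.to_frame('count')
table['percentage'] = table['count'] / table['count'].sum() * 100
table = table.sort_values('count', ascending=False)
print(table)
```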

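    # Uses the question's df and its 'country' column.
    # value_counts() already sorts by count in descending order.
    newDF = pd.DataFrame(df['country'].value_counts())
    # normalize=True gives fractions of the total; scale to percent and round.
    newDF['percentage'] = df['country'].value_counts(normalize=True).mul(100).round(2)
    newDF.columns = ['count', 'percentage']

    newDF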
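To get closer to the layout in the desired output (country as a regular column and the percentage shown with a `%` sign), something like the following might work. It is a sketch on a hypothetical toy column, and the whole-number rounding is an assumption based on how the desired output is displayed.

```python
import pandas as pd

# Hypothetical toy column standing in for the question's 'country' data.
country = pd.Series(
    ['United States', 'United States', 'United States', 'India'],
    name='country',
)

counts = country.value_counts()  # sorted by count, largest first
percent = counts / counts.sum() * 100

result = pd.DataFrame({
    'country': counts.index.to_list(),
    'count': counts.to_list(),
    # Format each share as a whole-number string with a percent sign.
    'percentage': percent.map('{:.0f}%'.format).to_list(),
})
print(result)
```

The formatted column holds strings, so any further arithmetic should be done on `percent` before formatting.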
