How can I calculate the percentage of a groupby column and sort it in descending order?
Desired output:
country count percentage
United States 2555 45%
India 923 12%
United Kingdom 397 4%
Japan 226 3%
South Korea 183 2%
I did some research, read the Pandas documentation and looked at other questions on Stack Overflow, but had no luck.
I tried the following:
Attempt #1:
df2 = df.groupby('country')['show_id'].count().nlargest()
df3 = df2.groupby(level=0).apply(lambda x: x/x.sum() * 100)
Output:
director
A. L. Vijay 100.0
A. Raajdheep 100.0
A. Salaam 100.0
A.R. Murugadoss 100.0
Aadish Keluskar 100.0
...
Çagan Irmak 100.0
Ísold Uggadóttir 100.0
Óskar Thór Axelsson 100.0
Ömer Faruk Sorak 100.0
Şenol Sönmez 100.0
Name: show_id, Length: 4049, dtype: float64
Attempt #2:
df2 = df.groupby('country')['show_id'].count()
df2['percentage'] = df2['show_id']/6000
Output:
KeyError: 'show_id'
Dataset sample:
import numpy as np
import pandas as pd
df = pd.DataFrame({
'show_id':['81145628','80117401','70234439'],
'type':['Movie','Movie','TV Show'],
'title':['Norm of the North: King Sized Adventure',
'Jandino: Whatever it Takes',
'Transformers Prime'],
'director':['Richard Finn, Tim Maltby',np.nan,np.nan],
'cast':['Alan Marriott, Andrew Toth, Brian Dobson',
'Jandino Asporaat','Peter Cullen, Sumalee Montano, Frank Welker'],
'country':['United States, India, South Korea, China',
'United Kingdom','United States'],
'date_added':['September 9, 2019',
'September 9, 2016',
'September 8, 2018'],
'release_year':['2019','2016','2013'],
'rating':['TV-PG','TV-MA','TV-Y7-FV'],
'duration':['90 min','94 min','1 Season'],
'listed_in':['Children & Family Movies, Comedies',
'Stand-Up Comedy','Kids TV'],
'description':['Before planning an awesome wedding for his',
'Jandino Asporaat riffs on the challenges of ra',
'With the help of three human allies, the Autob']})
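Attempt #2 fails because `df.groupby('country')['show_id'].count()` returns a Series indexed by country, not a DataFrame, so `df2['show_id']` looks up a country named 'show_id' and raises `KeyError`. A minimal sketch of one fix, on a small hypothetical dataset (the fourth show ID and the per-row countries are made up for illustration): turn the Series into a one-column DataFrame, and divide by the actual total instead of a hard-coded 6000.

```python
import pandas as pd

# Hypothetical mini dataset: only the two columns this example needs
df = pd.DataFrame({
    'show_id': ['81145628', '80117401', '70234439', '70234440'],
    'country': ['United States', 'United Kingdom', 'United States', 'India'],
})

# As in attempt #2: selecting ['show_id'] before count() yields a Series
counts = df.groupby('country')['show_id'].count()
print(type(counts).__name__)  # Series

# counts['show_id'] would search the index (country names) and raise KeyError,
# so convert to a DataFrame with a named column first
df2 = counts.to_frame('count')

# Divide by the real total rather than a fixed number like 6000
df2['percentage'] = df2['count'] / df2['count'].sum() * 100
df2 = df2.sort_values('percentage', ascending=False)
print(df2)
```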
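The approach below doesn't handle rows whose 'country' field lists several countries, but it should work for the rest of the question.
Create the initial DataFrame:
import pandas as pd
df = pd.DataFrame({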
'show_id':['81145628','80117401','70234439'],
'type':['Movie','Movie','TV Show'],
'title':['Norm of the North: King Sized Adventure',
'Jandino: Whatever it Takes',
'Transformers Prime'],
'director':['Richard Finn, Tim Maltby',0,0],
'cast':['Alan Marriott, Andrew Toth, Brian Dobson',
'Jandino Asporaat','Peter Cullen, Sumalee Montano, Frank Welker'],
'country':['United States, India, South Korea, China',
'United Kingdom','United States'],
'date_added':['September 9, 2019',
'September 9, 2016',
'September 8, 2018'],
'release_year':['2019','2016','2013'],
'rating':['TV-PG','TV-MA','TV-Y7-FV'],
'duration':['90 min','94 min','1 Season'],
'listed_in':['Children & Family Movies, Comedies',
'Stand-Up Comedy','Kids TV'],
'description':['Before planning an awesome wedding for his',
'Jandino Asporaat riffs on the challenges of ra',
'With the help of three human allies, the Autob']})
Group by country:
df2 = df.groupby(by="country", as_index=False)['show_id']\
.agg('count')
Rename the aggregated column:
df2 = df2.rename(columns={'show_id':'count'})
Create the percentage column:
df2['percent'] = (df2['count']*100)/df2['count'].sum()
Sort in descending order:
df2 = df2.sort_values(by='percent', ascending=False)
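To cover rows that list several countries, one option (not part of the original answer) is to split the string and use `explode` so each country gets its own row before grouping. This sketch assumes countries are always separated by ", ", as in the sample data. Note that the percentages are then shares of all country mentions, not of all titles, because a title with four countries is counted four times.

```python
import pandas as pd

# Rows mirroring the 'country' column of the sample data
df = pd.DataFrame({
    'show_id': ['81145628', '80117401', '70234439'],
    'country': ['United States, India, South Korea, China',
                'United Kingdom', 'United States'],
})

# Split "A, B, C" into a list, then give each country its own row
exploded = df.assign(country=df['country'].str.split(', ')).explode('country')

# Same steps as above: count per country, rename, compute percentage, sort
df2 = (exploded.groupby('country', as_index=False)['show_id']
       .agg('count')
       .rename(columns={'show_id': 'count'}))
df2['percent'] = df2['count'] * 100 / df2['count'].sum()
df2 = df2.sort_values(by='percent', ascending=False)
print(df2)
```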
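Part of the problem in your attempt #1 is the second groupby. After the first groupby, df2 already has one row per country, so groupby(level=0) puts each country in a group of its own and x/x.sum() * 100 is 100 for every row. (Passing 'country' positionally to groupby is equivalent to by='country', so that part is fine.) Also note that .nlargest() keeps only the top 5 countries, so any percentage computed after it is relative to those 5 only.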
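A small sketch contrasting attempt #1's second step with dividing by the overall total, on hypothetical data. Depending on the pandas version, the index of the `apply` result may include the group key twice, so only its values are worth comparing.

```python
import pandas as pd

# Hypothetical per-title country values
df = pd.DataFrame({
    'show_id': ['1', '2', '3', '4'],
    'country': ['United States', 'United States', 'India', 'United Kingdom'],
})

counts = df.groupby('country')['show_id'].count()

# Attempt #1's second step: each country is its own group, so every share is 100
per_group = counts.groupby(level=0).apply(lambda x: x / x.sum() * 100)
print(per_group)

# Dividing by the total over all countries gives each country's share
share = (counts / counts.sum() * 100).sort_values(ascending=False)
print(share)
```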
Alternatively, using value_counts, which already sorts counts in descending order:
newDF = pd.DataFrame(df['country'].value_counts())
newDF['percentage'] = df['country'].value_counts(normalize=True).mul(100).round(2)
newDF.columns = ['count', 'percentage']
newDF
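The desired output shows whole-number percentages with a "%" sign. A sketch of one way to produce that with `value_counts`, on hypothetical data; `ascending=True` is used here to show the reverse order, and dropping it gives the default descending sort. Keep in mind that once the percentages are strings they no longer sort numerically, so sort before formatting.

```python
import pandas as pd

# Hypothetical country column
df = pd.DataFrame({'country': ['United States', 'United States', 'United States',
                               'India', 'India', 'United Kingdom']})

# ascending=True puts the smallest counts first (default is descending)
counts = df['country'].value_counts(ascending=True)
pct = df['country'].value_counts(normalize=True, ascending=True).mul(100)

# Build the table from counts so its order is kept; pct aligns on the index
result = counts.to_frame('count')
result['percentage'] = pct.map('{:.0f}%'.format)
print(result)
```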