Take the top 5 values from a DataFrame and plot them

Question

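I have a list of objects from which I build a DataFrame, group the rows by usage_start_date, and draw a stacked bar chart. How can I extract the 5 most expensive services by total cost? On a given day there may be 10 services, but I only want the chart to show the 5 most expensive ones. This is the current code: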

import random

import matplotlib.pyplot as plt
import pandas as pd

dates = ["2022-02-13T13:43:22+00:00", "2022-02-14T13:43:22+00:00", "2022-02-15T13:43:22+00:00", "2022-02-16T13:43:22+00:00"]

service_name = ["Example service 1", "Example service 2", "Example service 3", "Example service 4", "Example service 5", "Example service 6", "Example service 7", "Example service 8", "Example service 9", 'Example 10']
# Build 50 random usage records
data = []
for i in range(50):
    tmp_data = {
        "usage_start_date": random.choice(dates),
        "cost": random.randrange(100),
        "service_name": random.choice(service_name)
        }
    data.append(tmp_data)
df = pd.DataFrame(data)
# Parse the timestamps as UTC, drop the timezone, and truncate to midnight so rows group by day
df['usage_start_date'] = pd.to_datetime(df['usage_start_date'], utc=True).dt.tz_convert(None).dt.normalize()

# Total cost per (date, service) pair
grouped_data = df.groupby(['usage_start_date', 'service_name'], as_index=False, group_keys=True).sum() #.nlargest(n=5, columns=['cost'])

df1 = grouped_data.sort_values(by=['usage_start_date', 'cost'], ascending=[False, False]) #.nlargest(n=5, columns=['cost'])

# One row per date, one column per service, drawn as stacked bars
df1.pivot(index="usage_start_date", columns="service_name", values="cost").plot(kind="bar", stacked=True, width=0.2)
plt.legend(title="Service names")
plt.show()

I tried adding nlargest(n=5, columns=['cost']), but it does not work.

Answer 1

You can replace the plotting line with

df1.sort_values('cost', ascending = False).groupby('usage_start_date').head(5).pivot(index="usage_start_date", columns="service_name", values="cost").plot(kind="bar", stacked=True, width=0.2)

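This first sorts the rows by cost in descending order, then groups by usage_start_date and keeps the first 5 rows of each group, i.e. the 5 most expensive services for each date.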
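As for why the nlargest attempt probably fell short: DataFrame.nlargest(n=5, columns=['cost']) ranks the whole frame and returns 5 rows in total, not 5 per date. The sort-then-head step can be checked in isolation on a small hand-made frame. This is only a sketch; the helper name top_n_per_date and the toy data are made up for illustration and are not part of the original code.

```python
import pandas as pd


def top_n_per_date(grouped: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Keep the n highest-cost rows for each usage_start_date (hypothetical helper)."""
    return (
        grouped.sort_values("cost", ascending=False)  # most expensive rows first
        .groupby("usage_start_date")                  # one group per date
        .head(n)                                      # first n rows of each group
    )


# Toy data: 7 services on the first date, 3 on the second.
rows = [
    {"usage_start_date": "2022-02-13", "service_name": f"svc {i}", "cost": i * 10}
    for i in range(1, 8)
] + [
    {"usage_start_date": "2022-02-14", "service_name": f"svc {i}", "cost": 100 - i}
    for i in range(1, 4)
]
toy = pd.DataFrame(rows)

top = top_n_per_date(toy, n=5)
per_date_counts = top.groupby("usage_start_date").size().to_dict()
print(per_date_counts)
```

A date with fewer than 5 services simply keeps all of its rows, so no padding or special-casing is needed.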

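The 5 most expensive services differ from date to date, and I did not relabel them, so the legend still lists all 10 services. For each date, however, only 5 services are plotted.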
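If the legend should list only 5 services, one alternative reading of the question is to pick the 5 services with the highest total cost across all dates and plot just those. The sketch below assumes that interpretation; keep_top_services and the toy frame are hypothetical names and data used for illustration only.

```python
import pandas as pd


def keep_top_services(grouped: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Keep only rows of the n services with the highest total cost overall."""
    totals = grouped.groupby("service_name")["cost"].sum()  # total cost per service
    top_names = totals.nlargest(n).index                    # names of the n largest
    return grouped[grouped["service_name"].isin(top_names)]


# Toy data: totals are a=50, b=55, c=35, d=41.
toy = pd.DataFrame({
    "usage_start_date": ["2022-02-13"] * 4 + ["2022-02-14"] * 4,
    "service_name": ["a", "b", "c", "d"] * 2,
    "cost": [10, 20, 30, 40, 40, 35, 5, 1],
})

filtered = keep_top_services(toy, n=2)
wide = filtered.pivot(index="usage_start_date", columns="service_name", values="cost")
kept = list(wide.columns)
print(kept)
```

Calling wide.plot(kind="bar", stacked=True) on the result would then produce a legend with exactly the kept services. The trade-off is that a service that is expensive on a single day but cheap overall may drop out of the chart.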
