
Stacked bar plotting dataframe groups

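I have been trying to plot a stacked bar chart from a dataframe for a few hours. I'm sorry if this is a silly question, but I can't get it to work and I need help.

My dataframe looks like this: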

                                 _id        date                              news_source
0   2715eeada6726024df20e6938ef09f64  2019-12-23                    airport-suppliers.com
1   d068a3d0b24d2a348ff8c8a856aba86c  2019-12-23                    airport-suppliers.com
17  552d7bb9f7d3fd689dd308dc7650baac  2019-12-23                    airport-suppliers.com
20  82be33a041204fd008ba5093607310f6  2019-12-23                    airport-suppliers.com
21  4044907f5b6d5610ec59a03c75e0554c  2019-12-23  airportsinternational.keypublishing.com
22  db4e1e4d1246abc3304e5d77688424dc  2019-12-23  airportsinternational.keypublishing.com
23  b7f57b63218190d249d19624bbdcb520  2019-12-23           internationalairportreview.com
27  84d5377bd8755a685100e408140c4ab1  2019-12-23           internationalairportreview.com
28  8289a1c1b3fa3f618c332d61023eae00  2019-12-16               passengerterminaltoday.com
29  f4f020f09ee5f95499a26c43cfd82d2d  2019-12-16  airportsinternational.keypublishing.com
..                               ...         ...                                      ...
59  a18388a1c77889bdbe6aaa9238a8d21a  2019-12-16                    airport-suppliers.com
62  5cd894a9fa587ab4267adfd23f01e1c4  2019-12-16  airportsinternational.keypublishing.com
66  bb7d05d61f999b1f0b317d21c6c23c0c  2019-12-16  airportsinternational.keypublishing.com
70  f49b9ce330198aec666cb90275d293b2  2019-12-16           internationalairportreview.com
71  af893db09fad9335413ce5c325ced712  2019-12-16               passengerterminaltoday.com
72  e21dc60cfda457b03a6dba6ab44aa3b1  2019-12-16               passengerterminaltoday.com
81  963760af4b4653d175902f4d6285ff0a  2019-12-16               passengerterminaltoday.com
82  778b572be28fd25f394cfa41bbc5aa4a  2019-12-16                    airport-suppliers.com

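The final plot I want is like this one, but with the date of each week in place of the strategies, news_source in place of the products, and the numbers playing the same role.

What I tried was to group by date and news_source and then count the rows. After that the rest of my attempt fell apart, and I could not get the result into the format used in the example. Also, the number of unique news_source and date values may change over time, so I want to avoid hard-coding as much as possible.

Grouping: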

groups = df.groupby(['date', 'news_source'])["_id"].count()

If you need them as a dictionary:

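from collections import defaultdict

# Build {date: {news_source: count}} from the grouped Series.
# Each (date, news_source) pair is unique after groupby, so a plain
# assignment is enough and no accumulation is needed.
counts = defaultdict(dict)
for (date, source), count in groups.items():
    counts[date][source] = count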

The output is:

{'2019-12-16': {'airport-suppliers.com': 9,
                'airportsinternational.keypublishing.com': 12,
                'internationalairportreview.com': 19,
                'passengerterminaltoday.com': 21},
 '2019-12-23': {'airport-suppliers.com': 21,
                'airportsinternational.keypublishing.com': 2,
                'internationalairportreview.com': 5}}
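
For reference, a nested dictionary of this shape can be turned back into a table, which is the layout a stacked bar plot expects (one row per bar, one column per stacked segment). The following is a minimal sketch that copies the counts from the output above; the name `table` is only illustrative.

```python
import pandas as pd

# Nested counts copied from the output above: {date: {news_source: count}}
counts = {
    "2019-12-16": {
        "airport-suppliers.com": 9,
        "airportsinternational.keypublishing.com": 12,
        "internationalairportreview.com": 19,
        "passengerterminaltoday.com": 21,
    },
    "2019-12-23": {
        "airport-suppliers.com": 21,
        "airportsinternational.keypublishing.com": 2,
        "internationalairportreview.com": 5,
    },
}

# orient="index" makes the outer keys (dates) the rows and the inner
# keys (news sources) the columns; missing pairs become NaN.
table = pd.DataFrame.from_dict(counts, orient="index")

# Replace NaN with 0 so every date has a value for every source,
# then sort the rows by date.
table = table.fillna(0).astype(int).sort_index()
```

With matplotlib installed, `table.plot.bar(stacked=True)` should then draw one bar per date, stacked by news source, without hard-coding any date or source names.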

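If you know how to do this properly, any help would be greatly appreciated. Thank you.

Here is the code to generate a minimal reproducible example: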

import pandas as pd

dates = ['2019-12-23', '2019-12-23', '2019-12-23', '2019-12-23', '2019-12-23', '2019-12-23', '2019-12-23', '2019-12-23', '2019-12-23', '2019-12-23', '2019-12-23', '2019-12-23', '2019-12-23', '2019-12-23', '2019-12-23', '2019-12-23', '2019-12-23', '2019-12-23', '2019-12-23', '2019-12-23', '2019-12-23', '2019-12-23', '2019-12-23', '2019-12-23', '2019-12-23', '2019-12-23', '2019-12-23', '2019-12-23', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16', '2019-12-16']

sources = ['airport-suppliers.com', 'airport-suppliers.com', 'airport-suppliers.com', 'airport-suppliers.com', 'airport-suppliers.com', 'airport-suppliers.com', 'airport-suppliers.com', 'airport-suppliers.com', 'airport-suppliers.com', 'airport-suppliers.com', 'airport-suppliers.com', 'airport-suppliers.com', 'airport-suppliers.com', 'airport-suppliers.com', 'airport-suppliers.com', 'airport-suppliers.com', 'airport-suppliers.com', 'airport-suppliers.com', 'airport-suppliers.com', 'airport-suppliers.com', 'airport-suppliers.com', 'airportsinternational.keypublishing.com', 'airportsinternational.keypublishing.com', 'internationalairportreview.com', 'internationalairportreview.com', 'internationalairportreview.com', 'internationalairportreview.com', 'internationalairportreview.com', 'passengerterminaltoday.com', 'airportsinternational.keypublishing.com', 'airportsinternational.keypublishing.com', 'airportsinternational.keypublishing.com', 'airportsinternational.keypublishing.com', 'airportsinternational.keypublishing.com', 'airportsinternational.keypublishing.com', 'airportsinternational.keypublishing.com', 'internationalairportreview.com', 'internationalairportreview.com', 'internationalairportreview.com', 'airport-suppliers.com', 'passengerterminaltoday.com', 'internationalairportreview.com', 'internationalairportreview.com', 'internationalairportreview.com', 'internationalairportreview.com', 'passengerterminaltoday.com', 'passengerterminaltoday.com', 'internationalairportreview.com', 'internationalairportreview.com', 'internationalairportreview.com', 'airport-suppliers.com', 'passengerterminaltoday.com', 'airport-suppliers.com', 'airport-suppliers.com', 'passengerterminaltoday.com', 'passengerterminaltoday.com', 'passengerterminaltoday.com', 'passengerterminaltoday.com', 'passengerterminaltoday.com', 'airport-suppliers.com', 'airport-suppliers.com', 'airport-suppliers.com', 'airportsinternational.keypublishing.com', 'airportsinternational.keypublishing.com', 
'airportsinternational.keypublishing.com', 'airportsinternational.keypublishing.com', 'airportsinternational.keypublishing.com', 'internationalairportreview.com', 'internationalairportreview.com', 'internationalairportreview.com', 'internationalairportreview.com', 'passengerterminaltoday.com', 'passengerterminaltoday.com', 'passengerterminaltoday.com', 'passengerterminaltoday.com', 'passengerterminaltoday.com', 'passengerterminaltoday.com', 'passengerterminaltoday.com', 'passengerterminaltoday.com', 'passengerterminaltoday.com', 'passengerterminaltoday.com', 'passengerterminaltoday.com', 'airport-suppliers.com', 'airport-suppliers.com', 'internationalairportreview.com', 'internationalairportreview.com', 'internationalairportreview.com', 'internationalairportreview.com', 'internationalairportreview.com']

df = pd.DataFrame({"date": dates, "news_source": sources})  

How about this? I added a count column to your data:

df1 = df.groupby(['date', 'news_source']).size().reset_index().rename(columns={0:'count'})
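
As a side note, the same long-format table can probably be built a little more compactly, because `Series.reset_index` accepts a `name` argument for the value column. This is a sketch on a tiny hypothetical sample (the sources `a.com` and `b.com` are made up), not the data from the question.

```python
import pandas as pd

# Tiny hypothetical sample with the same two columns as the question.
df = pd.DataFrame({
    "date": ["2019-12-23", "2019-12-23", "2019-12-16", "2019-12-16", "2019-12-16"],
    "news_source": ["a.com", "a.com", "a.com", "b.com", "b.com"],
})

# size() counts rows per (date, news_source) pair and returns a Series;
# reset_index(name="count") turns it into a DataFrame and names the
# count column in one step, so no rename() is needed.
df1 = df.groupby(["date", "news_source"]).size().reset_index(name="count")
```

Each row of `df1` is one (date, news_source) pair with its row count, sorted by the group keys.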

Then I used pd.crosstab, setting the index, columns, and values parameters as shown below, and passed an aggfunc, in this case sum.

pd.crosstab(index=df1['date'], columns=df1['news_source'], values=df1['count'], aggfunc='sum').plot.bar(stacked=True)

Result:

[Image: stacked bar chart with one bar per date, stacked by news_source]
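
A possible shortcut, offered as a sketch rather than as part of the original answer: when `values` and `aggfunc` are omitted, `pd.crosstab` counts occurrences itself, so the intermediate count column may not be needed. The sample data and the helper `plot_stacked` below are hypothetical.

```python
import pandas as pd

# Tiny hypothetical sample with the same two columns as the question.
df = pd.DataFrame({
    "date": ["2019-12-23", "2019-12-23", "2019-12-16", "2019-12-16", "2019-12-16"],
    "news_source": ["a.com", "a.com", "a.com", "b.com", "b.com"],
})

# Without values/aggfunc, crosstab counts how often each
# (date, news_source) pair occurs; absent pairs are filled with 0.
table = pd.crosstab(df["date"], df["news_source"])

# The same table via groupby: count the pairs, then pivot the
# news_source level into columns.
table_alt = df.groupby(["date", "news_source"]).size().unstack(fill_value=0)


def plot_stacked(counts_table):
    """Draw one bar per row (date), stacked by column (news source).

    Hypothetical helper; calling it requires matplotlib to be installed.
    """
    return counts_table.plot.bar(stacked=True)
```

Calling `plot_stacked(table)` should give the same kind of chart as above. Because the rows and columns come from the data, new dates or sources appear automatically.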
