
Iteration over years to plot different group values as bar plot in pandas

I have a dataframe that records the number of observations at different locations for different years. I am trying to make a bar plot that shows the total number of observations at each location for each year. For each location, I want the totals for different years shown in different colors. My approach is to first group by location and then calculate the total observations for each group. (I don't think I need to change the index to date, as I am grouping by location.) I am not able to achieve this using the following code. Help will be much appreciated.

fig, ax = plt.subplots(figsize=(40,15))
date=df['date']
value=df['value']
df.date = pd.to_datetime(df.date)


year_start=2015
year_stop = 2019
#ax=plt.gca()

for year in range(year_start, year_stop+1):
    ax=plt.gca()
    m=df.groupby(['location']).agg({'value': ['count']})


    plt.ylim(0,45000)
    m.plot(kind='bar', legend = False, figsize=(30,15), fontsize = 30)
    #ax.tick_params(axis='both', which='major', labelsize=25)
    plt.ylabel('Number of observations - O3', fontsize = 30, fontweight = 'bold')    

    plt.legend(loc='upper right', prop={'size': 7})
    fig_title='Diurnal_'+place
    plt.savefig(fig_title, format='png',dpi=500, bbox_inches="tight")

    print ('saved=', fig_title)
    plt.show()


The header looks like this:
                             date_utc                       date parameter  \
    212580  {utc=2020-01-05T05:45:00.000Z  2020-01-05T11:15:00+05:30        o3   
    212581  {utc=2020-01-05T05:45:00.000Z  2020-01-05T11:15:00+05:30        o3   
    212582  {utc=2020-01-05T05:45:00.000Z  2020-01-05T11:15:00+05:30        o3   
    212583  {utc=2020-01-05T05:45:00.000Z  2020-01-05T11:15:00+05:30        o3   
    212584  {utc=2020-01-05T05:45:00.000Z  2020-01-05T11:15:00+05:30        o3   

                                               location  value   unit       city  \
    212580        ICRISAT Patancheru, Mumbai - TSPCB   37.7  µg/m³  Hyderabad   
    212581  Bollaram Industrial Area, Surat - TSPCB   39.5  µg/m³  Hyderabad   
    212582          IDA Pashamylaram, Surat - TSPCB   17.8  µg/m³  Hyderabad   
    212583               Sanathnagar, Hyderabad - TSPCB   56.6  µg/m³  Hyderabad   
    212584                  Zoo Park, Hyderabad - TSPCB   24.5  µg/m³  Hyderabad   

Since I was not able to fully reproduce your example, I implemented a toy example from what I understood. Please tell me if I understood something wrong. Here is my code:

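import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Toy data: one row per observation
df = pd.DataFrame(
    [['Mumbai', 2017, 10], ['Mumbai', 2017, 12], ['Mumbai', 2018, 20],
     ['Mumbai', 2018, 23], ['Abu Dhabi', 2017, 30], ['Abu Dhabi', 2018, 25]],
    columns=['Place', 'Year', 'Amount'])

# Count the observations for each (Place, Year) pair
df_grouped = df.groupby(['Place', 'Year']).agg({'Amount': 'count'}).reset_index()

# One group of bars per place, one colored bar per year
sns.barplot(x='Place', y='Amount', hue='Year', data=df_grouped)
plt.show()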

This code shows a bar plot with each location on the x-axis and its observation count on the y-axis. Within each location, every unique year gets its own bar, like this:

[image: enter image description here]
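
If you would rather stay in pandas, something along these lines might work for data shaped like the frame in the question (columns date, location, value). This is only a sketch: the sample rows are invented, the helper name plot_counts is hypothetical, and I am assuming that every timestamp carries the same UTC offset, so the year can be read directly from the local time.

```python
import pandas as pd

# Hypothetical sample rows shaped like the question's data.
df = pd.DataFrame({
    "date": ["2017-01-05T11:15:00+05:30", "2017-03-01T10:00:00+05:30",
             "2018-01-05T11:15:00+05:30", "2017-06-01T09:00:00+05:30",
             "2018-02-01T09:00:00+05:30", "2018-07-01T09:00:00+05:30"],
    "location": ["Sanathnagar", "Sanathnagar", "Sanathnagar",
                 "Zoo Park", "Zoo Park", "Zoo Park"],
    "value": [56.6, 40.1, 33.0, 24.5, 20.0, 18.2],
})

# Parse the timestamps and pull out the year as its own column.
df["year"] = pd.to_datetime(df["date"]).dt.year

# Keep only the years of interest (same range as in the question).
year_start, year_stop = 2015, 2019
in_range = df["year"].between(year_start, year_stop)

# Count non-null observations per (location, year), then move the years
# into columns so that each year becomes one bar series.
counts = (
    df[in_range]
    .groupby(["location", "year"])["value"]
    .count()
    .unstack("year", fill_value=0)
)


def plot_counts(counts, path=None):
    """Draw one group of bars per location, one colored bar per year."""
    ax = counts.plot(kind="bar", figsize=(10, 5), rot=45)
    ax.set_ylabel("Number of observations")
    ax.legend(title="Year")
    if path is not None:
        ax.figure.savefig(path, dpi=200, bbox_inches="tight")
    return ax
```

Calling plot_counts(counts) draws the figure, and passing a file name as path also saves it. No loop over the years is needed, because the groupby already separates them and unstack turns each year into its own column.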

