
How can I specify the discrete values that I want to plot on the x-axis (matplotlib, boxplot)?

I'm using boxplot in matplotlib (Python) to create box plots, one graph per date. The data on the x axis is discrete.

The values on the x axis, in seconds, are 0.25, 0.5, 1, 2, 5 .... 28800. These values were chosen arbitrarily (they are sampling periods). On some graphs one or two values are missing because the data wasn't available. On these graphs the x axis resizes itself to spread out the remaining values.

I would like all the graphs to have the same values at the same place on the x axis. It doesn't matter if the x axis shows a value for which no data is plotted.

Could someone tell me if there is a way to specify the x axis values, or another way to keep the same values in the same place?

The relevant section of code is as follows:


for i, group in myDataframe.groupby("Date"):

    graphFilename = (basename+'_' + str(i) + '.png')
    plt.figure(graphFilename)
    group.boxplot(by=["SamplePeriod_seconds"], sym='g+') ## colour = 'blue'
    plt.grid(True)
    axes = plt.gca()
    axes.set_ylim([0,30000])
    plt.ylabel('Average distance (m)', fontsize =8)
    plt.xlabel('GPS sample interval (s)', fontsize=8)
    plt.tick_params(axis='x', which='major', labelsize=8)
    plt.tick_params(axis='y', which='major', labelsize=8)
    plt.xticks(rotation=90)
    plt.title(str(i) + ' - ' + 'Average distance travelled by cattle over 24  hour period', fontsize=9) 
    plt.suptitle('')
    plt.savefig(graphFilename)
    plt.close()     

Any help appreciated. I will continue googling... thanks :)

Try something like this:

plt.xticks(np.arange(x.min(), x.max(), 5))

where x is your array of x values and 5 is the step between ticks along the axis.

Same applies for the y axis with yticks. Hope it helps! :)
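One caveat with this approach is easy to check: np.arange leaves out the stop value, and evenly spaced ticks will not coincide with unevenly spaced sample periods. A minimal sketch, assuming a hypothetical array x holding a few of the sample periods from the question:

```python
import numpy as np

# Hypothetical x values: a few of the sample periods from the question.
x = np.array([0.25, 0.5, 1, 2, 5, 10, 20])

# np.arange excludes the stop value, so no tick lands on x.max().
ticks = np.arange(x.min(), x.max(), 5)

# Passing the expected values themselves gives one tick per sample period.
explicit_ticks = x.tolist()
```

If the ticks have to match specific values, passing an explicit list such as explicit_ticks to plt.xticks is probably the safer choice.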

EDIT:

I have removed the parts that depend on data I don't have, but the following code should give you a grid to plot onto:

import matplotlib.pyplot as plt
import numpy as np


plt.grid(True)
axes = plt.gca()
axes.set_ylim([0, 30000])
plt.ylabel('Average distance (m)', fontsize=8)
plt.xlabel('GPS sample interval (s)', fontsize=8)
plt.tick_params(axis='x', which='major', labelsize=8)
plt.tick_params(axis='y', which='major', labelsize=8)
plt.xticks(rotation=90)
plt.suptitle('')
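my_xticks = [0.25, 0.5, 1, 2, 5, 10, 20, 30, 60, 120, 300, 600, 1200, 1800, 2400, 3000, 3600, 7200, 10800, 14400, 18000, 21600, 25200, 28800]
# one evenly spaced tick position per expected sample period
x = np.arange(len(my_xticks))

# label tick position k with the k-th sample period
plt.xticks(x, my_xticks)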
plt.show()

Try plugging in your values on top of this :)

By default, boxplot simply draws the available data at successive positions on the axes. Missing data are left out because boxplot has no way of knowing they are missing. However, the positions of the boxes can be set manually using the positions argument. The following example does this and thereby produces plots of equal extent even when values are missing.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


basename = __file__+"_plot"
Nd = 4 # four different dates
Ns = 5 # five different sample periods
N = 80 # 80 values per date and sample period
date = []
seconds = []
avgdist = []
# fill lists
for i in range(Nd):
    # for each date, pick one random sample period to leave out of the dataframe
    w = np.random.randint(0,5)
    for j in range(Ns):
        if j!=w:
            av = np.random.poisson(1.36+j/10., N)*4000+1000
            avgdist.append(av) 
            seconds.append([j]*N)
            date.append([i]*N)

date = np.array(date).flatten()
seconds = np.array(seconds).flatten()
avgdist = np.array(avgdist).flatten()
#put data into DataFrame
myDataframe = pd.DataFrame({"Date" : date, "SamplePeriod_seconds" : seconds, "avgdist" : avgdist}) 
# obtain a list of all possible Sampleperiods
globalunique = np.sort(myDataframe["SamplePeriod_seconds"].unique())

for i, group in myDataframe.groupby("Date"):

    graphFilename = (basename+'_' + str(i) + '.png')
    fig = plt.figure(graphFilename, figsize=(6,3))
    axes = fig.add_subplot(111)
    plt.grid(True)

    # omit the `Date` column
    dfgroup = group[["SamplePeriod_seconds", "avgdist"]]
    # obtain a list of sample periods for this date
    unique = np.sort(dfgroup["SamplePeriod_seconds"].unique())
    # plot the boxes to the axes, one for each sample period in dfgroup
    # set the boxes' positions to the values in unique
    dfgroup.boxplot(by=["SamplePeriod_seconds"], sym='g+', positions=unique, ax=axes)

    # set xticks to the unique positions, where boxes are
    axes.set_xticks(unique)
    # make sure all plots share the same extent.
    axes.set_xlim([-0.5,globalunique[-1]+0.5])
    axes.set_ylim([0,30000])

    plt.ylabel('Average distance (m)', fontsize =8)
    plt.xlabel('GPS sample interval (s)', fontsize=8)
    plt.tick_params(axis='x', which='major', labelsize=8)
    plt.tick_params(axis='y', which='major', labelsize=8)
    plt.xticks(rotation=90)
    plt.suptitle(str(i) + ' - ' + 'Average distance travelled by cattle over 24  hour period', fontsize=9) 
    plt.title("")
    plt.savefig(graphFilename)
    plt.close()    


This will still work if the values in the SamplePeriod_seconds column are not equally spaced. Of course, if they differ by orders of magnitude, the result will not look nice, because the boxes will overlap.

This, however, is not a problem with the plotting itself. For further help, one would need to know what you expect the final graph to look like.

Thank you everyone very much for the help. Using your answers I got it working with the following code. (I realize it can probably be improved, but I'm happy that it works and I can look at the data now :) )

valuesShouldPlot = ['0.25','0.5','1.0','2.0','5.0','10.0','20.0','30.0','60.0','120.0','300.0','600.0','1200.0','1800.0','2400.0','3000.0','3600.0','7200.0','10800.0','14400.0','18000.0','21600.0','25200.0','28800.0']       


for xDate, group in myDataframe.groupby("Date"):            ## for each date

    graphFilename = (basename+'_' + str(xDate) + '.png')    ## make up a suitable filename for the graph

    plt.figure(graphFilename)

    group.boxplot(by=["SamplePeriod_seconds"], sym='g+', return_type='both')  ## create box plot, (boxplots are placed in default positions)

    ## get information on where the boxplots were placed by looking at the values on the x-axis                                                    
    axes = plt.gca()  
    checkXticks= axes.get_xticks()
    numOfValuesPlotted =len(checkXticks)            ## check how many boxplots were actually plotted by counting the labels printed on the x-axis
    lengthValuesShouldPlot = len(valuesShouldPlot)  ## (check how many boxplots should have been created if no data was missing)



    if (numOfValuesPlotted < lengthValuesShouldPlot): ## if number of values actually plotted is less than the maximum possible it means some values are missing
                                                ## if that occurs then want to move the plots across accordingly to leave gaps where the missing values should go


        labels = [item.get_text() for item in axes.get_xticklabels()]

        i=0                 ## counter to increment through the entire list of x values that should exist if no data was missing.
        j=0                 ## counter to increment through the list of x labels that were originally plotted (some labels may be missing, want to check what's missing)

        positionOfBoxesList =[] ## create a list which will eventually contain the positions on the x-axis where boxplots should be drawn  

        while ( j < numOfValuesPlotted): ## look at each value in turn in the list of x-axis labels (on the graph plotted earlier)

            if (labels[j] == valuesShouldPlot[i]):  ## if the value on the x axis matches the value in the list of 'valuesShouldPlot' 
                positionOfBoxesList.append(i)       ## then record that position as a suitable position to put a boxplot
                j = j+1
                i = i+1


            else :                                  ## if they don't match (there must be a value missing) skip the value and look at the next one             

                print("\n******** missing value ************")
                print("Date:", xDate, ", Position:", i, ":", valuesShouldPlot[i])
                i=i+1               


        plt.close()     ## close the original plot (the one that didn't leave gaps for missing data)
        group.boxplot(by=["SamplePeriod_seconds"], sym='g+', return_type='both', positions=positionOfBoxesList) ## replot with boxes in correct positions

    ## format graph to make it look better        
    plt.ylabel('Average distance (m)', fontsize =8)
    plt.xlabel('GPS sample interval (s)', fontsize=8)
    plt.tick_params(axis='x', which='major', labelsize=8)
    plt.tick_params(axis='y', which='major', labelsize=8)
    plt.xticks(rotation=90)   
    plt.title(str(xDate) + ' - ' + 'Average distance travelled by cattle over 24 hour period', fontsize=9) ## put the title above the first subplot (ie. at the top of the page)
    plt.suptitle('')
    axes = plt.gca() 
    axes.set_ylim([0,30000])

    ## save and close 
    plt.savefig(graphFilename)  
    plt.close()         
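The label-matching loop above can be condensed into an index lookup. The following is a minimal sketch rather than the code from the post: box_positions and draw_boxes are hypothetical helper names, ax is assumed to be a matplotlib Axes, and data_by_period is assumed to be a dict that maps each sample period to the distances recorded for one date.

```python
def box_positions(present, expected):
    """Return, for each value in `present`, its index in `expected`."""
    index_of = {value: k for k, value in enumerate(expected)}
    return [index_of[value] for value in present]


def draw_boxes(ax, data_by_period, expected):
    """Draw one box per available sample period, leaving gaps for missing ones.

    ax is assumed to be a matplotlib Axes; data_by_period is a hypothetical
    dict {sample_period: sequence of distances} for a single date.
    """
    present = sorted(data_by_period)
    positions = box_positions(present, expected)
    ax.boxplot([data_by_period[p] for p in present], positions=positions, sym='g+')
    # Ticks and limits depend only on `expected`, so every graph gets the same x axis.
    ax.set_xticks(list(range(len(expected))))
    ax.set_xticklabels([str(v) for v in expected], rotation=90)
    ax.set_xlim(-0.5, len(expected) - 0.5)
    return positions


# Example with a shortened list of sample periods; 0.5 and 2 are missing.
expected = [0.25, 0.5, 1, 2, 5]
positions = box_positions([0.25, 1, 5], expected)
```

Calling draw_boxes(plt.gca(), data_by_period, expected) once per date should then give every graph the same tick positions. A sample period that is not in expected raises a KeyError, which makes unexpected values visible early instead of silently shifting the boxes.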
