I'm using boxplot in matplotlib (Python) to create box plots. I'm creating many graphs with different dates. On the x axis the data is discrete.
The values on the x axis in seconds are 0.25, 0.5, 1, 2, 5, ..., 28800. These values were arbitrarily chosen (they are sampling periods). On some graphs one or two values are missing because the data wasn't available. On these graphs the x axis resizes itself to spread out the other values.
I would like all the graphs to have the same values at the same place on the x axis (it doesn't matter if the x axis shows a value but there is no data plotted for it on the graph).
Could someone tell me if there is a way to specify the x axis values? Or is there another way to keep the same values in the same place?
The relevant section of code is as follows:
for i, group in myDataframe.groupby("Date"):
    graphFilename = (basename+'_' + str(i) + '.png')
    plt.figure(graphFilename)
    group.boxplot(by=["SamplePeriod_seconds"], sym='g+') ## colour = 'blue'
    plt.grid(True)
    axes = plt.gca()
    axes.set_ylim([0,30000])
    plt.ylabel('Average distance (m)', fontsize =8)
    plt.xlabel('GPS sample interval (s)', fontsize=8)
    plt.tick_params(axis='x', which='major', labelsize=8)
    plt.tick_params(axis='y', which='major', labelsize=8)
    plt.xticks(rotation=90)
    plt.title(str(i) + ' - ' + 'Average distance travelled by cattle over 24 hour period', fontsize=9)
    plt.suptitle('')
    plt.savefig(graphFilename)
    plt.close()
Any help appreciated. I will continue googling... Thanks :)
You could try something like this:
plt.xticks(np.arange(x.min(), x.max(), 5))
where x is your array of x values and 5 is the step you take along the axis.
Same applies for the y axis with yticks. Hope it helps! :)
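One detail worth checking with the snippet above: np.arange excludes its stop value, so the last tick can silently go missing. A minimal sketch, assuming a made-up array x (not data from the question):

```python
import numpy as np

# Hypothetical x values, used only to illustrate the tick spacing.
x = np.array([0, 5, 10, 15, 20])

# np.arange stops before the stop value, so x.max() itself gets no tick.
ticks = np.arange(x.min(), x.max(), 5)

# Extending the stop by one step includes the final value as well.
ticks_incl = np.arange(x.min(), x.max() + 5, 5)

print(ticks.tolist())
print(ticks_incl.tolist())
```

Either array can then be passed to plt.xticks; which one you want depends on whether the last value should carry a tick.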
EDIT:
I have removed the instances that I did not have, but the following code should give you a grid to plot onto:
import matplotlib.pyplot as plt
import numpy as np
plt.grid(True)
axes = plt.gca()
axes.set_ylim([0, 30000])
plt.ylabel('Average distance (m)', fontsize=8)
plt.xlabel('GPS sample interval (s)', fontsize=8)
plt.tick_params(axis='x', which='major', labelsize=8)
plt.tick_params(axis='y', which='major', labelsize=8)
plt.xticks(rotation=90)
plt.suptitle('')
my_xticks =[0.25,0.5,1,2,5,10,20,30,60,120,300,600,1200,1800,2400,3000,3600,7200,10800, 14400,18000,21600,25200,28800]
x = np.arange(len(my_xticks))  # one evenly spaced position per tick label
plt.xticks(x, my_xticks)
plt.show()
Try plugging in your values on top of this :)
By default, boxplot simply plots the available data at successive positions on the axes. Missing data are left out, simply because boxplot doesn't know they are missing. However, the positions of the boxes can be set manually using the positions argument. The following example does this and thereby produces plots of equal extent even when values are missing.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
basename = __file__+"_plot"
Nd = 4 # four different dates
Ns = 5 # five second intervals
N = 80 # each 80 values
date = []
seconds = []
avgdist = []
# fill lists
for i in range(Nd):
    # for each date, select a random SamplePeriod to be not part of the dataframe
    w = np.random.randint(0,5)
    for j in range(Ns):
        if j!=w:
            av = np.random.poisson(1.36+j/10., N)*4000+1000
            avgdist.append(av)
            seconds.append([j]*N)
            date.append([i]*N)
date = np.array(date).flatten()
seconds = np.array(seconds).flatten()
avgdist = np.array(avgdist).flatten()
#put data into DataFrame
myDataframe = pd.DataFrame({"Date" : date, "SamplePeriod_seconds" : seconds, "avgdist" : avgdist})
# obtain a list of all possible Sampleperiods
globalunique = np.sort(myDataframe["SamplePeriod_seconds"].unique())
for i, group in myDataframe.groupby("Date"):
    graphFilename = (basename+'_' + str(i) + '.png')
    fig = plt.figure(graphFilename, figsize=(6,3))
    axes = fig.add_subplot(111)
    plt.grid(True)
    # omit the `dates` column
    dfgroup = group[["SamplePeriod_seconds", "avgdist"]]
    # obtain a list of Sampleperiods for this date
    unique = np.sort(dfgroup["SamplePeriod_seconds"].unique())
    # plot the boxes to the axes, one for each sample periods in dfgroup
    # set the boxes' positions to the values in unique
    dfgroup.boxplot(by=["SamplePeriod_seconds"], sym='g+', positions=unique, ax=axes)
    # set xticks to the unique positions, where boxes are
    axes.set_xticks(unique)
    # make sure all plots share the same extent.
    axes.set_xlim([-0.5,globalunique[-1]+0.5])
    axes.set_ylim([0,30000])
    plt.ylabel('Average distance (m)', fontsize =8)
    plt.xlabel('GPS sample interval (s)', fontsize=8)
    plt.tick_params(axis='x', which='major', labelsize=8)
    plt.tick_params(axis='y', which='major', labelsize=8)
    plt.xticks(rotation=90)
    plt.suptitle(str(i) + ' - ' + 'Average distance travelled by cattle over 24 hour period', fontsize=9)
    plt.title("")
    plt.savefig(graphFilename)
    plt.close()
This will still work if the values in the SamplePeriod_seconds column are not equally spaced, but of course if they are extremely different, this will not produce nice results, because the boxes will overlap.
This, however, is not a problem with the plotting itself. For further help, one would need to know how you expect the graph to look in the end.
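One possible way around the overlap, assuming evenly spaced slots with the real values shown only as tick labels are acceptable, is to position each box at the index of its sample period in the full list. The sketch below uses a shortened list of periods and randomly generated data (both are illustrative assumptions, not the asker's data) and calls matplotlib's Axes.boxplot directly rather than the pandas wrapper:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Shortened list of sample periods, for illustration only.
all_periods = [0.25, 0.5, 1, 2, 5, 10, 20, 30, 60]

# Hypothetical data for one date; the 1 s and 10 s periods are missing.
rng = np.random.default_rng(0)
data = {p: rng.normal(10000, 2000, 50) for p in all_periods if p not in (1, 10)}

# Slot index of each available period within the full list.
slots = [all_periods.index(p) for p in data]

fig, ax = plt.subplots(figsize=(6, 3))
ax.boxplot(list(data.values()), positions=slots, sym='g+')

# Show a tick for every period, including those without data,
# and fix the limits so every graph has the same extent.
ax.set_xticks(range(len(all_periods)))
ax.set_xticklabels([str(p) for p in all_periods], rotation=90)
ax.set_xlim(-0.5, len(all_periods) - 0.5)

xlim = ax.get_xlim()
plt.close(fig)
```

Because the positions are slot indices rather than the period values themselves, the boxes stay evenly spaced no matter how far apart the periods are, at the cost of an x axis that is no longer to scale.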
Thank you everyone very much for the help. Using your answers, I got it working with the following code. (I realize it can probably be improved, but I'm happy that it works and I can look at the data now :) )
valuesShouldPlot = ['0.25','0.5','1.0','2.0','5.0','10.0','20.0','30.0','60.0','120.0','300.0','600.0','1200.0','1800.0','2400.0','3000.0','3600.0','7200.0','10800.0','14400.0','18000.0','21600.0','25200.0','28800.0']
for xDate, group in myDataframe.groupby("Date"): ## for each date
    graphFilename = (basename+'_' + str(xDate) + '.png') ## make up a suitable filename for the graph
    plt.figure(graphFilename)
    group.boxplot(by=["SamplePeriod_seconds"], sym='g+', return_type='both') ## create box plot, (boxplots are placed in default positions)
    ## get information on where the boxplots were placed by looking at the values on the x-axis
    axes = plt.gca()
    checkXticks= axes.get_xticks()
    numOfValuesPlotted =len(checkXticks) ## check how many boxplots were actually plotted by counting the labels printed on the x-axis
    lengthValuesShouldPlot = len(valuesShouldPlot) ## (check how many boxplots should have been created if no data was missing)
    if (numOfValuesPlotted < lengthValuesShouldPlot): ## if number of values actually plotted is less than the maximum possible it means some values are missing
        ## if that occurs then want to move the plots across accordingly to leave gaps where the missing values should go
        labels = [item.get_text() for item in axes.get_xticklabels()]
        i=0 ## counter to increment through the entire list of x values that should exist if no data was missing.
        j=0 ## counter to increment through the list of x labels that were originally plotted (some labels may be missing, want to check what's missing)
        positionOfBoxesList =[] ## create a list which will eventually contain the positions on the x-axis where boxplots should be drawn
        while ( j < numOfValuesPlotted): ## look at each value in turn in the list of x-axis labels (on the graph plotted earlier)
            if (labels[j] == valuesShouldPlot[i]): ## if the value on the x axis matches the value in the list of 'valuesShouldPlot'
                positionOfBoxesList.append(i) ## then record that position as a suitable position to put a boxplot
                j = j+1
                i = i+1
            else : ## if they don't match (there must be a value missing) skip the value and look at the next one
                print("\n******** missing value ************")
                print("Date:", xDate, ", Position:", i, ":", valuesShouldPlot[i])
                i=i+1
        plt.close() ## close the original plot (the one that didn't leave gaps for missing data)
        group.boxplot(by=["SamplePeriod_seconds"], sym='g+', return_type='both', positions=positionOfBoxesList) ## replot with boxes in correct positions
    ## format graph to make it look better
    plt.ylabel('Average distance (m)', fontsize =8)
    plt.xlabel('GPS sample interval (s)', fontsize=8)
    plt.tick_params(axis='x', which='major', labelsize=8)
    plt.tick_params(axis='y', which='major', labelsize=8)
    plt.xticks(rotation=90)
    plt.title(str(xDate) + ' - ' + 'Average distance travelled by cattle over 24 hour period', fontsize=9) ## put the title above the first subplot (ie. at the top of the page)
    plt.suptitle('')
    axes = plt.gca()
    axes.set_ylim([0,30000])
    ## save and close
    plt.savefig(graphFilename)
    plt.close()
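The two-counter loop above could probably be replaced by a direct lookup, which would also avoid plotting each graph twice just to read its tick labels. The following is only a sketch under that assumption: box_positions is a hypothetical helper (not part of pandas or matplotlib), and it works on numeric sample periods rather than on label strings.

```python
# Every sample period a graph should reserve a slot for (values from the question).
ALL_PERIODS = [0.25, 0.5, 1, 2, 5, 10, 20, 30, 60, 120, 300, 600, 1200, 1800,
               2400, 3000, 3600, 7200, 10800, 14400, 18000, 21600, 25200, 28800]

# Map each period to its slot index on the x axis.
SLOT_OF = {period: slot for slot, period in enumerate(ALL_PERIODS)}


def box_positions(present_periods):
    """Hypothetical helper: return (positions, missing) for one date's periods."""
    present = set(present_periods)
    # Slot index for each period that has data, in ascending order of period.
    positions = [SLOT_OF[p] for p in sorted(present)]
    # Periods from the full list that have no data on this date.
    missing = [p for p in ALL_PERIODS if p not in present]
    return positions, missing


# Example with made-up periods for one date (duplicates are collapsed).
positions, missing = box_positions([0.25, 0.25, 1.0, 5.0, 5.0])
print(positions)
print(missing[:3])
```

In the loop over dates, the input would presumably be group["SamplePeriod_seconds"].unique(), and the returned positions would be passed as the positions argument of boxplot. A period that is not in ALL_PERIODS raises a KeyError, which may or may not be the behaviour you want.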