How can I specify the discrete values that I want to plot on the x-axis (matplotlib, boxplot)?
I'm using boxplot in matplotlib (Python) to create box plots, and I'm creating many graphs for different dates. On the x-axis the data is discrete.
The values on the x-axis, in seconds, are 0.25, 0.5, 1, 2, 5, ..., 28800. These values were arbitrarily chosen (they are sampling periods). On some graphs one or two values are missing because the data wasn't available. On these graphs the x-axis resizes itself to spread out the other values.
I would like all the graphs to have the same values at the same place on the x-axis (it doesn't matter if the x-axis shows a value but no data is plotted for it).
Could someone tell me if there is a way to specify the x-axis values? Or another way to keep the same values in the same place?
The relevant section of code is as follows:
import matplotlib.pyplot as plt
for i, group in myDataframe.groupby("Date"):
    graphFilename = (basename + '_' + str(i) + '.png')
    plt.figure(graphFilename)
    group.boxplot(by=["SamplePeriod_seconds"], sym='g+')  ## colour = 'blue'
    plt.grid(True)
    axes = plt.gca()
    axes.set_ylim([0, 30000])
    plt.ylabel('Average distance (m)', fontsize=8)
    plt.xlabel('GPS sample interval (s)', fontsize=8)
    plt.tick_params(axis='x', which='major', labelsize=8)
    plt.tick_params(axis='y', which='major', labelsize=8)
    plt.xticks(rotation=90)
    plt.title(str(i) + ' - ' + 'Average distance travelled by cattle over 24 hour period', fontsize=9)
    plt.suptitle('')
    plt.savefig(graphFilename)
    plt.close()
Any help appreciated; I will continue googling... Thanks :)
If you try something like this:
plt.xticks(np.arange(x.min(), x.max(), 5))
where x is your array of x values and 5 is the step between ticks along the axis.
The same applies to the y-axis with yticks. Hope it helps! :)
EDIT:
I have removed the parts that use variables I don't have, but the following code should give you a grid to plot onto:
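As a quick illustration of what this suggestion produces, here is a minimal sketch assuming a hypothetical array x built from some of the question's sample periods. Note that np.arange works on a half-open interval, so x.max() itself never becomes a tick.

```python
import numpy as np

# Hypothetical x values in seconds (a subset of the question's sample periods).
x = np.array([0.25, 0.5, 1, 2, 5, 10, 20, 30])

# Tick locations every 5 units starting at the smallest value.
# The stop value x.max() is excluded: np.arange uses a half-open interval.
ticks = np.arange(x.min(), x.max(), 5)
print(ticks)
```

The resulting ticks sit at 0.25, 5.25, 10.25 and so on, not at the sample periods themselves, so evenly spaced ticks suit evenly spaced data better than the arbitrary periods in the question.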
import matplotlib.pyplot as plt
import numpy as np
plt.grid(True)
axes = plt.gca()
axes.set_ylim([0, 30000])
plt.ylabel('Average distance (m)', fontsize=8)
plt.xlabel('GPS sample interval (s)', fontsize=8)
plt.tick_params(axis='x', which='major', labelsize=8)
plt.tick_params(axis='y', which='major', labelsize=8)
plt.suptitle('')
my_xticks = [0.25, 0.5, 1, 2, 5, 10, 20, 30, 60, 120, 300, 600, 1200, 1800, 2400, 3000, 3600, 7200, 10800, 14400, 18000, 21600, 25200, 28800]
# one evenly spaced position per sample period, labelled with the period value
x = np.arange(len(my_xticks))
plt.xticks(x, my_xticks, rotation=90)
plt.show()
Try plugging in your values on top of this :)
By default, boxplot simply plots the available data at successive positions on the axes. Missing data are left out, simply because boxplot doesn't know they are missing. However, the positions of the boxes can be set manually using the positions argument. The following example does this and thereby produces plots with equal extents even when values are missing.
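To make the difference concrete, here is a minimal sketch that calls matplotlib's Axes.boxplot directly instead of the pandas wrapper, with made-up random data in which period 2 is missing. The names all_periods and present are hypothetical. One detail worth knowing: boxplot only puts ticks and labels at the boxes it drew, so the sketch sets the ticks and their labels for every period explicitly.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
all_periods = [0, 1, 2, 3, 4]  # every period that can occur (hypothetical)
present = [0, 1, 3, 4]         # period 2 has no data on this date
data = [rng.normal(5000 + 500 * p, 800, 80) for p in present]

fig, (ax_default, ax_fixed) = plt.subplots(1, 2, figsize=(8, 3))

# Default: boxes go to x = 1, 2, ..., n, so the gap for period 2 vanishes.
ax_default.boxplot(data)

# With positions: each box sits at its own period value, leaving a gap at 2.
ax_fixed.boxplot(data, positions=present)
# Tick and label every period, not just the ones that have a box.
ax_fixed.set_xticks(all_periods)
ax_fixed.set_xticklabels([str(p) for p in all_periods])
ax_fixed.set_xlim(all_periods[0] - 0.5, all_periods[-1] + 0.5)

fig.canvas.draw()  # make sure tick label text is populated
default_ticks = list(ax_default.get_xticks())
fixed_labels = [t.get_text() for t in ax_fixed.get_xticklabels()]
plt.close(fig)
```

With the Agg backend nothing is displayed; in an interactive session, plt.show() in place of plt.close(fig) would show the two panels side by side.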
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
basename = __file__+"_plot"
Nd = 4 # four different dates
Ns = 5 # five sample periods
N = 80 # 80 values per date and sample period
date = []
seconds = []
avgdist = []
# fill lists
for i in range(Nd):
    # for each date, select a random SamplePeriod to be left out of the dataframe
    w = np.random.randint(0,5)
    for j in range(Ns):
        if j!=w:
            av = np.random.poisson(1.36+j/10., N)*4000+1000
            avgdist.append(av)
            seconds.append([j]*N)
            date.append([i]*N)
date = np.array(date).flatten()
seconds = np.array(seconds).flatten()
avgdist = np.array(avgdist).flatten()
# put data into DataFrame
myDataframe = pd.DataFrame({"Date" : date, "SamplePeriod_seconds" : seconds, "avgdist" : avgdist})
# obtain a list of all possible sample periods
globalunique = np.sort(myDataframe["SamplePeriod_seconds"].unique())
for i, group in myDataframe.groupby("Date"):
    graphFilename = (basename+'_' + str(i) + '.png')
    fig = plt.figure(graphFilename, figsize=(6,3))
    axes = fig.add_subplot(111)
    plt.grid(True)
    # omit the `Date` column
    dfgroup = group[["SamplePeriod_seconds", "avgdist"]]
    # obtain a list of sample periods for this date
    unique = np.sort(dfgroup["SamplePeriod_seconds"].unique())
    # plot the boxes to the axes, one for each sample period in dfgroup,
    # and set the boxes' positions to the values in unique
    dfgroup.boxplot(by=["SamplePeriod_seconds"], sym='g+', positions=unique, ax=axes)
    # set xticks to the unique positions, where the boxes are
    axes.set_xticks(unique)
    # make sure all plots share the same extent
    axes.set_xlim([-0.5,globalunique[-1]+0.5])
    axes.set_ylim([0,30000])
    plt.ylabel('Average distance (m)', fontsize=8)
    plt.xlabel('GPS sample interval (s)', fontsize=8)
    plt.tick_params(axis='x', which='major', labelsize=8)
    plt.tick_params(axis='y', which='major', labelsize=8)
    plt.xticks(rotation=90)
    plt.suptitle(str(i) + ' - ' + 'Average distance travelled by cattle over 24 hour period', fontsize=9)
    plt.title("")
    plt.savefig(graphFilename)
    plt.close()
This still works if the values in the SamplePeriod_seconds column are not equally spaced, but if they differ by orders of magnitude the result will not look good, because the boxes will overlap.
This, however, is not a problem with the plotting itself. For further help, one would need to know what you expect the final graph to look like.
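One way around the overlap is to use each period's index in the full list of periods as its x position and write the real period values on the ticks, the same idea as the my_xticks grid in the earlier answer. The following is a minimal sketch under that assumption; the helper plot_with_fixed_slots and the SLOT lookup are hypothetical names, and the data is random.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

# Full list of sample periods (seconds), taken from the question.
ALL_PERIODS = [0.25, 0.5, 1, 2, 5, 10, 20, 30, 60, 120, 300, 600, 1200,
               1800, 2400, 3000, 3600, 7200, 10800, 14400, 18000, 21600,
               25200, 28800]
SLOT = {p: i for i, p in enumerate(ALL_PERIODS)}  # period -> fixed x slot


def plot_with_fixed_slots(ax, data_by_period):
    """Draw one box per available period at that period's slot index."""
    periods = sorted(data_by_period)
    ax.boxplot([data_by_period[p] for p in periods],
               positions=[SLOT[p] for p in periods])
    # Label every slot, including those without data, so all plots line up.
    ax.set_xticks(list(range(len(ALL_PERIODS))))
    ax.set_xticklabels([str(p) for p in ALL_PERIODS], rotation=90)
    ax.set_xlim(-0.5, len(ALL_PERIODS) - 0.5)


rng = np.random.default_rng(1)
# Hypothetical date on which the 2 s and 3600 s periods are missing.
available = [p for p in ALL_PERIODS if p not in (2, 3600)]
data = {p: rng.normal(5000, 1000, 40) for p in available}

fig, ax = plt.subplots(figsize=(8, 3))
plot_with_fixed_slots(ax, data)
fig.canvas.draw()  # make sure tick label text is populated
tick_labels = [t.get_text() for t in ax.get_xticklabels()]
xlim = ax.get_xlim()
plt.close(fig)
```

Because every plot uses the same 24 slots and the same x-limits, the box for a given period lands in the same place in every per-date figure, whichever periods are missing. The trade-off is that the x-axis is no longer proportional to time.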
Thank you all very much for the help. Using your answers, I got it working with the following code. (I realize it can probably be improved, but I'm happy that it works and I can look at the data now :) )
import matplotlib.pyplot as plt
valuesShouldPlot = ['0.25','0.5','1.0','2.0','5.0','10.0','20.0','30.0','60.0','120.0','300.0','600.0','1200.0','1800.0','2400.0','3000.0','3600.0','7200.0','10800.0','14400.0','18000.0','21600.0','25200.0','28800.0']
for xDate, group in myDataframe.groupby("Date"): ## for each date
    graphFilename = (basename+'_' + str(xDate) + '.png') ## make up a suitable filename for the graph
    plt.figure(graphFilename)
    group.boxplot(by=["SamplePeriod_seconds"], sym='g+', return_type='both') ## create box plot (boxplots are placed in default positions)
    ## get information on where the boxplots were placed by looking at the values on the x-axis
    axes = plt.gca()
    checkXticks = axes.get_xticks()
    numOfValuesPlotted = len(checkXticks) ## count how many boxplots were actually plotted from the ticks on the x-axis
    lengthValuesShouldPlot = len(valuesShouldPlot) ## how many boxplots there would be if no data was missing
    if numOfValuesPlotted < lengthValuesShouldPlot: ## fewer boxes than possible means some values are missing
        ## if so, move the boxes across to leave gaps where the missing values should go
        labels = [item.get_text() for item in axes.get_xticklabels()]
        i = 0 ## index into the full list of x values that should exist if no data was missing
        j = 0 ## index into the x labels that were actually plotted (some may be missing)
        positionOfBoxesList = [] ## will hold the x positions where the boxplots should be drawn
        while j < numOfValuesPlotted: ## look at each x-axis label of the first plot in turn
            if labels[j] == valuesShouldPlot[i]: ## the label matches the expected value
                positionOfBoxesList.append(i) ## so record that slot as the position for this box
                j = j + 1
                i = i + 1
            else: ## no match, so this value is missing: skip it and look at the next one
                print("\n******** missing value ************")
                print("Date:", xDate, ", Position:", i, ":", valuesShouldPlot[i])
                i = i + 1
        plt.close() ## close the original plot (the one that didn't leave gaps for missing data)
        group.boxplot(by=["SamplePeriod_seconds"], sym='g+', return_type='both', positions=positionOfBoxesList) ## replot with boxes in the correct positions
    ## format the graph to make it look better
    plt.ylabel('Average distance (m)', fontsize=8)
    plt.xlabel('GPS sample interval (s)', fontsize=8)
    plt.tick_params(axis='x', which='major', labelsize=8)
    plt.tick_params(axis='y', which='major', labelsize=8)
    plt.xticks(rotation=90)
    plt.title(str(xDate) + ' - ' + 'Average distance travelled by cattle over 24 hour period', fontsize=9) ## put the title above the first subplot (i.e. at the top of the page)
    plt.suptitle('')
    axes = plt.gca()
    axes.set_ylim([0,30000])
    ## save and close
    plt.savefig(graphFilename)
    plt.close()
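The while loop above recovers the box positions by reading back the tick labels of a throwaway first plot. A possible simplification, sketched here under the assumption that every SamplePeriod_seconds value appears in a fixed master list, is to compute the positions and the missing periods straight from each date's data with list.index and a set difference; the positions could then be passed to boxplot via positions in a single plotting call. The helper box_positions and the small DataFrame are hypothetical.

```python
import pandas as pd

# Full list of sample periods (seconds) from the question.
ALL_PERIODS = [0.25, 0.5, 1, 2, 5, 10, 20, 30, 60, 120, 300, 600, 1200,
               1800, 2400, 3000, 3600, 7200, 10800, 14400, 18000, 21600,
               25200, 28800]


def box_positions(group, column="SamplePeriod_seconds"):
    """Return (present periods, their slot indices, missing periods) for one date."""
    present = sorted(group[column].unique())
    # list.index raises ValueError if a period is not in the master list.
    positions = [ALL_PERIODS.index(p) for p in present]
    present_set = set(present)
    missing = [p for p in ALL_PERIODS if p not in present_set]
    return present, positions, missing


# Hypothetical data: two dates, the second one lacks the 2 s period.
df = pd.DataFrame({
    "Date": ["d1"] * 4 + ["d2"] * 3,
    "SamplePeriod_seconds": [0.25, 0.5, 1, 2, 0.25, 0.5, 1],
    "avgdist": [4000, 4200, 4500, 5000, 3900, 4100, 4600],
})

results = {}
for date, group in df.groupby("Date"):
    present, positions, missing = box_positions(group)
    results[date] = (positions, missing)
    print(f"Date: {date}, box positions: {positions}")
```

A ValueError from ALL_PERIODS.index is a useful signal that the master list is incomplete, rather than a box silently landing in the wrong slot.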
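Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.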