Custom spacing between x axis labels in ggplot

I have a data frame, df:

   Year          Ratio       N    Mean        sd        se        ci
97  1867 TILLBANK...PLACTILL 2  3.861999  4.082170  2.886530  36.67685
98  1867   TILLOBL..PLACTILL 2 21.848833 17.859532 12.628596 160.46153
99  1867   TILLLOAN.PLACTILL 2 54.197044 23.309360 16.482207 209.42629
100 1867   TILLEQUI.PLACTILL 2  0.000000  0.000000  0.000000   0.00000
101 1867   TILLCONT.PLACTILL 2  0.000000  0.000000  0.000000   0.00000
102 1867   TILLRECI.PLACTILL 2 10.772286  5.110514  3.613679  45.91615


str(df):

'data.frame':  1152 obs. of  7 variables:
 $ Year : Factor w/ 156 levels "1855","1856",..: 13 13 13 13 13 13 13 13 14 14 ...
 $ Ratio: Factor w/ 8 levels "TILLBANK...PLACTILL",..: 1 2 3 4 5 6 7 8 1 2 ...
 $ N    : num  2 2 2 2 2 2 2 2 2 2 ...
 $ Mean : num  3.86 21.85 54.2 0 0 ...
 $ sd   : num  4.08 17.86 23.31 0 0 ...
 $ se   : num  2.89 12.63 16.48 0 0 ...
 $ ci   : num  36.7 160.5 209.4 0 0 ...

1) I am building a ggplot:

qqs<-ggplot(dfccomp, aes(x=Year, y=sd,colour=Ratio))+geom_point()+
    facet_grid(Ratio~.)+
    theme(axis.text.x  = element_text(angle=-90, hjust=0.5, size=11,colour="black"))

This plot works with geom_point() but not with geom_line(). If I use geom_point(), I get a very messy x-axis showing every year (from 1867 to 2010): [image]

And if I use geom_line(), which does not work, I get: [image]

So, how can I show only certain years on the x-axis?

2) The other strange thing that I don't understand is what happens if I convert df$Year above to numeric:

df$Year=as.numeric(as.character(df$Year))

The plot is then: [image]

Now only 3 years are shown on the x-axis, which is better but still not what I want.

Why do both geom_point() and geom_line() work now?

Updated: In the answer below I read that "Year is a factor, so ggplot() interprets it accordingly and produces a dot plot. geom_line() doesn't do anything because this geom doesn't make sense for the data supplied; the factor tells ggplot() that the x-axis is not continuous and that there is nothing to draw between points on that axis, hence no lines."

But I have a different plot where geom_line() works with a factor. Why is that?

qq<-ggplot(df, aes(x=Year, y=Mean,colour=Ratio)) + 
    geom_errorbar(aes(ymin=Mean-sd, ymax=Mean+sd), colour="black", width=.1, position=position_dodge(.1)) +
    geom_line(position=position_dodge(.1)) +
    geom_point(position=position_dodge(.1), size=3, shape=21, fill="white") + # 21 is filled circle
    xlab("Year") +
    ylab("Mean (%)")+ggtitle("Ratios")+facet_grid(Ratio~.)+theme(axis.text.x  = element_text(angle=-90, hjust=0.5, size=11,colour="black"))

The picture: [image]

Year is a factor, so ggplot() interprets it accordingly and produces a dot plot. geom_line() doesn't do anything because this geom doesn't make sense for the data supplied; the factor tells ggplot() that the x-axis is not continuous and that there is nothing to draw between points on that axis, hence no lines.

That this is the case is clearly shown by the figure you get with geom_line() after converting Year to a numeric variable. Now ggplot(), following its grammar, produces a line chart for the continuous x-axis data.
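As a side note, if you want to keep Year as a factor, the points can still be joined by setting the group aesthetic explicitly. By default ggplot2 forms one group per combination of the discrete variables in the plot, so with a factor on the x-axis each group holds a single observation and geom_line() has nothing to connect. The sketch below uses a made-up data frame (`toy` and its values are assumptions, not your data):

```r
library(ggplot2)

# Toy data (hypothetical): 3 years x 2 ratios, Year stored as a factor
toy <- data.frame(
  Year  = factor(rep(c(1867, 1868, 1869), each = 2)),
  Ratio = factor(rep(c("A", "B"), times = 3)),
  sd    = c(4.1, 17.9, 3.5, 15.2, 5.0, 16.8)
)

# group = Ratio tells geom_line() to join all years within one ratio
p <- ggplot(toy, aes(x = Year, y = sd, colour = Ratio, group = Ratio)) +
  geom_line() +
  geom_point() +
  facet_grid(Ratio ~ .)
p
```

The axis is still discrete here, so every year still gets a label and the spacing between years carries no meaning; converting Year to a number or a date remains the cleaner route.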

So now your question boils down to controlling the scale on the x-axis (scale is what ggplot() calls the axis). I see two options:

  1. Provide your own scale using scale_x_continuous(), as described in the ggplot2 documentation
  2. Convert your numeric Year data to a Date object and let ggplot() handle the scale, or customise it via scale_x_date(), as described and illustrated in the ggplot2 documentation

To convert to a Date object you could do something like this:

dfccomp <- transform(dfccomp,
                     Year = as.Date(paste(Year, "01", "01", sep = "-")))

Alter the two "01"s to whatever month (the first "01") or day of the month you want; whatever you choose is in effect arbitrary, because the data points will be 1 year apart either way.

You can then use the minor_breaks argument in scale_x_date() to control how many minor ticks are shown and where, plus the breaks argument to set which years are shown. I suggest you don't show all years, otherwise the resulting plot will be a mess. You also don't need each year as a minor break, as again the grid lines will just swamp the plot.
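A minimal sketch of what that could look like, assuming Year has already been converted to a Date. The data frame `toy`, the vectors `major` and `minor`, and the 20-year and 10-year spacing are illustrative choices, not something taken from your data:

```r
library(ggplot2)

# Toy data (hypothetical): one value per year, Year as a Date on 1 January
toy <- data.frame(Year = as.Date(paste(1867:2010, "01", "01", sep = "-")))
toy$sd <- seq_along(toy$Year) %% 10

# Label every 20th year; draw minor grid lines every 10 years
major <- seq(as.Date("1870-01-01"), as.Date("2010-01-01"), by = "20 years")
minor <- seq(as.Date("1870-01-01"), as.Date("2010-01-01"), by = "10 years")

p <- ggplot(toy, aes(x = Year, y = sd)) +
  geom_line() +
  # date_labels = "%Y" prints only the year part of each break
  scale_x_date(breaks = major, minor_breaks = minor, date_labels = "%Y")
p
```

Adjust the `by` strings until the axis is as dense as you want it.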

If you use Year as a factor, ggplot will print a label for every factor level. You can see this in your first two plots.
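If you would rather keep Year as a factor, one possible approach is to pass a subset of the levels to the breaks argument of scale_x_discrete(), so that only those levels are labelled. This is a sketch on made-up data; `toy` and `shown` are hypothetical names and the step of 20 is arbitrary:

```r
library(ggplot2)

# Toy data (hypothetical): Year as a factor with one level per year
toy <- data.frame(
  Year = factor(1867:2010),
  sd   = (1867:2010) %% 7
)

# Keep every 20th factor level as a labelled break
shown <- levels(toy$Year)[seq(1, nlevels(toy$Year), by = 20)]

p <- ggplot(toy, aes(x = Year, y = sd)) +
  geom_point() +
  scale_x_discrete(breaks = shown)
p
```

All points are still drawn; only the tick labels are thinned out.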

If you use Year as a numeric variable, ggplot will automatically select a subset of the values for the labels of the x-axis. In your third plot, the distance between two breaks is 100.

You can manually specify where you want the break points on the x-axis with scale_x_continuous() and its breaks argument. In the example below, the distance between the breaks is 20. Play around with the code to find the plot you want.

ggplot(df, aes(x=as.numeric(as.character(Year)), y=sd, colour=Ratio)) +
    geom_point() +
    facet_grid(Ratio~.) +
    theme(axis.text.x  = element_text(angle=-90, hjust=0.5, size=11,colour="black")) +
    # the logical index is recycled, so every 20th factor level becomes a break
    scale_x_continuous(breaks = as.numeric(levels(df$Year))[c(TRUE, rep(FALSE, 19))])
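If you prefer round years as breaks, a variant of the same idea is to build the break vector with seq(). The sketch below runs on made-up data; `toy`, `Year_num` and `year_breaks` are hypothetical names, and the range 1870 to 2010 is only an example:

```r
library(ggplot2)

# Toy data (hypothetical): Year starts out as a factor, as in the question
toy <- data.frame(Year = factor(1867:2010), sd = (1867:2010) %% 7)

# Go through as.character() first: as.numeric() on a factor alone
# returns the internal level codes (1, 2, 3, ...), not the years
toy$Year_num <- as.numeric(as.character(toy$Year))

# One labelled break every 20 years
year_breaks <- seq(1870, 2010, by = 20)

p <- ggplot(toy, aes(x = Year_num, y = sd)) +
  geom_line() +
  scale_x_continuous(name = "Year", breaks = year_breaks)
p
```

Converting once into a separate column also keeps the axis title readable, since the aes() call no longer contains the conversion expression.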
