
R: Plotting a graph with different colors of points based on advanced criteria

I would like to make a plot (using ggplot) with years on the x axis, where the points for the last three years have a different colour from the rest. The last three years should also be tested against a criterion that decides whether they are drawn in red or green: if the mean of the last three years is less than the 66th percentile of the remaining years the points are green, and if it is greater they are red. So far I have written two functions. The first calculates the mean of the last three years:

LYM3 <- function (x) {
  LYM3 <- tail(x,3)
  mean(LYM3$Data,na.rm=T)
}

The second calculates the 66th percentile of the remaining years:

perc66 <- function(x) {
  percentile <- head(x,-3)
  quantile(percentile$Data, .66, names=F,na.rm=T) 
}

Here are two data sets that can be used in the calculations and plots. The first is an example from my real data, where LYM3(df1) < perc66(df1); the second is made-up data where LYM3(df2) > perc66(df2).

df1<- data.frame(Year=c(1979:2010),
                Data=c(347261.87,  145071.29,   110181.93,  183016.71,  210995.67,  205207.33,  103291.78,  247182.10,  152894.45,  170771.50,  206534.55,  287770.86,  223832.43,  297542.86,  267343.54,  475485.47,  224575.08,  147607.81,  171732.38,  126818.10,  165801.08,  136921.58,  136947.63,  83428.05,   144295.87,  68566.23,   59943.05,   49909.08,   52149.11,   117627.75,  132127.79,  130463.80))
df2 <- data.frame(Year=c(1979:2010),
                  Data=c(sample(50,29,replace=T),75,75,75))

Here's my code for the plot so far:

plot <- ggplot(df1, aes(x=Year, y=Data)) +
  theme_bw() +
  geom_point(size=3, aes(colour=ifelse(df1$Year<2008, "black",ifelse(LYM3(df1) < perc66(df1),"green","red")))) +
  geom_line() +
  scale_x_continuous(breaks=c(1980,1985,1990,1995,2000,2005,2010), limits=c(1978,2011))
plot

As you can see, it doesn't really do what I want. All it seems to do is turn the years before 2008 into one level and the later years into another, and then base the point colour on these two levels.

Since I don't want the cut-off year to be fixed either, I wrote another small function:

fun3 <- function(x) {
df <- subset(x, Year==(max(Year)-2))
df$Year
}

So the previous code would have the same effect as:

geom_point(size=3, aes(colour=ifelse(df1$Year<fun3(df1), "black","red"))) 

But it still ignores my colours. Why does it turn the years into levels? And why doesn't one ifelse() nested inside another work in this case? How can I get the arguments to do what I want? I realise this is a bit messy and asks a lot at once, but I hope my description is reasonably clear. It would help if someone could at least point me in the right direction.

I also tried to put the plotting code into a function so that I wouldn't have to change the data frame in every function call within the plot, but I couldn't get it to work.

Thank you!

Here is my suggestion. I am not sure you want ifelse() inside the colour aesthetic; it makes the code hard for me to read. I subsetted the data to calculate the mean for 2008-2010 and the 0.66 quantile for the remaining years. Then I created two candidate colour vectors: one with black (29 times) followed by green (3 times), the other with black (29 times) followed by red (3 times). The next step is to draw the ggplot figure inside a conditional statement. If mean(foo$Data) < quantile(foo2$Data, 0.66) is true, R uses b for the colours, which includes green. Otherwise, R uses c. This way you do not have to do much about colours inside ggplot(). I hope this helps.
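The colour choice described above can also be collapsed into a single vector, so that only one plot call is needed. The following is a minimal base R sketch of that idea, not the code from this answer: the helper name last3_colours, its arguments n_last and prob, and the toy data frames low and high are all made up for illustration. It assumes a Data column, rows sorted by year, and at least one non-missing value in each group. A tie between the mean and the quantile is treated as red here, which is a choice the question does not specify.

```r
# Hypothetical helper (not from the original post): returns one colour per row.
# Assumes a column named Data and rows ordered from oldest to newest year.
last3_colours <- function(df, n_last = 3, prob = 0.66) {
  recent  <- tail(df, n_last)     # the last n_last rows
  earlier <- head(df, -n_last)    # every row before them
  recent_mean <- mean(recent$Data, na.rm = TRUE)
  threshold   <- quantile(earlier$Data, prob, names = FALSE, na.rm = TRUE)
  # One scalar decision for the whole group of recent years
  highlight <- if (recent_mean < threshold) "green" else "red"
  rep(c("black", highlight), times = c(nrow(earlier), nrow(recent)))
}

# Toy data, made up for illustration
low  <- data.frame(Year = 2001:2010,
                   Data = c(10, 20, 30, 40, 50, 60, 70, 5, 5, 5))
high <- data.frame(Year = 2001:2010,
                   Data = c(10, 20, 30, 40, 50, 60, 70, 100, 100, 100))

last3_colours(low)   # seven "black" followed by three "green"
last3_colours(high)  # seven "black" followed by three "red"
```

Because the comparison is between two single numbers, a plain if/else is enough; ifelse() is only needed when the test itself is a vector.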

UPDATES ADDED

I changed the filter part. As for the quantile line, this post is very useful. Basically, you need a dummy data frame holding the value of the 0.66 quantile. geom_hline is added as well.

library(ggplot2)

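# Filter data. If you are sure that the last three rows are the ones you need
# to extract, this is the way.
foo <- tail(df1, n = 3)
# All rows except the last three. head() is used here because base R's
# setdiff() does not compare data frames row by row.
foo2 <- head(df1, n = -3)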

# Set up colours
a <- c(nrow(foo2), nrow(foo))
b <- rep(c("black", "green"), a)
c <- rep(c("black", "red"), a)

# Create a dummy data frame for the quantile line
# Column names can be anything (here, X and Z)
agasi <- data.frame(X = c("A"), Z = quantile(foo2$Data, 0.66))

if(mean(foo$Data) < quantile(foo2$Data, 0.66)){

ggplot(df1, aes(x=Year, y=Data)) +
    theme_bw() +
    geom_point(size=3, color = b) +
    geom_line() +
    scale_x_continuous(breaks=c(1980,1985,1990,1995,2000,2005,2010), limits=c(1978,2011)) +
    geom_hline(data = agasi, aes(yintercept = Z))

} else{

ggplot(df1, aes(x=Year, y=Data)) +
    theme_bw() +
    geom_point(size=3, color = c) +
    geom_line() +
    scale_x_continuous(breaks=c(1980,1985,1990,1995,2000,2005,2010), limits=c(1978,2011)) +
    geom_hline(data = agasi, aes(yintercept = Z))   

}
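As an alternative to duplicating the plot in both branches, the colour can be stored as a column and mapped through scale_colour_identity(). Inside aes(), strings such as "black" or "red" are treated as data categories and assigned colours from the default palette, which is why the question's plot showed two levels instead of the requested colours; the identity scale tells ggplot2 to use the strings as literal colour names. The sketch below is one possible way to wrap this in a function and is not part of the original answer: plot_last3, n_last, prob, point_colour and the toy data are hypothetical names, and it assumes columns Year and Data with rows sorted by year. It also passes the quantile straight to geom_hline(yintercept = ...), so no dummy data frame is needed.

```r
library(ggplot2)

# Hypothetical wrapper (not from the original post).
# Assumes columns Year and Data, rows sorted by Year, and more than n_last rows.
plot_last3 <- function(df, n_last = 3, prob = 0.66) {
  is_recent   <- seq_len(nrow(df)) > nrow(df) - n_last   # TRUE for the last n_last rows
  recent_mean <- mean(df$Data[is_recent], na.rm = TRUE)
  threshold   <- quantile(df$Data[!is_recent], prob, names = FALSE, na.rm = TRUE)
  highlight   <- if (recent_mean < threshold) "green" else "red"
  df$point_colour <- ifelse(is_recent, highlight, "black")

  ggplot(df, aes(x = Year, y = Data)) +
    theme_bw() +
    geom_line() +
    geom_point(aes(colour = point_colour), size = 3) +
    scale_colour_identity() +                        # use the strings as literal colours
    geom_hline(yintercept = threshold, linetype = "dashed")
}

# Toy data, made up for illustration; the last three values are high
toy <- data.frame(Year = 2001:2010,
                  Data = c(10, 20, 30, 40, 50, 60, 70, 100, 100, 100))
p <- plot_last3(toy)   # print(p) draws the figure
```

Calling plot_last3(df1) or plot_last3(df2) with the question's data frames should then work without editing the plotting code, since the cut-off is derived from the number of rows and not from a fixed year.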
