Looping through data subsets to generate additional plots in R

(Apologies for the non-standard R; I'm new to it.)

I have a vector of colors that my data is formatted with:

colors <- c("red", "blue", "orange", "turquoise4", "green3")

I'm generating a plot that is organized by colour/Party:

main_aes = aes(x = Date, y = Popular_Support, colour=Party, size=1/Error, weight=1/Error)
plot <- ggplot(polls) 
plot2 <- plot + geom_point(main_aes)
plot2 <- plot2 + scale_colour_manual(values = colors)

to which I add a trendline:

plot_smooth <- plot2 + stat_smooth(data=polls, span = .35) 

I want a different trend and ribbon for each color, and I need to manipulate the smoothed data before plotting it, so I extract the smoothed data:

smooth_data <- ggplot_build(plot_smooth)$data[[2]]
# do some custom manipulations of smooth_data here

Then I manually create the individual trends and confidence ribbons. This is the part where I'm looking for help using loops instead of repetitive code:

party_trend.1 <- subset(smooth_data, colour == colors[1])
party_trend.2 <- subset(smooth_data, colour == colors[2])
party_trend.3 <- subset(smooth_data, colour == colors[3])
party_trend.4 <- subset(smooth_data, colour == colors[4])
party_trend.5 <- subset(smooth_data, colour == colors[5])

plot <- plot + geom_ribbon(data = party_trend.1, aes(x=x, ymin=ymin, ymax = ymax), alpha = .25)
plot <- plot + geom_ribbon(data = party_trend.2, aes(x=x, ymin=ymin, ymax = ymax), alpha = .25)
plot <- plot + geom_ribbon(data = party_trend.3, aes(x=x, ymin=ymin, ymax = ymax), alpha = .25)
plot <- plot + geom_ribbon(data = party_trend.4, aes(x=x, ymin=ymin, ymax = ymax), alpha = .25)
plot <- plot + geom_ribbon(data = party_trend.5, aes(x=x, ymin=ymin, ymax = ymax), alpha = .25)

plot <- plot + geom_line(data = party_trend.1, colour=colors[1], aes(x = x, y = y))
plot <- plot + geom_line(data = party_trend.2, colour=colors[2], aes(x = x, y = y))
plot <- plot + geom_line(data = party_trend.3, colour=colors[3], aes(x = x, y = y))
plot <- plot + geom_line(data = party_trend.4, colour=colors[4], aes(x = x, y = y))
plot <- plot + geom_line(data = party_trend.5, colour=colors[5], aes(x = x, y = y))

I assume that if I can create an array for party_trend, the other two loops will be easy. I tried something like this:

party_trend <- 0
for(i in colors) {
    party_trend[i] <- subset(smooth_data, colour == colors[i])
}

But I can't figure out how to create/initialize the party_trend array before using it here. This gives me the following warnings:

Warning messages:

1: In party_trend[i] <- subset(smooth_data, colour == colors[i]) :   number of items to replace is not a multiple of replacement length 
2: In party_trend[i] <- subset(smooth_data, colour == colors[i]) :   number of items to replace is not a multiple of replacement length
3...5

Here's a working fiddle

EDIT FOR CONTEXT: This is not relevant to the looping question, but might help explain why I'm not just using the default plot. The reason I am extracting the data and re-plotting manually is that sometimes I want the trend and ribbon to use data that will not be plotted. So I use the default calculation from the entire data set, extract it, trim it, and plot only the part I want. This is not happening in this specific fiddle (though you can see remnants on lines 107-8), but you can see the results here. Notice how the beginning of the plot does not have the characteristic "trumpet" shape, because the trend is using more data than is actually being displayed; that data continues to the left.

If you need a line and a ribbon for each level of Party, you don't need to create them individually. You can map the factor in ggplot(data=polls, aes(x=Date, y=Popular_Support, color=Party)) and then call geom_smooth():

plot <- ggplot(data=polls,
               aes(x=Date, y=Popular_Support, color=Party)) + 
    geom_point() +
    geom_smooth()

So you can remove this:

#Do this in a loop!!!!
party_trend <- 0
for(i in colors) {
  party_trend[i] <- subset(smooth_data, colour == colors[i])
}

party_trend.1 <- subset(smooth_data, colour == colors[1])
party_trend.2 <- subset(smooth_data, colour == colors[2])
party_trend.3 <- subset(smooth_data, colour == colors[3])
party_trend.4 <- subset(smooth_data, colour == colors[4])
party_trend.5 <- subset(smooth_data, colour == colors[5])

plot <- plot + geom_ribbon(data = party_trend.1, aes(x=x, ymin=ymin, ymax = ymax), alpha = .25)
plot <- plot + geom_ribbon(data = party_trend.2, aes(x=x, ymin=ymin, ymax = ymax), alpha = .25)
plot <- plot + geom_ribbon(data = party_trend.3, aes(x=x, ymin=ymin, ymax = ymax), alpha = .25)
plot <- plot + geom_ribbon(data = party_trend.4, aes(x=x, ymin=ymin, ymax = ymax), alpha = .25)
plot <- plot + geom_ribbon(data = party_trend.5, aes(x=x, ymin=ymin, ymax = ymax), alpha = .25)
plot <- plot + geom_line(data = party_trend.1, colour=colors[1], aes(x = x, y = y))
plot <- plot + geom_line(data = party_trend.2, colour=colors[2], aes(x = x, y = y))
plot <- plot + geom_line(data = party_trend.3, colour=colors[3], aes(x = x, y = y))
plot <- plot + geom_line(data = party_trend.4, colour=colors[4], aes(x = x, y = y))
plot <- plot + geom_line(data = party_trend.5, colour=colors[5], aes(x = x, y = y))

#print(plot)
plot <- plot + scale_colour_manual(values = colors)
plot <- plot +  geom_point(main_aes, alpha=0.8)
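If the per-party subsets are still wanted (for example, to trim each one before plotting), a list can stand in for the numbered party_trend.1 ... party_trend.5 variables. The loop in the question appears to fail for two reasons: party_trend starts as a numeric vector, which cannot hold a data frame per element, and for(i in colors) iterates over the colour names, so colors[i] indexes an unnamed vector by name and yields NA. The following is a minimal sketch in base R only; the small smooth_data frame is made up here as a stand-in for the ggplot_build() output, and the loop variable clr is an illustrative name.

```r
colors <- c("red", "blue", "orange", "turquoise4", "green3")

# Made-up stand-in for ggplot_build(plot_smooth)$data[[2]] (three rows per colour)
smooth_data <- data.frame(
  colour = rep(colors, each = 3),
  x      = rep(1:3, times = length(colors)),
  y      = seq_len(15),
  stringsAsFactors = FALSE
)
smooth_data$ymin <- smooth_data$y - 1
smooth_data$ymax <- smooth_data$y + 1

# Pre-allocate a named list; each element will hold one data frame
party_trend <- vector("list", length(colors))
names(party_trend) <- colors

# Iterate over the colour names themselves and assign with [[ ]]
for (clr in colors) {
  party_trend[[clr]] <- subset(smooth_data, colour == clr)
}

# Same grouping without an explicit loop; [colors] restores the original order
party_trend2 <- split(smooth_data, smooth_data$colour)[colors]
```

ggplot2 accepts a list of layers on the right-hand side of +, so the five geom_line() calls could probably be replaced by something like plot + lapply(party_trend, function(d) geom_line(data = d, aes(x = x, y = y), colour = d$colour[1])), and the ribbons likewise.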

Finally, you can add your other plot features:

plot <- plot + scale_colour_manual(values = colors)
plot <- plot +  geom_point(main_aes, alpha=0.8)

plot <- plot +   scale_size_area(max_size=3, breaks=seq(20,60,10), labels=seq(20,60,10)^2, name="Sample Size") 
plot <- plot +   guides(color = guide_legend(order=-1) )
  #last election                                 
# show.legend replaces the deprecated show_guide argument
plot <- plot + geom_point(data=LastElection, size=3, shape=5, show.legend=FALSE, main_aes) 
plot <- plot + geom_point(data=LastElection, size=2, show.legend=FALSE, main_aes) 
plot <- plot + geom_text(data=LastElection, show.legend=FALSE, 
            aes(x = Date, y = Popular_Support, label = Popular_Support), size=3, hjust=-.2, vjust=-0.4)
# this election
#plot <- plot + geom_point(data=ThisElection, size=3, shape=5, show_guide=F, main_aes) +
#  geom_point(data=ThisElection, size=2, show_guide=F, main_aes) +
#  geom_text(data=ThisElection, show_guide=F, 
#            aes(x = Date, y = Popular_Support, label = Popular_Support), size=3, hjust=-.2, vjust=-0.4)

plot <- plot + scale_x_continuous(name = "Date", limits=c(42291,43759), minor_breaks = seq(42291, 43759, by=30),breaks = seq(42291, 43759, by=90))
                                  #minor_breaks = seq(42216, 42296, by=1), breaks = seq(42291, 43759, by=90))
plot <- plot + scale_y_continuous(name = "% Popular Support", lim=c(0,56), expand=c(0,0)) 
plot <- plot + theme(axis.text.x = element_text(size = 11, vjust=.5, angle = 90, colour="#333333"))
plot <- plot + theme(axis.title.x = element_blank())
plot <- plot + theme(axis.text.y = element_text(size = 11))
plot <- plot + theme(axis.title.y = element_text(size = 11, angle = 90, colour="#333333"))
#theme(legend.justification=c(1,1), legend.position=c(1,1))

UPDATE:

You can use another data frame in geom_smooth(). Here is a minimal example showing a case where the line and ribbon are calculated and plotted from a different data frame than the data points:

n <- rnorm(100, mean=0, sd=1)
dat <- data.frame(x=1:100, y=n, lab=c(rep('a', 50), rep('b', 50)))

n1 <- rnorm(100, mean=2, sd=1)
dat1 <- data.frame(x=1:100, y=n1, lab=c(rep('a', 50), rep('b', 50)))

ggplot(data=dat, aes(x=x, y=y)) +
  geom_point() +
  geom_smooth(data=dat1)

giving:

[image: resulting plot]
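For the case described in the question's edit, where the smoother should be fitted on more data than is displayed, one possible approach is to build the smooth on the full data, extract it with ggplot_build(), trim it, and draw it over the displayed subset. The sketch below uses simulated data; the cut-off at x >= 30, the loess settings, and the names full, shown, and trimmed are assumptions chosen for illustration, not taken from the original fiddle.

```r
library(ggplot2)

set.seed(1)
full <- data.frame(
  x   = rep(1:100, times = 2),
  y   = c(rnorm(100, mean = 0), rnorm(100, mean = 2)),
  lab = rep(c("a", "b"), each = 100)
)

# Fit one smoother per level of lab on the FULL data
p_full <- ggplot(full, aes(x = x, y = y, colour = lab)) +
  geom_smooth(method = "loess", formula = y ~ x, span = 0.35)

# The only layer is the smooth, so its computed data is element 1
smooth_data <- ggplot_build(p_full)$data[[1]]

# Recover the lab value from the integer group id (groups follow level order)
smooth_data$lab <- levels(factor(full$lab))[smooth_data$group]

# Keep only the range that will be displayed (hypothetical cut-off)
trimmed <- subset(smooth_data, x >= 30)
shown   <- subset(full, x >= 30)

# Points from the displayed subset; ribbon and line from the trimmed smooth
p <- ggplot() +
  geom_point(data = shown, aes(x = x, y = y, colour = lab)) +
  geom_ribbon(data = trimmed,
              aes(x = x, ymin = ymin, ymax = ymax, group = lab),
              alpha = 0.25) +
  geom_line(data = trimmed, aes(x = x, y = y, colour = lab))
```

Because the fit was computed before trimming, the ribbon at the left edge of the displayed range reflects the data to its left as well, which is the effect the question describes. Printing p draws the plot.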
