
Adding a standard error column to my data set so error bars can be plotted

# Data is a large time series data set with thousands of values per household id.
# Hour and Day were calculated from the date-time column of the data set.
Data <- data.frame(id, consumption, Day, Hour)
# e.g. consumption <- c(99, 119, 130, 110, 109, 118), and so on

I have created two separate line graphs using ggplot2: one for total mean energy consumption and one for mean energy consumption between 4 and 8 pm, across a range of households. I would like to add value-specific (not constant) error bars that correspond to the standard error of each plotted value. I am unsure how to add a standard error column to my data set for each individual value. If you could use pipes, that would be great!

I have looked online for different methods to calculate individual standard errors and add them as a column, but nothing has worked. It may be because I am not plotting the raw data, but data that has already been summarised (sum and mean). The two plots 1) and 2) will have different error bars for the same dates. I have included a picture of what the plot should look like at the end.

These are my plots: 1) Overall Daily Mean Consumption

library(dplyr)
library(ggplot2)

Data %>%
  # id is the household identifier
  group_by(id, Day) %>%
  # total daily consumption per household
  summarise(DailyCons = sum(consumption)) %>%
  group_by(Day) %>%
  # mean daily consumption across all households
  summarise(MeanDailyCons = mean(DailyCons)) %>%
  ggplot() +
  geom_line(aes(x = Day, y = MeanDailyCons))

2) Daily Mean between 16:00-20:00

Data %>%
  # keep only hours 16 to 20; all other hours become NA
  mutate(TimeInt = ifelse(Hour %in% c(16, 17, 18, 19, 20), Hour, NA)) %>%
  group_by(id, TimeInt, Day) %>%
  # drop the rows outside the interval
  na.omit() %>%
  # total consumption for each hour in the interval, per household
  summarise(sumPeakCons = sum(consumption)) %>%
  # total daily consumption in the interval, per household
  # (grouping by id here; the original post used bmg_id, which is not a column of Data as shown)
  group_by(id, Day) %>%
  summarise(PeakCons = sum(sumPeakCons)) %>%
  # daily mean consumption across all households
  group_by(Day) %>%
  summarise(DailyPeakCons = mean(PeakCons)) %>%
  ggplot() +
  geom_line(aes(x = Day, y = DailyPeakCons))

An image is included to show the desired result.

https://i.stack.imgur.com/WDT8Z.png

You are correct that you cannot add the standard error after you have summarised the data by day. Any function that tried would only receive a mean and a date, which is not enough to compute an error. The standard error must be calculated when you summarise from the per-household values.

Add another column to your summarise statement:

summarise(DailyPeakCons = mean(PeakCons), DailyPeakConsErr = sd(PeakCons)) %>%

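This gives the standard deviation of each day's peak consumption across households. Dividing it by the square root of the number of households, sd(PeakCons) / sqrt(n()), gives the standard error of the mean.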
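As a rough end-to-end sketch of the idea, the pipeline below computes the per-day mean and its standard error in the same summarise call and then draws the error bars with geom_errorbar. The toy data, the column names n_households and DailyPeakConsSE, and the objects PeakSummary and PeakPlot are assumptions made up for illustration; they are not from the question. It also uses filter() instead of the mutate/na.omit step, which is a simplification rather than the asker's exact code.

```r
library(dplyr)
library(ggplot2)

# Toy data, invented for illustration: 5 households, 3 days, 24 hourly readings each
set.seed(1)
Data <- expand.grid(id = 1:5, Day = 1:3, Hour = 0:23)
Data$consumption <- round(runif(nrow(Data), 90, 130))

PeakSummary <- Data %>%
  # keep the same hours as in the question (16 to 20 inclusive)
  filter(Hour %in% 16:20) %>%
  # total consumption in the interval, per household and day
  group_by(id, Day) %>%
  summarise(PeakCons = sum(consumption), .groups = "drop") %>%
  # mean and standard error across households, per day
  group_by(Day) %>%
  summarise(
    DailyPeakCons   = mean(PeakCons),
    n_households    = n(),
    DailyPeakConsSE = sd(PeakCons) / sqrt(n())
  )

# Line plot with value-specific error bars of +/- one standard error
PeakPlot <- ggplot(PeakSummary, aes(x = Day, y = DailyPeakCons)) +
  geom_line() +
  geom_errorbar(aes(ymin = DailyPeakCons - DailyPeakConsSE,
                    ymax = DailyPeakCons + DailyPeakConsSE),
                width = 0.2)
```

The same pattern should carry over to the overall daily plot: compute sd(DailyCons) / sqrt(n()) alongside mean(DailyCons) in the final summarise, before anything is passed to ggplot.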
