Merging aggregate data in R
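Following up on my earlier question about aggregating hourly data into daily data, I would like to continue with (a) a monthly aggregation and (b) merging the monthly aggregate back into the original data frame.
My original data frame looks like this: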
Lines <- "Date,Outdoor,Indoor
01/01/2000 01:00,30,25
01/01/2000 02:00,31,26
01/01/2000 03:00,33,24
02/01/2000 01:00,29,25
02/01/2000 02:00,27,26
02/01/2000 03:00,39,24
12/01/2000 02:00,27,26
12/01/2000 03:00,39,24
12/31/2000 23:00,28,25"
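The daily aggregation was answered in my earlier question, and from there I could work out how to produce the monthly aggregate, which looks like this: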
Lines <- "Date,Month,OutdoorAVE
01/01/2000,Jan,31.33
02/01/2000,Feb,31.67
12/01/2000,Dec,31.33"
where OutdoorAVE is the monthly average of the daily minimum and maximum outdoor temperatures. What I ultimately want is this:
Lines <- "Date,Outdoor,Indoor,Month,OutdoorAVE
01/01/2000 01:00,30,25,Jan,31.33
01/01/2000 02:00,31,26,Jan,31.33
01/01/2000 03:00,33,24,Jan,31.33
02/01/2000 01:00,29,25,Feb,31.67
02/01/2000 02:00,27,26,Feb,31.67
02/01/2000 03:00,39,24,Feb,31.67
12/01/2000 02:00,27,26,Dec,31.33
12/01/2000 03:00,39,24,Dec,31.33
12/31/2000 23:00,28,25,Dec,31.33"
I don't know how to do this. Any help is greatly appreciated.
Try ave, together with, for example, POSIXlt to extract the month:
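# Read the original hourly data from the Lines string in the question
zz <- textConnection(Lines)
Data <- read.table(zz, header = TRUE, sep = ",", stringsAsFactors = FALSE)
close(zz)
# Parse the timestamps and extract the abbreviated month name
Data$Month <- strftime(
  as.POSIXlt(Data$Date, format = "%m/%d/%Y %H:%M"),
  format = "%b")
# Attach each month's mean outdoor temperature to every row of that month
Data$outdoor_ave <- ave(Data$Outdoor, Data$Month, FUN = mean)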
This gives:
> Data
              Date Outdoor Indoor Month outdoor_ave
1 01/01/2000 01:00      30     25   Jan    31.33333
2 01/01/2000 02:00      31     26   Jan    31.33333
3 01/01/2000 03:00      33     24   Jan    31.33333
4 02/01/2000 01:00      29     25   Feb    31.66667
5 02/01/2000 02:00      27     26   Feb    31.66667
6 02/01/2000 03:00      39     24   Feb    31.66667
7 12/01/2000 02:00      27     26   Dec    31.33333
8 12/01/2000 03:00      39     24   Dec    31.33333
9 12/31/2000 23:00      28     25   Dec    31.33333
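In case it is not obvious why ave fits here: it returns a vector of the same length as its input, with each group's summary repeated for every row in that group, so no separate merge step is needed. Below is a minimal sketch on made-up toy data (the data frame df and the column outdoor_ave2 are my own names, not from the question), next to a roughly equivalent two-step version using tapply.

```r
# Toy data (hypothetical), using the same column names as above
df <- data.frame(
  Month   = c("Jan", "Jan", "Jan", "Feb", "Feb", "Feb"),
  Outdoor = c(30, 31, 33, 29, 27, 39),
  stringsAsFactors = FALSE
)

# ave() returns one value per input row: the mean of that row's group
df$outdoor_ave <- ave(df$Outdoor, df$Month, FUN = mean)

# Two-step version: compute one mean per month, then look it up for each row
monthly <- tapply(df$Outdoor, df$Month, mean)
df$outdoor_ave2 <- as.numeric(monthly[df$Month])

print(df)
```

Both columns should hold the same values; ave simply saves the lookup step.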
Edit: Then just compute the month in Data as shown above and use merge:
zz <- textConnection(Lines2) # Lines2 is the aggregated data
Data2 <- read.table(zz, header = TRUE, sep = ",", stringsAsFactors = FALSE)
close(zz)
# Data2[-1] drops the monthly Date column so the join is on Month only
> merge(Data, Data2[-1], all = TRUE)
  Month             Date Outdoor Indoor OutdoorAVE
1   Dec 12/01/2000 02:00      27     26      31.33
2   Dec 12/01/2000 03:00      39     24      31.33
3   Dec 12/31/2000 23:00      28     25      31.33
4   Feb 02/01/2000 01:00      29     25      31.67
5   Feb 02/01/2000 02:00      27     26      31.67
6   Feb 02/01/2000 03:00      39     24      31.67
7   Jan 01/01/2000 01:00      30     25      31.33
8   Jan 01/01/2000 02:00      31     26      31.33
9   Jan 01/01/2000 03:00      33     24      31.33
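Note that merge sorts the result by the key column, which is why the rows above come out in Dec, Feb, Jan order. If the original row order matters, one alternative is a lookup with match instead of merge. This is only a sketch on made-up data; the data frames hourly and monthly are hypothetical names of mine.

```r
# Toy tables (hypothetical): hourly rows plus one aggregate row per month
hourly <- data.frame(
  Date    = c("01/01/2000 01:00", "02/01/2000 01:00", "12/01/2000 02:00"),
  Outdoor = c(30, 29, 27),
  Month   = c("Jan", "Feb", "Dec"),
  stringsAsFactors = FALSE
)
monthly <- data.frame(
  Month      = c("Dec", "Feb", "Jan"),
  OutdoorAVE = c(31.33, 31.67, 31.33),
  stringsAsFactors = FALSE
)

# match() gives, for each hourly row, the position of its month in 'monthly';
# indexing with those positions copies the aggregate without reordering rows
hourly$OutdoorAVE <- monthly$OutdoorAVE[match(hourly$Month, monthly$Month)]

print(hourly)
```

Months missing from the aggregate table would simply get NA here, much like the unmatched rows of an outer merge.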
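This is a side note to your question, but you may want to use RSQLite with separate tables for the various aggregate values instead, and join the tables with simple SQL commands. If you use many kinds of aggregation, your data frame can easily become large and ugly.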
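As a rough illustration of that idea, here is a sketch using an in-memory SQLite database through the DBI and RSQLite packages (both need to be installed). The table names hourly and monthly and the toy values are assumptions of mine, not something from the question.

```r
library(DBI)  # generic database interface; RSQLite supplies the SQLite driver

# Toy tables (hypothetical): raw readings and one aggregate row per month
hourly <- data.frame(
  Date    = c("01/01/2000 01:00", "02/01/2000 01:00", "12/01/2000 02:00"),
  Outdoor = c(30, 29, 27),
  Month   = c("Jan", "Feb", "Dec"),
  stringsAsFactors = FALSE
)
monthly <- data.frame(
  Month      = c("Jan", "Feb", "Dec"),
  OutdoorAVE = c(31.33, 31.67, 31.33),
  stringsAsFactors = FALSE
)

# Keep raw and aggregated data in separate tables of an in-memory database
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "hourly", hourly)
dbWriteTable(con, "monthly", monthly)

# Join on demand instead of storing every aggregate in the raw table
joined <- dbGetQuery(con, "
  SELECT h.Date, h.Outdoor, h.Month, m.OutdoorAVE
  FROM hourly AS h
  LEFT JOIN monthly AS m ON h.Month = m.Month
")
dbDisconnect(con)

print(joined)
```

Each further aggregate (daily, weekly, and so on) could then live in its own table and be joined only when needed.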
Here is a zoo/xts solution. Note that Month is numeric here, because you cannot mix types in a zoo/xts object.
require(xts) # loads zoo too
Lines1 <- "Date,Outdoor,Indoor
01/01/2000 01:00,30,25
01/01/2000 02:00,31,26
01/01/2000 03:00,33,24
02/01/2000 01:00,29,25
02/01/2000 02:00,27,26
02/01/2000 03:00,39,24
12/01/2000 02:00,27,26
12/01/2000 03:00,39,24
12/31/2000 23:00,28,25"
con <- textConnection(Lines1)
z <- read.zoo(con, header=TRUE, sep=",",
              format="%m/%d/%Y %H:%M", FUN=as.POSIXct)
close(con)
zz <- merge(z, Month=.indexmon(z),
            OutdoorAVE=ave(z[,1], .indexmon(z), FUN=mean))
zz
#                     Outdoor Indoor Month OutdoorAVE
# 2000-01-01 01:00:00      30     25     0   31.33333
# 2000-01-01 02:00:00      31     26     0   31.33333
# 2000-01-01 03:00:00      33     24     0   31.33333
# 2000-02-01 01:00:00      29     25     1   31.66667
# 2000-02-01 02:00:00      27     26     1   31.66667
# 2000-02-01 03:00:00      39     24     1   31.66667
# 2000-12-01 02:00:00      27     26    11   31.33333
# 2000-12-01 03:00:00      39     24    11   31.33333
# 2000-12-31 23:00:00      28     25    11   31.33333
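The Month column above counts from 0 (January) to 11 (December), the same convention as the mon field of a POSIXlt object. If you need labels outside the zoo/xts object, a small base R sketch (not using xts; the names stamps, mon0 and labels are mine) would be:

```r
# A few timestamps (hypothetical), fixed to UTC so the result does not
# depend on the local time zone
stamps <- as.POSIXct(
  c("2000-01-01 01:00", "2000-02-01 01:00", "2000-12-31 23:00"),
  tz = "UTC"
)

# POSIXlt stores the month zero-based: 0 = January, ..., 11 = December
mon0 <- as.POSIXlt(stamps)$mon

# month.abb is a built-in vector of English month abbreviations (1-based)
labels <- month.abb[mon0 + 1]

print(data.frame(mon0 = mon0, label = labels))
```

Unlike strftime with "%b", month.abb does not depend on the current locale.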
Update: How to get the above result using two different data sets.
Lines2 <- "Date,Month,OutdoorAVE
01/01/2000,Jan,31.33
02/01/2000,Feb,31.67
12/01/2000,Dec,31.33"
con <- textConnection(Lines2)
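# colClasses "NULL" skips the character Month column, leaving only OutdoorAVE
z2 <- read.zoo(con, header=TRUE, sep=",", format="%m/%d/%Y",
               FUN=as.POSIXct, colClasses=c("character","NULL","numeric"))
close(con)
# z is the hourly series read from Lines1 above; carry each monthly value
# forward, then keep only the original hourly timestamps
zz2 <- na.locf(merge(z, Month=.indexmon(z), OutdoorAVE=z2))[index(z)]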
# same output as zz (above)