Sum or aggregate a minute-level dataset to an hourly dataset in R, applying different functions to different columns for every 60 rows
I have this dataset:
x<-data.frame(matrix(c("01-01-2010", "01-01-2010", "01-01-2010","01-01-2010"," 00:01"," 00:02"," 00:03"," 00:04", "12.2", "12.1", "13.1", "11.4", "12", "13", "5", "8","12", "4","7","9", "16.9", "17.5","18.8", "21.0"), ncol=6))
names(x)<-c("date","time","pressure","temperature","rain","windspeed")
date time pressure temperature rain windspeed
1 01-01-2010 00:01 12.2 12 12 16.9
2 01-01-2010 00:02 12.1 13 4 17.5
3 01-01-2010 00:03 13.1 5 7 18.8
4 01-01-2010 00:04 11.4 8 9 21.0
This is a simplified version of my dataset. The full dataset starts at 00:01 on 01-01-2010 and runs until 23:59 on 12-31-2017.
I am looking to:
1) Average pressure, temperature and windspeed into hourly data.
2) Sum rain into hourly data.
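Making a new hourly timestamp to attach all of this new data to is simple. I just need to know the best way to average some columns and sum others over blocks of 60 rows (60 minutes make 1 hour), repeated until 12-31-2017 23:59.
Thanks for any suggestions.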
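Grouping by "every 60 rows" only works if not a single minute is missing anywhere between 2010 and 2017. A minimal base-R sketch that groups by the clock hour instead, assuming the strings look like the sample above (`mm-dd-yyyy` dates and ` HH:MM` times with a leading space) and that the data are in UTC. The rows below are hypothetical, with 00:02 deliberately missing.

```r
# Hypothetical sample: minute 00:02 is missing, which would shift
# every later block if we grouped by row count instead of by time.
x <- data.frame(
  date = c("01-01-2010", "01-01-2010", "01-01-2010", "01-01-2010"),
  time = c(" 00:01", " 00:03", " 01:00", " 01:01"),
  rain = c(12, 7, 9, 4),
  stringsAsFactors = FALSE
)

# Parse "mm-dd-yyyy HH:MM" into POSIXct (UTC assumed, so no DST gaps)
x$date_time <- as.POSIXct(paste(x$date, trimws(x$time)),
                          format = "%m-%d-%Y %H:%M", tz = "UTC")

# Truncate each timestamp to the start of its hour; this is the group key
x$hour <- format(x$date_time, "%Y-%m-%d %H:00")

# Sum rain per clock hour (the formula interface sorts the groups)
hourly_rain <- aggregate(rain ~ hour, data = x, FUN = sum)
hourly_rain
```

Because the timestamps run from 00:01 to 23:59, you have to decide whether a `:00` row closes the previous hour or opens the next one; this sketch floors, so `01:00` starts the 01 hour.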
# sample data
x1 <- data.frame(matrix(c("01-01-2010", "01-01-2010", "01-01-2010", "01-01-2010",
  "00:00:01", "00:00:02", "00:00:03", "00:00:04", "12.2", "12.1", "13.1", "11.4",
  "12", "13", "5", "8", "12", "4", "7", "9", "16.9", "17.5", "18.8", "21.0"), ncol = 6))
x2 <- data.frame(matrix(c("01-01-2010", "01-01-2010", "01-01-2010", "01-01-2010",
  "01:00:01", "01:00:02", "01:00:03", "01:00:04", "12.2", "12.1", "13.1", "11.4",
  "12", "13", "5", "8", "12", "4", "7", "9", "16.9", "17.5", "18.8", "21.0"), ncol = 6))
x <- rbind(x1, x2)
names(x) <- c("date","time","pressure","temperature","rain","windspeed")
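# convert the measurement columns (stored as text) to numeric
x[,3:6] <- apply(x[,3:6], 2, as.numeric)
# two separate aggregates, both grouped by date + hour (first two characters of time):
# mean for pressure, temperature and windspeed ...
aggregate(x[, c('pressure', 'temperature', 'windspeed')],
          by = list(paste0(x$date, substring(x$time, 1, 2))), FUN = 'mean')
# ... and sum for rain
aggregate(x[, c('rain'), drop = FALSE],
          by = list(paste0(x$date, substring(x$time, 1, 2))), FUN = 'sum')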
# Group.1 pressure temperature windspeed
#1 01-01-201000 12.2 9.5 18.55
#2 01-01-201001 12.2 9.5 18.55
# Group.1 rain
#1 01-01-201000 32
#2 01-01-201001 32
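The two `aggregate()` calls return tables with the same grouping column, so they can be joined into a single hourly table. A sketch on a small hypothetical sample; naming the `by` list element (`hour` is an illustrative name, not from the answer) gives the group column a name that `merge()` can join on.

```r
# Hypothetical sample in the same shape as the answer's data
x <- data.frame(
  date = rep("01-01-2010", 4),
  time = c("00:00:01", "00:00:02", "01:00:01", "01:00:02"),
  pressure = c(12.2, 12.1, 13.1, 11.4),
  temperature = c(12, 13, 5, 8),
  rain = c(12, 4, 7, 9),
  windspeed = c(16.9, 17.5, 18.8, 21.0),
  stringsAsFactors = FALSE
)

# One named key per date + hour (first two characters of the time)
key <- list(hour = paste(x$date, substring(x$time, 1, 2)))

# Mean for the continuous variables, sum for rain
means <- aggregate(x[, c("pressure", "temperature", "windspeed")],
                   by = key, FUN = mean)
sums  <- aggregate(x[, "rain", drop = FALSE], by = key, FUN = sum)

# Join on the shared key column to get one row per hour
hourly <- merge(means, sums, by = "hour")
hourly
```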
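I suggest using the tidyverse packages (dplyr, lubridate) together with tibbletime to do this clearly and easily. I added some cleanup code to get the example data into the required format.
This approach is highly reproducible and easy to interpret. tibbletime lets you perform many aggregations and rolling calculations on time-based data while still using familiar functions.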
# The provided example data -----------------------------------------------
x<-data.frame(matrix(c("01-01-2010", "01-01-2010", "01-01-2010","01-01-2010"," 00:01"," 00:02"," 00:03"," 00:04", "12.2", "12.1", "13.1", "11.4", "12", "13", "5", "8","12", "4","7","9", "16.9", "17.5","18.8", "21.0"), ncol=6),
stringsAsFactors = FALSE)
names(x)<-c("date","time","pressure","temperature","rain","windspeed")
# Load Libraries ----------------------------------------------------------
library(dplyr)
library(lubridate)
library(tibbletime)
# Fix column classes of data ----------------------------------------------
x <- x %>%
mutate_at(vars(pressure:windspeed),as.numeric)
# Convert to tibbletime object --------------------------------------------
x <- x %>%
mutate(date_time = mdy_hm(paste0(date,time))) %>%
as_tbl_time(index = date_time) %>%
select(date_time,everything())
# Use tibbletime function to roll up hourly -------------------------------
x_hourly <- x %>%
collapse_by('hourly',side = 'start') %>%
group_by(date_time) %>%
summarise(pressure = mean(pressure, na.rm = TRUE),
temperature = mean(temperature, na.rm = TRUE),
rain = sum(rain, na.rm = TRUE),
windspeed = mean(windspeed, na.rm = TRUE))
Result:
> x_hourly
# A time tibble: 1 x 5
# Index: date_time
date_time pressure temperature rain windspeed
<dttm> <dbl> <dbl> <dbl> <dbl>
1 2010-01-01 00:01:00 12.2 9.5 32 18.6
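As the result shows, `collapse_by('hourly', side = 'start')` labelled the hour with its first observed timestamp (00:01). If you would rather label each row with the clock hour (00:00), one alternative is to skip tibbletime and group on `lubridate::floor_date()` with plain dplyr. This is a swapped-in technique, not the answer's method, and the sample rows below are hypothetical.

```r
library(dplyr)
library(lubridate)

# Hypothetical minute-level sample spanning two clock hours
x <- tibble(
  date_time = ymd_hm(c("2010-01-01 00:01", "2010-01-01 00:02",
                       "2010-01-01 01:00", "2010-01-01 01:30")),
  pressure = c(12.2, 12.1, 13.1, 11.4),
  temperature = c(12, 13, 5, 8),
  rain = c(12, 4, 7, 9),
  windspeed = c(16.9, 17.5, 18.8, 21.0)
)

# Group on the timestamp rounded down to the hour, then apply
# mean to the continuous variables and sum to rain
x_hourly <- x %>%
  group_by(hour = floor_date(date_time, unit = "hour")) %>%
  summarise(
    pressure = mean(pressure, na.rm = TRUE),
    temperature = mean(temperature, na.rm = TRUE),
    rain = sum(rain, na.rm = TRUE),
    windspeed = mean(windspeed, na.rm = TRUE),
    .groups = "drop"
  )
x_hourly
```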