Sum or aggregate minute dataset to daily dataset, applying different function to different columns for every 60 rows in R

I have this dataset:

x<-data.frame(matrix(c("01-01-2010", "01-01-2010", "01-01-2010","01-01-2010"," 00:01"," 00:02"," 00:03"," 00:04", "12.2", "12.1", "13.1", "11.4", "12", "13", "5", "8","12", "4","7","9", "16.9", "17.5","18.8", "21.0"), ncol=6))
names(x)<-c("date","time","pressure","temperature","rain","windspeed")

        date     time pressure  temperature rain windspeed
1 01-01-2010   00:01     12.2          12   12      16.9
2 01-01-2010   00:02     12.1          13    4      17.5
3 01-01-2010   00:03     13.1           5    7      18.8
4 01-01-2010   00:04     11.4           8    9      21.0

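This is a simplified version of my dataset. The full dataset runs from 01-01-2010 00:01 to 12-31-2017 23:59.

I am looking to:

1) Average pressure, temperature and windspeed into hourly data.

2) Sum rain into hourly data.

Making a new hourly timestamp to attach all this new data to is easy. I just need to know the best way to average some columns and sum others, repeating every 60 rows (60 minutes make 1 hour) until 12-31-2017 23:59.

Thanks for your suggestions.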

# sample data
x1 <- data.frame(matrix(c("01-01-2010", "01-01-2010", "01-01-2010", "01-01-2010",
  "00:00:01", "00:00:02", "00:00:03", "00:00:04", "12.2", "12.1", "13.1", "11.4",
  "12", "13", "5", "8", "12", "4", "7", "9", "16.9", "17.5", "18.8", "21.0"), ncol = 6))
x2 <- data.frame(matrix(c("01-01-2010", "01-01-2010", "01-01-2010", "01-01-2010",
  "01:00:01", "01:00:02", "01:00:03", "01:00:04", "12.2", "12.1", "13.1", "11.4",
  "12", "13", "5", "8", "12", "4", "7", "9", "16.9", "17.5", "18.8", "21.0"), ncol = 6))
x <- rbind(x1, x2)
names(x) <- c("date","time","pressure","temperature","rain","windspeed")
x[,3:6] <- apply(x[,3:6], 2, as.numeric)

# two separate aggregates 
aggregate(x[,c('pressure', 'temperature', 'windspeed')], by = list(paste0(x$date, 
  substring(x$time, 1, 2))), FUN = 'mean')
aggregate(x[,c('rain'), drop = FALSE], 
  by = list(paste0(x$date, substring(x$time, 1, 2))), FUN = 'sum')

#       Group.1 pressure temperature windspeed
#1 01-01-201000     12.2         9.5     18.55
#2 01-01-201001     12.2         9.5     18.55

#       Group.1 rain
#1 01-01-201000   32
#2 01-01-201001   32
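
If a single hourly table is wanted, the two aggregates above can be merged on their grouping key. The following is only a minimal base R sketch under a few assumptions: dates are month-day-year (the end date 12-31-2017 suggests so), the time zone is UTC, and the `hour` column and the `hourly` object are illustrative names that do not appear in the original post. Grouping on a parsed timestamp rather than on a fixed count of 60 rows also keeps the result correct when some minutes are missing.

```r
# Same sample rows as above, built directly with typed columns
x <- data.frame(
  date = "01-01-2010",
  time = c("00:00:01", "00:00:02", "00:00:03", "00:00:04",
           "01:00:01", "01:00:02", "01:00:03", "01:00:04"),
  pressure    = rep(c(12.2, 12.1, 13.1, 11.4), 2),
  temperature = rep(c(12, 13, 5, 8), 2),
  rain        = rep(c(12, 4, 7, 9), 2),
  windspeed   = rep(c(16.9, 17.5, 18.8, 21.0), 2),
  stringsAsFactors = FALSE
)

# Parse date + time (month-day-year and UTC are assumptions) and build an hourly key
stamp <- as.POSIXct(paste(x$date, x$time), format = "%m-%d-%Y %H:%M:%S", tz = "UTC")
x$hour <- format(stamp, "%Y-%m-%d %H:00")

# Mean for three columns, sum for rain, then join the two results on the hourly key.
# Note: the formula interface drops rows containing NA by default (na.action = na.omit).
means  <- aggregate(cbind(pressure, temperature, windspeed) ~ hour, data = x, FUN = mean)
sums   <- aggregate(rain ~ hour, data = x, FUN = sum)
hourly <- merge(means, sums, by = "hour")
hourly
```
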

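I suggest using the tidyverse packages and tibbletime to do this clearly and easily. I have added some cleanup code to get the example data into the required format.

This approach is highly repeatable and easy to explain. tibbletime lets you do a lot of aggregation and rolling calculations on time-based data while still using tidyverse functions.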

# The provided example data -----------------------------------------------
x<-data.frame(matrix(c("01-01-2010", "01-01-2010", "01-01-2010","01-01-2010"," 00:01"," 00:02"," 00:03"," 00:04", "12.2", "12.1", "13.1", "11.4", "12", "13", "5", "8","12", "4","7","9", "16.9", "17.5","18.8", "21.0"), ncol=6),
              stringsAsFactors = FALSE)
names(x)<-c("date","time","pressure","temperature","rain","windspeed")

# Load Libraries ----------------------------------------------------------
library(dplyr)
library(lubridate)
library(tibbletime)

# Fix column classes of data ----------------------------------------------
x <- x %>% 
  mutate_at(vars(pressure:windspeed),as.numeric)

# Convert to tibbletime object --------------------------------------------
x <- x %>%
  mutate(date_time = mdy_hm(paste0(date,time))) %>%
  as_tbl_time(index = date_time) %>%
  select(date_time,everything())

# Use tibbletime function to roll up hourly -------------------------------
x_hourly <- x %>%
  collapse_by('hourly',side = 'start') %>%
  group_by(date_time) %>%
  summarise(pressure = mean(pressure, na.rm = TRUE),
            temperature = mean(temperature, na.rm = TRUE),
            rain = sum(rain, na.rm = TRUE),
            windspeed = mean(windspeed, na.rm = TRUE))

Result:

> x_hourly
# A time tibble: 1 x 5
# Index: date_time
  date_time           pressure temperature  rain windspeed
  <dttm>                 <dbl>       <dbl> <dbl>     <dbl>
1 2010-01-01 00:01:00     12.2         9.5    32      18.6
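
Note that the row is labelled 00:01:00, the first observation in the hour, which is consistent with `side = 'start'`. If a label at the top of the hour is preferred, one option is to floor the timestamps before grouping. The following is a minimal base R sketch, assuming month-day-year dates and UTC; `floor_to_hour` is a hypothetical helper name, not part of tibbletime or lubridate.

```r
# Hypothetical helper: drop minutes and seconds from a POSIXct vector
floor_to_hour <- function(t) {
  as.POSIXct(trunc(t, units = "hours"))
}

# Illustrative timestamps (month-day-year and UTC are assumptions)
stamps <- as.POSIXct(c("01-01-2010 00:01", "01-01-2010 00:59", "01-01-2010 01:00"),
                     format = "%m-%d-%Y %H:%M", tz = "UTC")

floored <- floor_to_hour(stamps)
format(floored, "%Y-%m-%d %H:%M")
```

The floored values can then serve as the grouping column, so every minute from 00:00 to 00:59 falls into the same hourly bucket.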
