
R: Match and compare values from different vectors

I am calculating an hourly mean price from a single price vector. I would like to compare each hourly value with the mean daily value and remove all values that lie more than 2x the daily mean away. I have no problem calculating the different values, but I don't know how to compare the hourly values with the daily values.

Quick data example:

# Simulated data: 720 hourly timestamps (2013-01-01 00:00 to 2013-01-30 23:00 UTC),
# the whole sequence repeated 12 times, giving 8640 random prices
df <- data.frame(dates = rep(seq(from = as.POSIXct("2013-01-01 00:00:00", tz = "UTC"),
                                 to = as.POSIXct("2013-01-30 23:00:00", tz = "UTC"),
                                 by = "hour"), 12),
                 price = runif(8640, min = -25, max = 225))

library(dplyr)

# Hourly mean price: one row per timestamp
results <- group_by(df, dates)
results <- summarise(results, average = mean(price))

# Daily mean price: one row per calendar day
day_results <- mutate(df, days = format(dates, "%Y-%m-%d"))
day_results <- group_by(day_results, days)
day_results <- summarise(day_results, average_d = mean(price))

I am lost as to how to compare the 24 hourly values of average with the single daily value of average_d.
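One way to line up the 24 hourly values with the single daily value is to give the hourly summary the same days key and join the two tables on it, so every hourly row carries its day's average_d. The sketch below runs on made-up toy data (toy, hourly, daily, compared and kept are hypothetical names) and assumes that "more than 2x the daily mean away" means an hourly average above twice the daily mean; if the absolute deviation is meant, the test would become abs(average - average_d) > 2 * average_d.

```r
library(dplyr)

# Hypothetical toy data: two days, three hourly timestamps per day,
# two prices per timestamp
toy <- data.frame(
  dates = rep(as.POSIXct(c("2013-01-01 00:00:00", "2013-01-01 01:00:00",
                           "2013-01-01 02:00:00", "2013-01-02 00:00:00",
                           "2013-01-02 01:00:00", "2013-01-02 02:00:00"),
                         tz = "UTC"), each = 2),
  price = c(10, 10, 10, 10, 70, 70,   # day 1: daily mean 30
            20, 20, 20, 20, 20, 20)   # day 2: daily mean 20
)

# Hourly mean per timestamp, tagged with its calendar day
hourly <- toy %>%
  group_by(dates) %>%
  summarise(average = mean(price), .groups = "drop") %>%
  mutate(days = format(dates, "%Y-%m-%d"))

# Daily mean of the raw prices, one row per day
daily <- toy %>%
  mutate(days = format(dates, "%Y-%m-%d")) %>%
  group_by(days) %>%
  summarise(average_d = mean(price), .groups = "drop")

# Attach the single daily mean to each of that day's hourly rows,
# then flag hours whose mean exceeds twice the daily mean (assumed rule)
compared <- hourly %>%
  left_join(daily, by = "days") %>%
  mutate(outlier = average > 2 * average_d)

# Drop the flagged hours
kept <- filter(compared, !outlier)
```

On the question's own data, the same steps would use results and day_results in place of hourly and daily, after adding the days column to results.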

Is it clear what I am trying to do?

Is this as simple as:

> df %>% group_by(dates) %>% filter(price>2*mean(price))
Source: local data frame [811 x 2]
Groups: dates

                 dates    price
1  2013-01-01 02:00:00 182.4726
2  2013-01-01 07:00:00 155.5009
3  2013-01-01 20:00:00 139.6948
4  2013-01-01 22:00:00 132.3332
5  2013-01-02 06:00:00 222.0633
6  2013-01-03 01:00:00 217.6383
7  2013-01-03 15:00:00 224.7268
8  2013-01-03 18:00:00 215.8826

i.e. group your data by dates, then keep only the rows whose price is more than twice the mean within that group (here, each hourly timestamp)? Or, if you want to keep the mean price in the output too, do:

> df %>% group_by(dates) %>% mutate(average=mean(price)) %>% filter(price > 2*average) %>% arrange(dates)
Source: local data frame [811 x 3]
Groups: dates

                 dates    price  average
1  2013-01-01 00:00:00 140.5748 70.12211
2  2013-01-01 00:00:00 201.6484 70.12211
3  2013-01-01 01:00:00 223.9240 89.91996
4  2013-01-01 01:00:00 196.5975 89.91996
5  2013-01-01 01:00:00 203.6165 89.91996
6  2013-01-01 02:00:00 182.4726 70.85858
7  2013-01-01 02:00:00 193.0930 70.85858
8  2013-01-01 02:00:00 177.7848 70.85858
9  2013-01-01 03:00:00 202.9842 92.84580
10 2013-01-01 03:00:00 217.1840 92.84580

That version also uses arrange() to sort the output by date.
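The pipelines above group by the hourly timestamp, so each price is compared with the mean of its own hour. If the comparison should be against the daily mean instead, a possible variant is to group on a calendar-day key and keep the same mutate/filter/arrange pattern. This is a sketch on made-up toy data; toy and daily_outliers are hypothetical names, and the days column mirrors the question's code.

```r
library(dplyr)

# Hypothetical toy data: four hourly prices on each of two days
toy <- data.frame(
  dates = as.POSIXct(c("2013-01-01 00:00:00", "2013-01-01 01:00:00",
                       "2013-01-01 02:00:00", "2013-01-01 03:00:00",
                       "2013-01-02 00:00:00", "2013-01-02 01:00:00",
                       "2013-01-02 02:00:00", "2013-01-02 03:00:00"),
                     tz = "UTC"),
  price = c(10, 10, 10, 90,   # day 1: daily mean 30
            20, 20, 20, 20)   # day 2: daily mean 20
)

daily_outliers <- toy %>%
  mutate(days = format(dates, "%Y-%m-%d")) %>%  # calendar-day key
  group_by(days) %>%
  mutate(average_d = mean(price)) %>%           # daily mean repeated on every row of that day
  filter(price > 2 * average_d) %>%             # same test as above, but per day
  ungroup() %>%
  arrange(dates)
```

Because mutate() on a grouped data frame repeats each group's mean on every row of that group, this variant needs no separate summary table.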
