简体   繁体   中英

Grouping a data frame in R by month and year using yearmon()

EDIT:

I figured it out!

df_CloseDelta$YearMonth <- as.yearmon(df_CloseDelta$date)
df_CloseDelta %>%
    group_by(stock, YearMonth) %>%
    summarize(minCloseDelta = min(closeDelta),
              meanCloseDelta = mean(closeDelta),
              maxCloseDelta = max(closeDelta)) -> df_summary_CloseDelta

I created the following data frame that shows the date, the name of the stock, and the percentage difference between the closing price of each respective stock compared to the day prior.

在此处输入图片说明

library(quantmod)
library(dplyr)
library(tidyr)
library(ggplot2)
library(zoo)

start <- as.Date("2014-01-01")
end <- as.Date("2017-12-31")
getSymbols(c("AAPL", "AMZN", "FB", "GOOG", "MSFT"),
           from = start, to = end, return.class = "data.frame")

df_wide <- bind_cols(AAPL, AMZN, FB, GOOG, MSFT) %>%
    mutate(date = as.Date(rownames(AAPL))) 

df_long <- df_wide %>%
    gather(key = stock, value = value, - date) %>%
    separate(stock, into = c("stock", "type"))

df_panel <- df_long %>%
    spread(key = type, value = value)

df_CloseDelta <- df_panel %>%
    group_by(stock) %>%
    mutate(closeDelta = 100 * (Close - lag(Close))/lag(Close)) %>%
    select(date, stock, closeDelta) %>%
    filter(!is.na(closeDelta))

I am trying to group these entries by year and month (all of the percentage changes in closing price for AAPL in January 2014, all of the percentage changes in closing price for AMZN in January 2014, etc.) I'm trying to use the yearmon() function to do this, and thought I could create (mutate) a new column where I could extract the year and month and then group the data, or just group it without creating a new column. I was able to do this:

as.yearmon(df_CloseDelta$date)

And return:

[1] "Jan 2014" "Jan 2014" "Jan 2014" "Jan 2014" "Jan 2014" "Jan 2014" "Jan 2014"
[8] "Jan 2014" "Jan 2014" "Jan 2014" "Jan 2014" "Jan 2014" "Jan 2014" "Jan 2014"
[15] "Jan 2014" "Jan 2014" "Jan 2014" "Jan 2014" "Jan 2014" "Jan 2014" "Jan 2014"
[22] "Jan 2014" "Jan 2014" "Jan 2014" "Jan 2014" "Jan 2014" "Jan 2014" "Jan 2014"

And so on for every entry.

I then tried to group this:

df_summary_CloseDelta <- df_CloseDelta %>%
    group_by(as.yearmon(df_CloseDelta$date))

But received this error:

Error in mutate_impl(.data, dots) : 
  Column `as.yearmon(df_CloseDelta$date)` must be length 1006 (the group 
size) or one, not 5030

I understand that there are 1,006 dates, but that there are 5,030 entries given that there are five stocks. I'm trying to group them, then find out the average, minimum, and maximum per month and year for each respective stock. May someone point me in the right direction?

group_by is expecting you to give it either variable names, or a vector the same length as the number of rows in your data which will be treated as a factor by which to perform the grouping. See below as an example.

> btest <- data.frame(a = LETTERS[1:10],
+                     b = c(1,1,2,2,3,3,4,4,5,5),
+                     c = c(rep('e',5), rep('f',5)))
> btest
   a b c
1  A 1 e
2  B 1 e
3  C 2 e
4  D 2 e
5  E 3 e
6  F 3 f
7  G 4 f
8  H 4 f
9  I 5 f
10 J 5 f

Now we can calculate the sums for our group of interest in one of two ways. The traditional way is to use group_by and then our variable c .

> btest %>% 
+   group_by(c) %>% 
+   summarise(ex = mean(b))
# A tibble: 2 x 2
  c        ex
  <fct> <dbl>
1 e      1.80
2 f      4.20

However, what your code thinks you're trying to do is provide row-by-row values which it will use to form a grouping.

> btest %>% 
+   group_by(c(1,1,1,1,1,2,2,2,2,2)) %>% 
+   summarise(ex = mean(b))
# A tibble: 2 x 2
  `c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)`    ex
                              <dbl> <dbl>
1                              1.00  1.80
2                              2.00  4.20

Just so we're clear on what the means should be.

> mean(c(1,1,2,2,3))
[1] 1.8
> mean(c(3,4,4,5,5))
[1] 4.2

Your problem here is that you need to add the column that you want to group by first, then you can group by that.

> df_CloseDelta[['date_yearmon']] <- as.yearmon(df_CloseDelta[['date']])
> 
> df_CloseDelta %>% 
+   group_by(date_yearmon, stock) %>% 
+   summarise(mean_closedelta = mean(closeDelta))
# A tibble: 240 x 3
# Groups:   date_yearmon [?]
   date_yearmon  stock mean_closedelta
   <S3: yearmon> <chr>           <dbl>
 1 Jan 2014      AAPL          -0.474 
 2 Jan 2014      AMZN          -0.472 
 3 Jan 2014      FB             0.746 
 4 Jan 2014      GOOG           0.310 
 5 Jan 2014      MSFT           0.104 
 6 Feb 2014      AAPL           0.269 
 7 Feb 2014      AMZN           0.0631
 8 Feb 2014      FB             0.491 
 9 Feb 2014      GOOG           0.159 
10 Feb 2014      MSFT           0.0713
# ... with 230 more rows

Alternately, if you want to do the whole thing in dplyr , you could do the following.

df_CloseDelta %>%
  mutate(date_yearmon = as.character(as.yearmon(date))) %>%
  group_by(date_yearmon, stock) %>%
  summarise(mean_closedelta = mean(closeDelta))

xts has to.monthly which does the conversion to monthly directly so assuming that the input OHLCV data is in a set of xts objects in environment e as per Note at the end we apply a conversion function to each such object in e (converting both to monthly, to data frame and appending the symbol) and then rbinding the resulting data frames giving a single data.

sym2df <- function(x, env) cbind(Symbol = x, fortify.zoo(to.monthly(env[[x]], name = "")))
do.call("rbind", lapply(ls(e), sym2df, env = e))

Note

Get the stock data into environment e :

library(quantmod)

start <- "2014-01-01"
end <- "2017-12-31"
syms <- c("AAPL", "AMZN", "FB", "GOOG", "MSFT")
getSymbols(syms, from = start, to = end, env = e <- new.env())

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM