简体   繁体   中英

Difference by month (dplyr, R)

Let's say I have a data set looking below. The text repeats 'a' to 'e' having value for each month. It lasts from 2016 Jan to 2016 March.

date    text    value
1-16    a       13
1-16    b       2
1-16    c       3
1-16    d       1
1-16    e       20
2-16    a       30
2-16    b       50
2-16    c       20
2-16    d       10
2-16    e       40
3-16    a       34
3-16    b       3
3-16    c       2
3-16    d       1
3-16    e       4

I want to make a column having value of difference by month and by the text. What I mean is, 17 = (30(a, 2-16) - 13(a, 1-16)) like this,

date    text    value       the value that I want to have
1-16    a       13          na  
1-16    b       2           na  
1-16    c       3           na  
1-16    d       1           na  
1-16    e       20          na  
2-16    a       30  =(value(a, 2016-feb) - value(a, 2016-jan))  17
2-16    b       50  =(value(b, 2016-feb) - value(b, 2016-jan))  48
2-16    c       20  =(value(c, 2016-feb) - value(c, 2016-jan))  17
2-16    d       10  =(value(d, 2016-feb) - value(d, 2016-jan))  9
2-16    e       40  =(value(e, 2016-feb) - value(e, 2016-jan))  20
3-16    a       34  =(value(a, 2016-mar) - value(a, 2016-feb))  4
3-16    b       3   =(value(b, 2016-mar) - value(b, 2016-feb))  -47
3-16    c       2   =(value(c, 2016-mar) - value(c, 2016-feb))  -18
3-16    d       1   =(value(d, 2016-mar) - value(d, 2016-feb))  -9
3-16    e       4   =(value(e, 2016-mar) - value(e, 2016-feb))  -36

I made the data above. The actual data is longer and even more complex as it contains approximately 2000 text for each month and the data is not even in date. (It's also divided by many categories). So some solutions that calculate the value manually would not work.

I tried with dplyr lag, for example,

df %>% group_by(date, text) %>%
arrange(date, text) %>%
mutate(diff = value - lag(value)) 

but it did not work. I assume the code did not recognise text well?

What would be the best way to do it?

With dplyr ...

library(dplyr)

df %>% group_by(text) %>% mutate(newval=c(NA,diff(value)))

   date  text  value newval
   <chr> <chr> <int>  <int>
 1 1-16  a        13     NA
 2 1-16  b         2     NA
 3 1-16  c         3     NA
 4 1-16  d         1     NA
 5 1-16  e        20     NA
 6 2-16  a        30     17
 7 2-16  b        50     48
 8 2-16  c        20     17
 9 2-16  d        10      9
10 2-16  e        40     20
11 3-16  a        34      4
12 3-16  b         3    -47
13 3-16  c         2    -18
14 3-16  d         1     -9
15 3-16  e         4    -36

Or completely in R base using ave

> df$newVal <- ave(df$value, df$text,  FUN=function(x) c(NA,diff(x)))
> df
   date text value newVal
1  1-16    a    13     NA
2  1-16    b     2     NA
3  1-16    c     3     NA
4  1-16    d     1     NA
5  1-16    e    20     NA
6  2-16    a    30     17
7  2-16    b    50     48
8  2-16    c    20     17
9  2-16    d    10      9
10 2-16    e    40     20
11 3-16    a    34      4
12 3-16    b     3    -47
13 3-16    c     2    -18
14 3-16    d     1     -9
15 3-16    e     4    -36

with aggregate

df$newval <- c(aggregate(value ~ text, data=df, FUN=function(x) c(NA,diff(x)))[,-1])

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM