Difference between rows in R on a data frame grouped by a column

Question

I want to compute the difference in count between consecutive versions within each app_name. My dataset has the columns app_name, version_id, count, and the desired [difference].

Here is the dataset:

data <- structure(list(app_name = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 
2L, 3L, 3L), .Label = c("a", "b", "c"), class = "factor"), version_id = c(1, 
1.1, 2.3, 2, 3.1, 3.3, 4, 1.1, 2.4), count = c(600L, 620L, 620L, 
200L, 200L, 250L, 250L, 15L, 36L)), .Names = c("app_name", "version_id", 
"count"), class = "data.frame", row.names = c(NA, -9L))

Given this data.frame, how can I get the lagged difference in count within each app_name, going from one version_id to the next? The diff for the first version of each app should be zero, since there is no earlier version to compare against.

Here is an example of what the final result would look like, with the added 'diff' column:

structure(list(app_name = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 
2L, 3L, 3L), .Label = c("a", "b", "c"), class = "factor"), version_id = c(1, 
1.1, 2.3, 2, 3.1, 3.3, 4, 1.1, 2.4), count = c(600L, 620L, 620L, 
200L, 200L, 250L, 250L, 15L, 36L), diff = c(0, 20, 0, 0, 0, 50, 
0, 0, 21)), .Names = c("app_name", "version_id", "count", "diff"
), class = "data.frame", row.names = c(NA, -9L))

Answer 1

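Try using dplyr and lag:

library(dplyr)

# Within each app, subtract the previous row's value. The first row of each
# group is compared with itself (default = first value), so its diff is 0.
data %>% group_by(app_name) %>%
         mutate(diffvers = version_id - dplyr::lag(version_id, default = version_id[1]),
                diffcount = count - dplyr::lag(count, default = count[1]))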

Source: local data frame [9 x 5]
Groups: app_name [3]

  app_name version_id count diffvers diffcount
    (fctr)      (dbl) (int)    (dbl)     (int)
1        a        1.0   600      0.0         0
2        a        1.1   620      0.1        20
3        a        2.3   620      1.2         0
4        b        2.0   200      0.0         0
5        b        3.1   200      1.1         0
6        b        3.3   250      0.2        50
7        b        4.0   250      0.7         0
8        c        1.1    15      0.0         0
9        c        2.4    36      1.3        21
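
The result above relies on the rows already being sorted by version_id within each app_name, as they are in the question's data. Below is a minimal sketch that sorts explicitly before taking the lag, assuming ascending version order is the intended comparison order. The column name diff and the use of arrange() and first() are illustrative choices, not part of the original answer.

```r
library(dplyr)

# Same values as the question's dataset, rebuilt with data.frame() for readability
data <- data.frame(
  app_name   = factor(c("a", "a", "a", "b", "b", "b", "b", "c", "c")),
  version_id = c(1, 1.1, 2.3, 2, 3.1, 3.3, 4, 1.1, 2.4),
  count      = c(600L, 620L, 620L, 200L, 200L, 250L, 250L, 15L, 36L)
)

result <- data %>%
  arrange(app_name, version_id) %>%  # assumption: compare versions in ascending order
  group_by(app_name) %>%
  # first row of each group is compared with itself, giving a diff of 0
  mutate(diff = count - dplyr::lag(count, default = first(count))) %>%
  ungroup()

result$diff
```

If the input could arrive unsorted, the explicit arrange() keeps the lag meaningful. With data that is already ordered, as here, it changes nothing.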

Answer 2

We could use data.table. We convert the 'data.frame' to a 'data.table' (setDT(data)), group by 'app_name', loop (lapply(...)) over the columns specified in .SDcols, take the difference between each element and its lag (shift has type = 'lag' by default), and assign (:=) the output to create the new columns.

library(data.table) # v1.9.6
# For columns 2:3 (version_id, count), subtract the lagged value within each app_name
setDT(data)[, c('diffvers', 'diffcount') := lapply(.SD, 
              function(x) x-shift(x, fill=x[1L])), by = app_name, .SDcols=2:3]

data
#   app_name version_id count diffvers diffcount
#1:        a        1.0   600      0.0         0
#2:        a        1.1   620      0.1        20
#3:        a        2.3   620      1.2         0
#4:        b        2.0   200      0.0         0
#5:        b        3.1   200      1.1         0
#6:        b        3.3   250      0.2        50
#7:        b        4.0   250      0.7         0
#8:        c        1.1    15      0.0         0
#9:        c        2.4    36      1.3        21
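
To see what shift contributes inside the lapply call, it may help to apply it to a single group's counts on its own. This is a small sketch using the counts for app 'a' from the question; the variable names x and lagged are illustrative only.

```r
library(data.table)

# Counts for app_name "a", taken from the question's data
x <- c(600L, 620L, 620L)

# Default lag: every element moves down one position, the first slot becomes NA
shift(x)
# → NA 600 620

# Filling with the first value makes the first element's lag equal to itself
lagged <- shift(x, fill = x[1L])

# Subtracting the lag gives the row-to-row difference, with 0 in the first position
x - lagged
# → 0 20 0
```

Using fill = x[1L] rather than the default NA is what produces the zero in the first row of each group.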

Answer 1: score 1, accepted, 2015-10-10 01:43:43
Answer 2: score 0, 2015-10-10 04:48:51
