R data.table: calculate new columns from existing columns based on certain conditions
Let's say I have the following data table:
library(data.table)

dta <- data.table(
  criteria = c('A', 'A', 'B', 'A', 'A', 'B'),
  phase = list('block3', c('block1', 'block2'), 'block2', 'block2', 'block3', 'block1'),
  start_val = c(12.0, 1.0, 7.0, 7.0, 12.0, 1.0),
  end_val = c(15.0, 11.0, 11.0, 11.0, 15.0, 6.0),
  max_val = c(13.0, 8.0, 9.5, 11.0, 15.0, 6.0)
)
From this I need a resulting table with two additional columns, cor_start and cor_end:
dtb <- data.table(
  criteria = c('A', 'A', 'B', 'A', 'A', 'B'),
  phase = list('block3', c('block1', 'block2'), 'block2', 'block2', 'block3', 'block1'),
  start_val = c(12.0, 1.0, 7.0, 7.0, 12.0, 1.0),
  end_val = c(15.0, 11.0, 11.0, 11.0, 15.0, 6.0),
  max_val = c(13.0, 8.0, 9.5, 11.0, 15.0, 6.0),
  cor_start = c(12.0, 1.0, 8.0, 9.5, 13.0, 6.0),
  cor_end = c(13.0, 8.0, 9.5, 11.0, 15.0, 6.0)
)
The new columns need to be calculated with reference to the phase column, by checking whether any previous row has a phase value matching the current row.
For better understanding, in this example rows 3 to 6 each have an earlier row that shares a phase value with them; however, rows 1 and 2 have no previous matching phase rows. Note that phase is of type list.
So, when there is a previous matching row, the conditions are:
if (max_val in previous matching row is < end_val in current row)
cor_start = previous matching row max_val
cor_end = current row end_val
if (max_val in previous matching row is > end_val in current row)
cor_start = current row end_val
cor_end = current row end_val
And when there is no previous matching row, the conditions are:
cor_start = current row start_val
cor_end = current row max_val
I looked into shift(), but could not figure out how to set the above conditions. Thanks!
Something like:
dta_transformed <- dta[, .(rn = .I, phase = unlist(phase)), by = setdiff(names(dta), 'phase')][
  , shifted_max := shift(max_val), by = phase][
  shifted_max < end_val, `:=`(cor_start = shifted_max, cor_end = end_val), by = phase][
  shifted_max > end_val, `:=`(cor_start = end_val, cor_end = end_val), by = phase][
  is.na(cor_start), `:=`(cor_start = start_val, cor_end = max_val), by = phase][
  , phase := paste(phase, collapse = ","), by = rn][!duplicated(rn), ][
  , c("rn", "shifted_max") := NULL]
However, the output I get is:
criteria phase start_val end_val max_val cor_start cor_end
1: A block3 12 15 13.0 12.0 13
2: A block1,block2 1 11 8.0 1.0 8
3: B block2 7 11 9.5 8.0 11
4: A block2 7 11 11.0 9.5 11
5: A block3 12 15 15.0 13.0 15
6: B block1 1 6 6.0 6.0 6
Could it be that in row 3 the cor_end should be 11 in your desired output? The previous matching row (2) has a lower max_val, so the current end_val (11) should be taken.
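To check this against the rules as stated in the question, they can be applied literally with a plain loop. This is only a sketch under two assumptions that the question does not spell out: the "previous matching row" is taken to be the nearest earlier row sharing at least one phase value, and a tie (previous max_val equal to end_val) is treated like the other branches, which both give end_val in that case. The data are copied from the question into plain vectors.

```r
# Data from the question, as plain vectors and a list
phase     <- list('block3', c('block1', 'block2'), 'block2', 'block2', 'block3', 'block1')
start_val <- c(12, 1, 7, 7, 12, 1)
end_val   <- c(15, 11, 11, 11, 15, 6)
max_val   <- c(13, 8, 9.5, 11, 15, 6)

n         <- length(phase)
cor_start <- numeric(n)
cor_end   <- numeric(n)

for (i in seq_len(n)) {
  # Earlier rows that share at least one phase value with row i
  prev <- which(vapply(seq_len(i - 1),
                       function(j) any(phase[[j]] %in% phase[[i]]),
                       logical(1)))
  if (length(prev) == 0) {
    # No previous matching row
    cor_start[i] <- start_val[i]
    cor_end[i]   <- max_val[i]
  } else {
    # Assumption: use the nearest previous matching row
    prev_max     <- max_val[max(prev)]
    cor_start[i] <- min(prev_max, end_val[i])
    cor_end[i]   <- end_val[i]
  }
}

print(cbind(cor_start, cor_end))
```

For row 3 this gives cor_end = 11, in line with the output above.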
There is also the tidyverse approach, which is slightly more readable:
library(tidyverse)

dta %>%
  mutate(rn = row_number()) %>%
  unnest(phase) %>%
  group_by(phase) %>%
  mutate(
    cor_start = case_when(
      lag(max_val) < end_val ~ lag(max_val),
      lag(max_val) > end_val ~ end_val,
      TRUE ~ start_val
    ),
    cor_end = if_else(!is.na(lag(max_val)), end_val, max_val)
  ) %>%
  group_by(rn) %>%
  mutate(phase = paste(phase, collapse = ",")) %>%
  ungroup() %>%
  select(-rn) %>%
  distinct()
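The case_when() call relies on lag() returning NA for the first row of each group, so that neither comparison matches and the TRUE branch supplies start_val. A minimal sketch on the two block1 rows of the example (original rows 2 and 6), with the values copied into plain vectors purely for illustration:

```r
library(dplyr)

# block1 rows of the example (original rows 2 and 6)
start_val <- c(1, 1)
end_val   <- c(11, 6)
max_val   <- c(8, 6)

# Previous max_val within the group; NA for the first row
prev_max <- dplyr::lag(max_val)

cor_start <- case_when(
  prev_max < end_val ~ prev_max,  # previous max below current end
  prev_max > end_val ~ end_val,   # previous max above current end
  TRUE ~ start_val                # NA comparison (no previous row) or a tie
)

print(cor_start)
```

Note that a tie, where the previous max_val equals end_val, also falls through to the TRUE branch and returns start_val; the question does not say what should happen in that case.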
Here is a different approach which uses pmin() instead of ifelse() and utilises the fill parameter of the shift() function. Furthermore, it reduces the number of grouping operations:
library(data.table)

dta[, rn := .I]
dta[dta[, .(phase2 = unlist(phase)), by = rn], on = "rn"][
  , `:=`(cor_start = pmin(shift(max_val, fill = start_val[1]), end_val),
         cor_end = max_val), by = phase2][
  , .SD[1], by = rn][
  , c("rn", "phase2") := NULL][]
criteria phase start_val end_val max_val cor_start cor_end
1: A block3 12 15 13.0 12.0 13.0
2: A block1,block2 1 11 8.0 1.0 8.0
3: B block2 7 11 9.5 8.0 9.5
4: A block2 7 11 11.0 9.5 11.0
5: A block3 12 15 15.0 13.0 15.0
6: B block1 1 6 6.0 6.0 6.0
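The shift()/pmin() combination can be looked at in isolation. The following is a small sketch on the two block1 rows of the example (original rows 2 and 6), with the values copied into plain vectors purely for illustration:

```r
library(data.table)

# block1 rows of the example (original rows 2 and 6)
start_val <- c(1, 1)
end_val   <- c(11, 6)
max_val   <- c(8, 6)

# Previous max_val within the group; the first row has no predecessor,
# so fill supplies the group's first start_val instead of NA
prev_max <- shift(max_val, fill = start_val[1])

# Cap the previous max_val at the current end_val
cor_start <- pmin(prev_max, end_val)

print(cor_start)
```

Because the first row is filled with start_val rather than NA, no separate step for "no previous matching row" is needed. This answer also sets cor_end to max_val in every row, which reproduces the cor_end column of the desired table dtb, including 9.5 in row 3.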