[英]New column value based on column and previous row values in R
I'm having issues to figuring out the code for the following: We have our basic table named as shown below我在找出以下代码时遇到问题:我们的基本表命名如下
Status
is calculate if qty_ordered == qty_delivered
then Status = "D"
else Status == "N"
. Status
是计算如果qty_ordered == qty_delivered
然后Status = "D"
其他Status == "N"
。
For Flag
is where it gets more complicated: Flag == "Yes"
if Status== "D"
, unless there's 2 "N"
s on 2 previous consecutive days as you can see on May 6th for apples.对于
Flag
来说,它变得更加复杂: Flag == "Yes"
if Status== "D"
,除非在 5 月 6 日的苹果中,您可以在连续两天看到 2 个"N"
。
Flag == "N"
if it's at least 2 consecutive days of N, as shown for apple in May4rth. Flag == "N"
如果它是Flag == "N"
的至少连续 2 天,如 5 月 4 日的苹果所示。
Item![]() |
Date![]() |
qty_ordered![]() |
qty_delivered![]() |
Status![]() |
Flag![]() |
---|---|---|---|---|---|
Apple![]() |
1-May ![]() |
100 ![]() |
100 ![]() |
D ![]() |
YES![]() |
Apple![]() |
2-May ![]() |
100 ![]() |
100 ![]() |
D ![]() |
YES![]() |
Apple![]() |
3-May ![]() |
100 ![]() |
70 ![]() |
N ![]() |
YES![]() |
Apple![]() |
4-May ![]() |
100 ![]() |
0 ![]() |
N ![]() |
NO![]() |
Apple![]() |
5-May ![]() |
100 ![]() |
0 ![]() |
N ![]() |
NO![]() |
Apple![]() |
6-May ![]() |
100 ![]() |
100 ![]() |
D ![]() |
NO![]() |
Apple![]() |
7-May ![]() |
100 ![]() |
100 ![]() |
D ![]() |
YES![]() |
Banana![]() |
1-May ![]() |
50 ![]() |
50 ![]() |
D ![]() |
YES![]() |
Banana![]() |
2-May ![]() |
50 ![]() |
0 ![]() |
N ![]() |
YES![]() |
Banana![]() |
3-May ![]() |
50 ![]() |
50 ![]() |
D ![]() |
YES![]() |
Banana![]() |
4-May ![]() |
50 ![]() |
50 ![]() |
D ![]() |
YES![]() |
Banana![]() |
5-May ![]() |
50 ![]() |
50 ![]() |
D ![]() |
YES![]() |
I usually do mutate
to calculate new fields such as: df <- mutate(df,Flag= if_else(qty_ordered == qty_delivered, "YES","NO"))
我通常使用
mutate
来计算新字段,例如: df <- mutate(df,Flag= if_else(qty_ordered == qty_delivered, "YES","NO"))
but this doesn't includes the validation if the previous days that the problem needs.但这不包括问题需要的前几天的验证。
Any help would be appreciated.任何帮助,将不胜感激。
You can use lag
to refer to previous values.您可以使用
lag
来引用以前的值。 Try -尝试 -
library(dplyr)
df %>%
mutate(flag = case_when(lag(Status) == 'N' & lag(Status, 2) == 'N' ~ 'NO',
Status == 'D' | lag(Status) == 'D' ~ 'YES',
TRUE ~ 'NO'))
# Item Date qty_ordered qty_delivered Status Flag
#1 Apple 1-May 100 100 D YES
#2 Apple 2-May 100 100 D YES
#3 Apple 3-May 100 70 N YES
#4 Apple 4-May 100 0 N NO
#5 Apple 5-May 100 0 N NO
#6 Apple 6-May 100 100 D NO
#7 Apple 7-May 100 100 D YES
#8 Banana 1-May 50 50 D YES
#9 Banana 2-May 50 0 N YES
#10 Banana 3-May 50 50 D YES
#11 Banana 4-May 50 50 D YES
#12 Banana 5-May 50 50 D YES
You may want to add group_by(Item)
to do this separately for each Item
.您可能需要添加
group_by(Item)
来为每个Item
单独执行此操作。
Does this work:这是否有效:
library(dplyr)
df %>% group_by(Item) %>% mutate(Flag = case_when(Status == 'N' & lag(Status == 'N') ~ 'NO', TRUE ~ 'YES'))
# A tibble: 12 x 6
# Groups: Item [2]
Item Date qty_ordered qty_delivered Status Flag
<chr> <chr> <int> <int> <chr> <chr>
1 Apple 1-May 100 100 D YES
2 Apple 2-May 100 100 D YES
3 Apple 3-May 100 70 N YES
4 Apple 4-May 100 0 N NO
5 Apple 5-May 100 0 N NO
6 Apple 6-May 100 100 D YES
7 Apple 7-May 100 100 D YES
8 Banana 1-May 50 50 D YES
9 Banana 2-May 50 0 N YES
10 Banana 3-May 50 50 D YES
11 Banana 4-May 50 50 D YES
12 Banana 5-May 50 50 D YES
You can do this in base R with您可以在基础 R 中执行此操作
# create a data.frame with only the relevant columns
dat <- data.frame(Item = c(rep("Apple", 7), rep("Banana", 5)),
Status = c("D", "D", "N", "N", "N", "D", "D",
"D", "N", "D", "D", "D"))
# create the flag column
transform(dat, Flag = ave(Status == "N", Item, FUN = function(is_N)
ifelse(c(F, head(is_N, -1)) & (c(F, F, head(is_N, -2)) | is_N), "NO", "YES")))
#R> Item Status Flag
#R> 1 Apple D YES
#R> 2 Apple D YES
#R> 3 Apple N YES
#R> 4 Apple N NO
#R> 5 Apple N NO
#R> 6 Apple D NO
#R> 7 Apple D YES
#R> 8 Banana D YES
#R> 9 Banana N YES
#R> 10 Banana D YES
#R> 11 Banana D YES
#R> 12 Banana D YES
A faster variant is一个更快的变体是
transform(dat, Flag = ave(Status == "N", Item, FUN = function(is_N)
c("YES", "NO")[
1L + (c(F, head(is_N, -1)) & (c(F, F, head(is_N, -2)) | is_N))]))
and here is a small simulation study这是一个小型的模拟研究
# perform a benchmark study with simulated data
library(dplyr)
set.seed(1)
n_lvls <- 1000L
n_per_lvl <- 6L
dat <- data.frame(
Item = as.character(gl(n = n_lvls, n_per_lvl)),
Status = sample(c("D", "N"), replace = TRUE, n_per_lvl * n_lvls))
bench::mark(
first = transform(dat, Flag = ave(Status == "N", Item, FUN = function(is_N)
ifelse(c(F, head(is_N, -1)) & (c(F, F, head(is_N, -2)) | is_N), "NO", "YES"))),
faster = transform(dat, Flag = ave(Status == "N", Item, FUN = function(is_N)
c("YES", "NO")[
1L + (c(F, head(is_N, -1)) & (c(F, F, head(is_N, -2)) | is_N))])),
dplyr = dat %>% group_by(Item) %>%
mutate(flag = case_when(lag(Status) == 'N' & lag(Status, 2) == 'N' ~ 'NO',
Status == 'D' | lag(Status) == 'D' ~ 'YES',
TRUE ~ 'NO')), check = FALSE)
#R> # A tibble: 3 x 13
#R> expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time result memory time gc
#R> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm> <list> <list> <list> <list>
#R> 1 first 16.4ms 17.1ms 53.6 511KB 15.9 27 8 504ms <NULL> <Rprofmem [34 × 3]> <bench_tm [27… <tibble [27 × 3…
#R> 2 faster 12.1ms 14ms 58.4 511KB 15.6 30 8 514ms <NULL> <Rprofmem [34 × 3]> <bench_tm [30… <tibble [30 × 3…
#R> 3 dplyr 196ms 196.7ms 5.03 333KB 15.1 3 9 596ms <NULL> <Rprofmem [45 × 3]> <bench_tm [3]> <tibble [3 × 3]>
where dplyr
is 196 / 12.1 ~ 16 times slower.其中
dplyr
慢 196 / 12.1 ~ 16 倍。
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.