Select values from the previous group based on a condition

Question

I have the following df:

df<-data.frame(value = c(1,1,1,2,1,1,2,2,1,2),
              group = c(5,5,5,6,7,7,8,8,9,10),
             no_rows = c(3,3,3,1,2,2,2,2,1,1))

where identical consecutive values form a group, i.e., the values in rows 1:3 fall under group 5. Column "no_rows" tells us how many rows each group has, i.e., group 5 has 3 rows.

I am trying to replace every value where no_rows < 2 with the value from the previous group. I expect my final df to look like this:

df_end<-data.frame(value = c(1,1,1,1,1,1,2,2,2,2),
              group = c(5,5,5,6,7,7,8,8,9,10),
             no_rows = c(3,3,3,1,2,2,2,2,1,1))

I came up with this combination of an if and a for loop, which gives me the desired output. However, it is very slow and I am looking for a way to optimise it.

for (i in 2:length(df$group)) {
  if (df$no_rows[i] < 2) {
    df$value[i] <- df$value[i - 1]
  }
}

I have also tried dplyr::mutate with lag(), but it does not give me the desired output (it only removes the first value per group instead of taking the value of the previous group).

library(dplyr)
df <- df %>%
  group_by(group) %>%
  mutate(value = ifelse(no_rows < 2, lag(value), value))

I have been looking for a solution for a few days, but I could not find anything that fits my problem completely. Any ideas?

Answer 1

A data.table approach:

First, get the values of groups with length >= 2, then fill in the missing values (NA) by last observation carried forward (LOCF).

library(data.table)
# make it a data.table
setDT(df, key = "group")
# get values for groups of no_rows >= 2
df[no_rows >= 2, new_value := value][]
#    value group no_rows new_value
# 1:     1     5       3         1
# 2:     1     5       3         1
# 3:     1     5       3         1
# 4:     2     6       1        NA
# 5:     1     7       2         1
# 6:     1     7       2         1
# 7:     2     8       2         2
# 8:     2     8       2         2
# 9:     1     9       1        NA
#10:     2    10       1        NA

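# fill down missing values in new_value (last observation carried forward);
# setnafill() updates df by reference and returns invisibly, so print explicitly
setnafill(df, "locf", cols = c("new_value"))
df[]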
#    value group no_rows new_value
# 1:     1     5       3         1
# 2:     1     5       3         1
# 3:     1     5       3         1
# 4:     2     6       1         1
# 5:     1     7       2         1
# 6:     1     7       2         1
# 7:     2     8       2         2
# 8:     2     8       2         2
# 9:     1     9       1         2
#10:     2    10       1         2
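
If you would rather avoid the extra package, the same idea (keep values from groups with at least 2 rows, carry the last kept value forward) can probably be done in base R with cummax() on the row indices. This is only a minimal sketch, not part of the answer above: the helper names keep, idx and fallback are made up for illustration, and leaving leading short groups unchanged is my own assumption, since the question does not say what should happen to them.

```r
df <- data.frame(value   = c(1, 1, 1, 2, 1, 1, 2, 2, 1, 2),
                 group   = c(5, 5, 5, 6, 7, 7, 8, 8, 9, 10),
                 no_rows = c(3, 3, 3, 1, 2, 2, 2, 2, 1, 1))

# rows whose group is long enough to keep its own value
keep <- df$no_rows >= 2

# for every row, the index of the most recent "kept" row (0 if there is none yet)
idx <- cummax(ifelse(keep, seq_along(keep), 0L))

# assumption: rows with no earlier kept row simply retain their original value
fallback <- idx == 0L
idx[fallback] <- which(fallback)

# overwrite value with the value taken from the most recent kept row
df$value <- df$value[idx]
df$value
# [1] 1 1 1 1 1 1 2 2 2 2
```

Since everything here is vectorised, it should scale much better than the row-by-row for loop, but I have not benchmarked it against the data.table version.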
