Replace missing values in a cell with the value from the cell above (n-1) using a loop

I have a data file with thousands of rows that has gaps I wish to fill. I need to replace each empty cell with the value from the cell above it. To give you an idea of what my data looks like, here is a sample:

Variable <- c("AGE","","","","SEX","","SEGMENT","","","","")    
Value <- c(1, 2, 3, 4, 1, 2, 1, 2, 3, 4, 5)
Description <- c("18-24","25-34","35-44","45+","Female","Male","A","B","C","D","E")
df <- data.frame(Variable, Value, Description)

> df
   Variable Value Description
1       AGE     1       18-24
2               2       25-34
3               3       35-44
4               4         45+
5       SEX     1      Female
6               2        Male
7   SEGMENT     1           A
8               2           B
9               3           C
10              4           D
11              5           E

As you can see above, the first column has gaps. I need these empty cells to be replaced with the relevant value above, so that the new variable looks like this in the data frame:

> df
   Variable Value Description Variable_NEW
1       AGE     1       18-24               AGE
2               2       25-34               AGE
3               3       35-44               AGE
4               4         45+               AGE
5       SEX     1      Female               SEX
6               2        Male               SEX
7   SEGMENT     1           A           SEGMENT
8               2           B           SEGMENT
9               3           C           SEGMENT
10              4           D           SEGMENT
11              5           E           SEGMENT

Thinking out loud: I assume that to achieve this I will need to create a new variable with a loop and then use logic like this:

    IF Variable[n]="" THEN Variable_New[n] = Variable[n-1], 
               ELSE Variable_New[n] = Variable[n]

I'm familiar with loops but don't know how to write this kind of thing in R, where it needs a lag/n-1 kind of function. There are probably many ways to accomplish this, but a loop would be preferable. Any help will be greatly appreciated. Thanks
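A note on the pseudocode above: a single "n-1" lookup only works when each blank sits directly below a real value. The sketch below is an illustration on a shortened, made-up vector (the names `lagged` and `one_step` are hypothetical, not from the question); it shows that a one-step lag leaves the second blank in a run unfilled, which is why the fill has to read from the already-filled column rather than from the original one.

```r
# Shortened, made-up example vector (not the full data from the question)
Variable <- c("AGE", "", "", "SEX", "")

# Value from row n-1: shift the vector down by one position
lagged <- c(NA, head(Variable, -1))

# One-step replacement, as in the pseudocode: blank -> value from row n-1
one_step <- ifelse(Variable == "", lagged, Variable)
print(one_step)
# The third element is still "", because the row above it was also blank
```

Reading from the new column instead (Variable_New[n-1]) carries each value down through a whole run of blanks.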

Here is a loop approach:

# Data
Variable <- c("AGE","","","","SEX","","SEGMENT","","","","")
Value <- c(1, 2, 3, 4, 1, 2, 1, 2, 3, 4, 5)
Description <- c("18-24","25-34","35-44","45+","Female","Male","A","B","C","D","E")
df <- data.frame(Variable, Value, Description, stringsAsFactors = FALSE)
# Create the new column as a copy of the original
df$NewVar <- df$Variable
# Loop from the second row; each blank takes the already-filled value above it
for (i in seq_len(nrow(df))[-1]) {
  if (df$NewVar[i] == "") {
    df$NewVar[i] <- df$NewVar[i - 1]
  }
}

Output:

   Variable Value Description  NewVar
1       AGE     1       18-24     AGE
2               2       25-34     AGE
3               3       35-44     AGE
4               4         45+     AGE
5       SEX     1      Female     SEX
6               2        Male     SEX
7   SEGMENT     1           A SEGMENT
8               2           B SEGMENT
9               3           C SEGMENT
10              4           D SEGMENT
11              5           E SEGMENT

You don't need to write a loop; there are existing functions that can help with this task.

You can replace blank values with NA and then use fill from tidyr:

library(dplyr)

df %>%
  mutate(Variable_NEW = replace(Variable, Variable == "", NA)) %>%
  tidyr::fill(Variable_NEW)

#   Variable Value Description Variable_NEW
#1       AGE     1       18-24          AGE
#2               2       25-34          AGE
#3               3       35-44          AGE
#4               4         45+          AGE
#5       SEX     1      Female          SEX
#6               2        Male          SEX
#7   SEGMENT     1           A      SEGMENT
#8               2           B      SEGMENT
#9               3           C      SEGMENT
#10              4           D      SEGMENT
#11              5           E      SEGMENT
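If installing dplyr and tidyr is not an option, the same loop-free idea can be sketched in base R. This is only an illustration under stated assumptions: `fill_down` and its `blank` argument are hypothetical names introduced here, and leading blanks are assumed to become NA because there is nothing above them to copy.

```r
# Hypothetical helper (not from the answer above): forward-fill without a loop
fill_down <- function(x, blank = "") {
  x <- as.character(x)
  filled <- !is.na(x) & x != blank   # TRUE where a real value is present
  idx <- cumsum(filled)              # position of the latest real value; 0 before the first one
  out <- rep(NA_character_, length(x))
  out[idx > 0] <- x[filled][idx[idx > 0]]
  out
}

Variable <- c("AGE","","","","SEX","","SEGMENT","","","","")
Variable_NEW <- fill_down(Variable)
print(Variable_NEW)
```

The running count from cumsum() tells each row which non-blank value it belongs to, so indexing the non-blank values by that count repeats each one down to the next non-blank row.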

You can write your own function with a loop, or use the na.locf function from the zoo package to fill in missing (NA) values. Example:

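fillin <- function(x) {
  # seq_along(x)[-1] is empty for vectors shorter than 2, unlike 2:length(x)
  for (i in seq_along(x)[-1]) {
    if (x[i] %in% c(NA, "")) {
      x[i] <- x[i - 1]  # carry the previous (already filled) value forward
    }
  }
  x
}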

Variable <- c("AGE","","","","SEX","","SEGMENT","","","","")    
Value <- c(1, 2, 3, 4, 1, 2, 1, 2, 3, 4, 5)
Description <- c("18-24","25-34","35-44","45+","Female","Male","A","B","C","D","E")
df <- data.frame(Variable, Value, Description)

df$Variable_fillin <- fillin(df$Variable)

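library(zoo)
# Turn blanks into NA so that na.locf() treats them as missing
df$Variable[df$Variable == ""] <- NA
# na.rm = FALSE keeps a leading NA, so the result always has nrow(df) elements
df$Variable_nalocf <- na.locf(df$Variable, na.rm = FALSE)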

df
#>    Variable Value Description Variable_fillin Variable_nalocf
#> 1       AGE     1       18-24             AGE             AGE
#> 2      <NA>     2       25-34             AGE             AGE
#> 3      <NA>     3       35-44             AGE             AGE
#> 4      <NA>     4         45+             AGE             AGE
#> 5       SEX     1      Female             SEX             SEX
#> 6      <NA>     2        Male             SEX             SEX
#> 7   SEGMENT     1           A         SEGMENT         SEGMENT
#> 8      <NA>     2           B         SEGMENT         SEGMENT
#> 9      <NA>     3           C         SEGMENT         SEGMENT
#> 10     <NA>     4           D         SEGMENT         SEGMENT
#> 11     <NA>     5           E         SEGMENT         SEGMENT

This replaces every "" with NA and then fills the column named Variable downward:

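df %>% 
  # convert "" to NA in character columns only; na_if() with "" on a numeric column can error in recent dplyr versions
  dplyr::mutate(dplyr::across(where(is.character), ~ dplyr::na_if(.x, ""))) %>% 
  tidyr::fill(Variable, .direction = "down")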

Using data.table and a for loop:

library(data.table)
DT <- as.data.table(df)

DT[, Variable_new := Variable[1]]

for (i in 2:nrow(DT)) {
  DT[i, Variable_new := fifelse(DT[i, Variable] == '', DT[i-1, Variable_new], DT[i, Variable])]
}

> DT
    Variable Value Description Variable_new
 1:      AGE     1       18-24          AGE
 2:              2       25-34          AGE
 3:              3       35-44          AGE
 4:              4         45+          AGE
 5:      SEX     1      Female          SEX
 6:              2        Male          SEX
 7:  SEGMENT     1           A      SEGMENT
 8:              2           B      SEGMENT
 9:              3           C      SEGMENT
10:              4           D      SEGMENT
11:              5           E      SEGMENT
