Replace missing values in a cell with the value from the cell above (n-1) using a loop
I have a data file with thousands of rows that has gaps I wish to fill. I need to replace the empty cells with the value from the cell above. To give you an idea of what my data looks like, here is a sample:
Variable <- c("AGE","","","","SEX","","SEGMENT","","","","")
Value <- c(1, 2, 3, 4, 1, 2, 1, 2, 3, 4, 5)
Description <- c("18-24","25-34","35-44","45+","Female","Male","A","B","C","D","E")
df <- data.frame(Variable, Value, Description)
> df
   Variable Value Description
1       AGE     1       18-24
2               2       25-34
3               3       35-44
4               4         45+
5       SEX     1      Female
6               2        Male
7   SEGMENT     1           A
8               2           B
9               3           C
10              4           D
11              5           E
As you can see above, the first column has gaps. I need these empty cells replaced with the relevant value above, so that the new variable looks like this in the data frame:
> df
   Variable Value Description Variable_NEW
1       AGE     1       18-24          AGE
2               2       25-34          AGE
3               3       35-44          AGE
4               4         45+          AGE
5       SEX     1      Female          SEX
6               2        Male          SEX
7   SEGMENT     1           A      SEGMENT
8               2           B      SEGMENT
9               3           C      SEGMENT
10              4           D      SEGMENT
11              5           E      SEGMENT
Thinking out loud: I assume that to achieve this I will need to create a new variable with a loop and then use logic like this:
IF Variable[n]="" THEN Variable_New[n] = Variable[n-1],
ELSE Variable_New[n] = Variable[n]
I'm familiar with loops but don't know how to write this kind of thing in R, where it needs a lag/n-1 kind of function. There are probably many ways to accomplish this, but a loop would be preferable. Any help will be greatly appreciated.
Thanks
Here is a loop approach:
# Data
Variable <- c("AGE","","","","SEX","","SEGMENT","","","","")
Value <- c(1, 2, 3, 4, 1, 2, 1, 2, 3, 4, 5)
Description <- c("18-24","25-34","35-44","45+","Female","Male","A","B","C","D","E")
df <- data.frame(Variable, Value, Description, stringsAsFactors = FALSE)
# Create the new column as a copy of the original
df$NewVar <- df$Variable
# Loop from the second row; a blank takes the (already filled) value above it
for (i in seq_len(nrow(df))[-1]) {
  if (df$NewVar[i] == "") {
    df$NewVar[i] <- df$NewVar[i - 1]
  }
}
Output:
   Variable Value Description  NewVar
1       AGE     1       18-24     AGE
2               2       25-34     AGE
3               3       35-44     AGE
4               4         45+     AGE
5       SEX     1      Female     SEX
6               2        Male     SEX
7   SEGMENT     1           A SEGMENT
8               2           B SEGMENT
9               3           C SEGMENT
10              4           D SEGMENT
11              5           E SEGMENT
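One detail of the loop is worth spelling out: it looks back at NewVar[i-1], the column being filled, rather than at the original Variable[i-1] as in the pseudocode in the question. Looking back at the original column would only fix the first blank in each run of blanks. A minimal sketch on a short vector (the names x, from_original and carried are made up for illustration):

```r
x <- c("AGE", "", "", "SEX", "")

# Variant 1: look back at the ORIGINAL vector (as in the pseudocode)
from_original <- x
for (i in seq_along(x)[-1]) {
  if (x[i] == "") from_original[i] <- x[i - 1]
}

# Variant 2: look back at the vector that is being filled
carried <- x
for (i in seq_along(x)[-1]) {
  if (carried[i] == "") carried[i] <- carried[i - 1]
}

from_original  # the third element is still ""
carried        # every blank is filled
```

Only the second variant carries a value forward across several consecutive blanks.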
You don't need to write a loop; there are existing functions that can help with this task.
You can replace blank values with NA and use fill from the tidyr package:
library(dplyr)
df %>%
  mutate(Variable_NEW = replace(Variable, Variable == "", NA)) %>%
  tidyr::fill(Variable_NEW)
#   Variable Value Description Variable_NEW
#1       AGE     1       18-24          AGE
#2               2       25-34          AGE
#3               3       35-44          AGE
#4               4         45+          AGE
#5       SEX     1      Female          SEX
#6               2        Male          SEX
#7   SEGMENT     1           A      SEGMENT
#8               2           B      SEGMENT
#9               3           C      SEGMENT
#10              4           D      SEGMENT
#11              5           E      SEGMENT
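If adding packages is not an option, the same fill-down can probably be done in base R without a loop. The following is only a sketch of one common idiom, not part of the answer above; is_filled and idx are names chosen here for illustration:

```r
Variable <- c("AGE","","","","SEX","","SEGMENT","","","","")

# TRUE where the cell holds a real value (neither NA nor "")
is_filled <- !is.na(Variable) & Variable != ""

# cumsum() counts how many real values have been seen so far, so each
# position gets the index of the most recent real value
idx <- cumsum(is_filled)

# Prepend NA so that leading blanks (idx == 0) map to NA
Variable_NEW <- c(NA, Variable[is_filled])[idx + 1]
Variable_NEW
```

Because it works on whole vectors, this should scale to thousands of rows without any row-by-row assignment.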
You can write your own function with a loop, or use the na.locf function from the zoo package to fill in missing (NA) values. Example:
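fillin <- function(x) {
  # seq_along(x)[-1] is empty for vectors shorter than 2,
  # whereas 2:length(x) would count down (2, 1) and index past the end
  for (i in seq_along(x)[-1]) {
    # Treat both NA and "" as missing and copy the value above
    if (x[i] %in% c(NA, "")) {
      x[i] <- x[i - 1]
    }
  }
  x
}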
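Variable <- c("AGE","","","","SEX","","SEGMENT","","","","")
Value <- c(1, 2, 3, 4, 1, 2, 1, 2, 3, 4, 5)
Description <- c("18-24","25-34","35-44","45+","Female","Male","A","B","C","D","E")
df <- data.frame(Variable, Value, Description)
# Hand-written loop version
df$Variable_fillin <- fillin(df$Variable)
# zoo version: na.locf only fills NA, so convert "" to NA first
library(zoo)
df$Variable[df$Variable == ""] <- NA
# na.rm = FALSE keeps any leading NA so the result has the same length as the column
df$Variable_nalocf <- na.locf(df$Variable, na.rm = FALSE)
df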
#>    Variable Value Description Variable_fillin Variable_nalocf
#> 1       AGE     1       18-24             AGE             AGE
#> 2      <NA>     2       25-34             AGE             AGE
#> 3      <NA>     3       35-44             AGE             AGE
#> 4      <NA>     4         45+             AGE             AGE
#> 5       SEX     1      Female             SEX             SEX
#> 6      <NA>     2        Male             SEX             SEX
#> 7   SEGMENT     1           A         SEGMENT         SEGMENT
#> 8      <NA>     2           B         SEGMENT         SEGMENT
#> 9      <NA>     3           C         SEGMENT         SEGMENT
#> 10     <NA>     4           D         SEGMENT         SEGMENT
#> 11     <NA>     5           E         SEGMENT         SEGMENT
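Two inputs are worth checking with any hand-written fill function: a vector that starts with a blank (there is nothing above it to copy), and a vector of length one (where 2:length(x) counts down as 2, 1). Below is a sketch of a more defensive variant; fillin_safe is a hypothetical name used only for this illustration, and it assumes that leading blanks should simply stay NA:

```r
fillin_safe <- function(x) {
  x <- as.character(x)           # also accept factor columns
  x[!is.na(x) & x == ""] <- NA   # treat "" and NA alike
  for (i in seq_along(x)[-1]) {  # empty loop when length(x) < 2
    if (is.na(x[i])) x[i] <- x[i - 1]
  }
  x
}

fillin_safe(c("", "AGE", "", NA, "SEX", ""))
fillin_safe("AGE")
```

The first call leaves the leading blank as NA and fills the rest; the second returns its input unchanged.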
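This replaces "" with NA in the character columns and then fills down the column named Variable:
library(dplyr)
df %>%
  # restrict na_if() to character columns; comparing a numeric column with "" errors in recent dplyr versions
  mutate(across(where(is.character), ~ na_if(.x, ""))) %>%
  tidyr::fill(Variable, .direction = "down")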
Using data.table and a for loop:
library(data.table)
DT <- as.data.table(df)
DT[, Variable_new := Variable[1]]
for (i in 2:nrow(DT)) {
  DT[i, Variable_new := fifelse(DT[i, Variable] == '', DT[i-1, Variable_new], DT[i, Variable])]
}
> DT
    Variable Value Description Variable_new
 1:      AGE     1       18-24          AGE
 2:              2       25-34          AGE
 3:              3       35-44          AGE
 4:              4         45+          AGE
 5:      SEX     1      Female          SEX
 6:              2        Male          SEX
 7:  SEGMENT     1           A      SEGMENT
 8:              2           B      SEGMENT
 9:              3           C      SEGMENT
10:              4           D      SEGMENT
11:              5           E      SEGMENT
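Updating one row per iteration can get slow on tables with many thousands of rows. A possible loop-free sketch in data.table, assuming the blanks are "" (not NA) and the first row is not blank: cumsum(Variable != "") gives every run of blanks the same group id as the value that starts it, and each group then takes its first value. The column name grp is chosen here for illustration only.

```r
library(data.table)

DT <- data.table(
  Variable = c("AGE","","","","SEX","","SEGMENT","","","",""),
  Value    = c(1, 2, 3, 4, 1, 2, 1, 2, 3, 4, 5)
)

# Each non-blank value starts a new group; blanks inherit the current group
DT[, Variable_new := Variable[1L], by = .(grp = cumsum(Variable != ""))]
DT
```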