
Changing NA values based on cell values in the same column in R

V1 <- c("Name", "Paul", "Name", "Sarah", NA, NA, NA, NA, "Name", "Carl", NA, NA, "Name", "Alice", "Name", "Rita")
V2 <- c("Name", "Paul", "Name", "Sarah", "Name", "Sarah", "Name", "Sarah", "Name", "Carl", "Name", "Carl", "Name", "Alice", "Name", "Rita")
df <- data.frame(V1, V2)
df

I would like V1 to look like V2. EDIT: In the original dataset, V2 doesn't exist; I created it here to give some example data.

      V1    V2
1   Name  Name
2   Paul  Paul
3   Name  Name
4  Sarah Sarah
5   <NA>  Name
6   <NA> Sarah
7   <NA>  Name
8   <NA> Sarah
9   Name  Name
10  Carl  Carl
11  <NA>  Name
12  <NA>  Carl
13  Name  Name
14 Alice Alice
15  Name  Name
16  Rita  Rita 

I tried the following:

#find the positions of missings in V1 
m <- which(is.na(df$V1) == TRUE)
m
[1]  5  6  7  8 11 12

# go to every position and replace the missing value with the value two rows above it
for (i in m) {
  df$V1[m[i]] <- df$V1[m[i]-2]
}

The code runs, but the output is faulty:

      V1    V2
1   Name  Name
2   Paul  Paul
3   Name  Name
4  Sarah Sarah
5   <NA>  Name
6   <NA> Sarah
7   <NA>  Name
8   <NA> Sarah
9   Name  Name
10  Carl  Carl
11  Name  Name
12  Carl  Carl
13  Name  Name
14 Alice Alice
15  Name  Name
16  Rita  Rita

Why does it work for the later cells (rows 11 and 12) but not for the first block of NAs? Also, I'm trying to avoid for loops, so if there is a more elegant way to do it, I would love to see one!

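Your for loop already iterates over the values of m (5, 6, 7, 8, 11, 12), so i is itself a row index. Writing m[i] indexes m a second time: m[5] is 11 and m[6] is 12, which is why only rows 11 and 12 were filled, while m[7], m[8], m[11] and m[12] are NA, and assigning through an NA index with a single value changes nothing. Use i directly; because the loop runs in order, rows 7 and 8 then copy the values just written into rows 5 and 6: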

m <- which(is.na(df$V1))
for (i in m) df$V1[i] <- df$V1[i-2]
df

#      V1    V2
#1   Name  Name
#2   Paul  Paul
#3   Name  Name
#4  Sarah Sarah
#5   Name  Name
#6  Sarah Sarah
#7   Name  Name
#8  Sarah Sarah
#9   Name  Name
#10  Carl  Carl
#11  Name  Name
#12  Carl  Carl
#13  Name  Name
#14 Alice Alice
#15  Name  Name
#16  Rita  Rita
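The difference between i and m[i] can be checked directly. This is a minimal sketch that rebuilds V1 as a plain character vector instead of using df; the toy vector x is only an illustration of how R treats an NA subscript on the left-hand side of an assignment.

```r
# Example column from the question (as a plain character vector)
V1 <- c("Name", "Paul", "Name", "Sarah", NA, NA, NA, NA,
        "Name", "Carl", NA, NA, "Name", "Alice", "Name", "Rita")

m <- which(is.na(V1))
m      # 5 6 7 8 11 12: these are the values the loop variable i takes

# The original loop indexed m a second time with m[i]:
m[m]   # 11 12 NA NA NA NA: only rows 11 and 12 are ever targeted

# Assigning through an NA index with a length-one value changes nothing,
# which is why the iterations where m[i] is NA silently did no work
x <- c("a", "b")
x[NA_integer_] <- "z"
x      # "a" "b"
```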

One option involving dplyr and tidyr could be to fill V1 downwards and then, within each run of identical values, set every second row back to "Name":

library(dplyr)
library(tidyr)

df %>%
 fill(V1) %>%
 group_by(rleid = with(rle(V1), rep(seq_along(lengths), lengths))) %>%
 mutate(V1 = ifelse(row_number() %% 2 == 0 , "Name", V1)) %>%
 ungroup() %>%
 select(-rleid)

   V1    V2   
   <chr> <chr>
 1 Name  Name 
 2 Paul  Paul 
 3 Name  Name 
 4 Sarah Sarah
 5 Name  Name 
 6 Sarah Sarah
 7 Name  Name 
 8 Sarah Sarah
 9 Name  Name 
10 Carl  Carl 
11 Name  Name 
12 Carl  Carl 
13 Name  Name 
14 Alice Alice
15 Name  Name 
16 Rita  Rita 
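For readers without dplyr, the same idea can be sketched in base R. This is an illustrative translation, not part of the original answer: the carry-forward step stands in for tidyr::fill(), and sequence(rle(...)$lengths) gives each element's position within its run, playing the role of group_by() plus row_number(). Like the pipeline above, it assumes each block of NAs directly follows a person's name, as in the example data.

```r
V1 <- c("Name", "Paul", "Name", "Sarah", NA, NA, NA, NA,
        "Name", "Carl", NA, NA, "Name", "Alice", "Name", "Rita")

# Step 1 (the role of tidyr::fill()): carry the last non-NA value forward.
# cumsum(!is.na(V1)) counts the non-NA values seen so far, i.e. the index
# of the most recent one among V1[!is.na(V1)]; V1[1] is not NA here.
filled <- V1[!is.na(V1)][cumsum(!is.na(V1))]

# Step 2 (the role of group_by(rleid) + row_number()): the position of each
# element inside its run of identical values, e.g. 1 2 3 4 5 for the Sarah run
pos <- sequence(rle(filled)$lengths)

# Step 3: every second element of a run is a "Name" label again
result <- ifelse(pos %% 2 == 0, "Name", filled)
result
```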

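Here is a base R solution that uses matrix to reformulate the problem. With nrow = 2, the odd positions (the "Name" labels) land in row 1 and the even positions (the person names) in row 2, so each row can be filled independently with its last non-missing value. The result is written to V2 here; assign to df$V1 instead to fill the column in place: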

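# fill each row of the 2-row matrix with its last non-NA value, then flatten back in the original order
df$V2 <- as.vector(t(apply(matrix(df$V1, nrow = 2), 1, function(x) x[!is.na(x)][cumsum(!is.na(x))])))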

which gives:

> df
      V1    V2
1   Name  Name
2   Paul  Paul
3   Name  Name
4  Sarah Sarah
5   <NA>  Name
6   <NA> Sarah
7   <NA>  Name
8   <NA> Sarah
9   Name  Name
10  Carl  Carl
11  <NA>  Name
12  <NA>  Carl
13  Name  Name
14 Alice Alice
15  Name  Name
16  Rita  Rita
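The matrix trick relies on the length of V1 being a multiple of 2; otherwise matrix() recycles the data with a warning. A sketch of a more general variant follows, using two hypothetical helpers, locf_series() and locf_stride() (neither is from the original answer). It splits the vector into k interleaved series (positions i, i + k, i + 2k, ...) with ave() and carries the last non-missing value forward within each series.

```r
# Hypothetical helper: last observation carried forward within one series.
# Leading NAs stay NA because there is nothing earlier to copy.
locf_series <- function(s) {
  idx <- cumsum(!is.na(s))   # index of the latest non-NA value seen so far
  idx[idx == 0] <- NA        # no earlier value available
  s[!is.na(s)][idx]
}

# Hypothetical helper: treat positions 1, 1 + k, 1 + 2k, ... as one series,
# positions 2, 2 + k, ... as the next, and fill each series separately
locf_stride <- function(x, k = 2) {
  ave(x, (seq_along(x) - 1) %% k, FUN = locf_series)
}

V1 <- c("Name", "Paul", "Name", "Sarah", NA, NA, NA, NA,
        "Name", "Carl", NA, NA, "Name", "Alice", "Name", "Rita")
locf_stride(V1, k = 2)

# Also works for odd lengths, where matrix(x, nrow = 2) would recycle with a warning
locf_stride(c("a", NA, NA), k = 2)
```

Using ave() keeps the result in the original order without reshaping and back; whether that reads more clearly than the matrix version is a matter of taste.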
