Changing NA values based on cell values in the same column in R
V1 <- c("Name", "Paul", "Name", "Sarah", NA, NA, NA, NA, "Name", "Carl", NA, NA, "Name", "Alice", "Name", "Rita")
V2 <- c("Name", "Paul", "Name", "Sarah", "Name", "Sarah", "Name", "Sarah", "Name", "Carl", "Name", "Carl", "Name", "Alice", "Name", "Rita")
df <- data.frame(V1, V2)
df
I would like V1 to look like V2. EDIT: In the original dataset, V2 doesn't exist; I created it here only to show the desired result.
V1 V2
1 Name Name
2 Paul Paul
3 Name Name
4 Sarah Sarah
5 <NA> Name
6 <NA> Sarah
7 <NA> Name
8 <NA> Sarah
9 Name Name
10 Carl Carl
11 <NA> Name
12 <NA> Carl
13 Name Name
14 Alice Alice
15 Name Name
16 Rita Rita
I tried the following:
# find the positions of the missing values in V1
m <- which(is.na(df$V1) == TRUE)
m
[1] 5 6 7 8 11 12
# go to every position and set it to the value two rows above the missing one
for (i in m) {
  df$V1[m[i]] <- df$V1[m[i]-2]
}
The code runs, but the result is wrong:
V1 V2
1 Name Name
2 Paul Paul
3 Name Name
4 Sarah Sarah
5 <NA> Name
6 <NA> Sarah
7 <NA> Name
8 <NA> Sarah
9 Name Name
10 Carl Carl
11 Name Name
12 Carl Carl
13 Name Name
14 Alice Alice
15 Name Name
16 Rita Rita
Why does it work for the later cells but not for the first run of NAs? Also, I'm trying to avoid for loops, so if there is a more elegant way to do this, I would love to see it!
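Since your for loop already iterates over the values of m (5, 6, 7, ...), not over its indices, m[i] looks up the wrong element: in the first iteration i is 5, so m[i] is m[5], which is 11. Use i directly as the row index: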
m <- which(is.na(df$V1))
for (i in m) df$V1[i] <- df$V1[i-2]
df
# V1 V2
#1 Name Name
#2 Paul Paul
#3 Name Name
#4 Sarah Sarah
#5 Name Name
#6 Sarah Sarah
#7 Name Name
#8 Sarah Sarah
#9 Name Name
#10 Carl Carl
#11 Name Name
#12 Carl Carl
#13 Name Name
#14 Alice Alice
#15 Name Name
#16 Rita Rita
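To see why the original loop skipped rows 5 to 8, it helps to look at what m[i] evaluates to. Below is a minimal sketch that uses the question's V1 as a plain vector; `fixed` is only an illustrative name and does not appear in the original code.

```r
V1 <- c("Name", "Paul", "Name", "Sarah", NA, NA, NA, NA,
        "Name", "Carl", NA, NA, "Name", "Alice", "Name", "Rita")
m <- which(is.na(V1))

# `for (i in m)` hands out the VALUES of m (5, 6, 7, 8, 11, 12), so m[i]
# looks up the 5th, 6th, 7th, ... element of m, and m has only 6 elements.
m[5]   # 11: the first iteration writes to row 11, not row 5
m[6]   # 12: the second iteration writes to row 12
m[7]   # NA: out of range, so the remaining iterations change nothing

# Using i itself as the index visits the intended rows in order.
fixed <- V1
for (i in m) fixed[i] <- fixed[i - 2]
fixed
```

The loop has to run in order: row 7 copies from row 5, which must already have been filled by an earlier iteration.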
One option involving dplyr and tidyr could be:
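library(dplyr)
library(tidyr)
df %>%
  # carry the last non-NA value downward
  fill(V1) %>%
  # one group per run of identical consecutive values
  group_by(rleid = with(rle(V1), rep(seq_along(lengths), lengths))) %>%
  # within a run, every second row is the "Name" label
  mutate(V1 = ifelse(row_number() %% 2 == 0, "Name", V1)) %>%
  ungroup() %>%
  select(-rleid)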
V1 V2
<chr> <chr>
1 Name Name
2 Paul Paul
3 Name Name
4 Sarah Sarah
5 Name Name
6 Sarah Sarah
7 Name Name
8 Sarah Sarah
9 Name Name
10 Carl Carl
11 Name Name
12 Carl Carl
13 Name Name
14 Alice Alice
15 Name Name
16 Rita Rita
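The pipeline above hardcodes the label "Name". If the data strictly alternates label row and value row, as it does in the example, a possible variant is to group rows by parity and let fill() work within each group. This is only a sketch under that assumption; `parity` and `filled` are illustrative names, not part of the original answer.

```r
library(dplyr)
library(tidyr)

V1 <- c("Name", "Paul", "Name", "Sarah", NA, NA, NA, NA,
        "Name", "Carl", NA, NA, "Name", "Alice", "Name", "Rita")
df <- data.frame(V1, stringsAsFactors = FALSE)

filled <- df %>%
  # odd rows hold the labels, even rows hold the values
  group_by(parity = row_number() %% 2) %>%
  # fill() respects groups, so each NA takes the last value of its own parity
  fill(V1) %>%
  ungroup() %>%
  select(-parity)

filled
```

Because each NA is filled from the previous row of the same parity, no label text needs to appear in the code.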
Here is a base R solution that uses matrix to reformulate the problem:
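# reshape V1 into 2 rows (labels, values) and carry the last non-NA value forward along each row
df$V2 <- as.vector(t(apply(
  matrix(df$V1, nrow = 2), 1,
  function(x) x[!is.na(x)][cumsum(!is.na(x))]
)))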
which gives
> df
V1 V2
1 Name Name
2 Paul Paul
3 Name Name
4 Sarah Sarah
5 <NA> Name
6 <NA> Sarah
7 <NA> Name
8 <NA> Sarah
9 Name Name
10 Carl Carl
11 <NA> Name
12 <NA> Carl
13 Name Name
14 Alice Alice
15 Name Name
16 Rita Rita
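The matrix approach relies on the length of V1 being a multiple of 2 and on the first label and value being present. A loop-free sketch that relaxes both assumptions could split the vector by position within each block and carry the last non-NA value forward in each piece. `fill_by_period` and `locf` are hypothetical helper names introduced here for illustration.

```r
# Hypothetical helper: carry the last non-NA value forward separately for
# each position (1st, 2nd, ...) within consecutive blocks of `period` rows.
fill_by_period <- function(x, period = 2) {
  locf <- function(v) {
    ok  <- !is.na(v)
    idx <- cumsum(ok)                      # index of the latest non-NA value so far
    out <- v
    out[idx > 0] <- v[ok][idx[idx > 0]]    # leading NAs (idx == 0) stay NA
    out
  }
  g <- (seq_along(x) - 1) %% period        # position of each element within its block
  split(x, g) <- lapply(split(x, g), locf)
  x
}

V1 <- c("Name", "Paul", "Name", "Sarah", NA, NA, NA, NA,
        "Name", "Carl", NA, NA, "Name", "Alice", "Name", "Rita")
fill_by_period(V1)
```

With the question's data frame this would be used as df$V1 <- fill_by_period(df$V1).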