Issue with R if else loop: Conditions only partly executed
I have the following data frame:
Row Repro Number2
1 1 EWC
2 NA LWY
3 7 EWS
4 NA LWC
5 NA EWC
6 NA LWC
7 3 EWY
8 NA LW2Y
9 NA Unknown
10 NA LWC
11 1 EWC
12 NA LWY
13 NA EWY
14 NA LWY
15 NA Unknown
16 NA LWC
On this data frame, I am using the following loop:
for (i in 1:nrow(df3)) {
if(df3$Number2[i+1]=="Unknown" & is.na(df3$Repro[i])) {
df3$Number2[i]="Unknown"
} else{
df3$Number2[i]==df3$Number2[i]
}
}
The loop runs, but I get an error at the end, and the data frame does not end up looking like the result I want.
My issue is that, although the code does what it is meant to do (replace a value in the Number2 column with "Unknown" if the next value is "Unknown" and the associated Repro value is NA), it only does so for the "Unknown" values that are initially in the data frame. I want it to also take into account the newly added "Unknown" values and apply the loop conditions to those too.
Here is the error message:
Error in if (df3$Number2[i + 1] == "Unknown" & is.na(df3$Repro[i])) { :
missing value where TRUE/FALSE needed
And here is the data frame after running the loop. I have added another column called "Number2.Correct" showing what I want the Number2 column to actually look like. The issue is with rows 12 and 13: these should be "Unknown", not "LWY" and "EWY", respectively.
Repro Number2 Number2.Correct
1 1 EWC EWC
2 NA LWY LWY
3 7 EWS EWS
4 NA LWC LWC
5 NA EWC EWC
6 NA LWC LWC
7 3 EWY EWY
8 NA Unknown Unknown
9 NA Unknown Unknown
10 NA LWC LWC
11 1 EWC EWC
12 NA LWY Unknown
13 NA EWY Unknown
14 NA Unknown Unknown
15 NA Unknown Unknown
16 NA LWC LWC
In the end, I have two questions:
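# Walk from the bottom row to the top, so that a newly written "Unknown"
# is seen when the row above it is checked
for (i in rev(seq_len(nrow(df3)))) {
  # Check the bound first: the last row has no row i + 1 to compare with
  if (i < nrow(df3) && df3$Number2[i + 1] == "Unknown" && is.na(df3$Repro[i])) {
    df3$Number2[i] <- "Unknown"
  }
}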
df3
#> Row Repro Number2
#> 1 1 1 EWC
#> 2 2 NA LWY
#> 3 3 7 EWS
#> 4 4 NA LWC
#> 5 5 NA EWC
#> 6 6 NA LWC
#> 7 7 3 EWY
#> 8 8 NA Unknown
#> 9 9 NA Unknown
#> 10 10 NA LWC
#> 11 11 1 EWC
#> 12 12 NA Unknown
#> 13 13 NA Unknown
#> 14 14 NA Unknown
#> 15 15 NA Unknown
#> 16 16 NA LWC
Created on 2023-01-09 with reprex v2.0.2

You had two issues:
1. i + 1 is out of range for the final row in your data, so I added a bounds check on i to the condition.
2. The loop needs to fill in Unknown from bottom to top, not top to bottom, so that each newly added Unknown is seen when the row above it is checked. You can reverse the order with rev().
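The same bottom-to-top idea can be packaged as a small reusable function that works on plain vectors. This is a minimal sketch, assuming Number2 contains no NA values; the name propagate_unknown() is hypothetical and not from any package.

```r
# Hypothetical helper: propagate "Unknown" upward through rows whose repro is NA.
# Assumes number2 contains no NA values.
propagate_unknown <- function(number2, repro) {
  n <- length(number2)
  is_unknown <- number2 == "Unknown"
  if (n >= 2) {
    # Bottom to top, so a row that has just become "Unknown"
    # is taken into account when the row above it is checked
    for (i in (n - 1):1) {
      if (is.na(repro[i]) && is_unknown[i + 1]) {
        is_unknown[i] <- TRUE
      }
    }
  }
  replace(number2, is_unknown, "Unknown")
}

# Data from the question
repro   <- c(1, NA, 7, NA, NA, NA, 3, NA, NA, NA, 1, NA, NA, NA, NA, NA)
number2 <- c("EWC", "LWY", "EWS", "LWC", "EWC", "LWC", "EWY", "LW2Y", "Unknown",
             "LWC", "EWC", "LWY", "EWY", "LWY", "Unknown", "LWC")
propagate_unknown(number2, repro)
```

Only one pass is needed because each row is checked after the row below it has already been finalized.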
The i+1 goes out of range after the last row (nrow) of the data. We may use a group-by approach with the tidyverse:
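library(dplyr)
library(tidyr)
library(data.table)  # for rleid()

df3 %>%
  # Number each "Unknown" row (1, 2, ...) and leave all other rows NA
  mutate(grp = replace(replace(Number2, Number2 != "Unknown", NA),
                       Number2 == "Unknown", seq_len(sum(Number2 == "Unknown")))) %>%
  # Assign each row to the block ending at the next "Unknown"
  # (rows after the last "Unknown" join the last block)
  fill(grp, .direction = "updown") %>%
  # Split each block into runs of NA / non-NA Repro
  group_by(grp, grp2 = rleid(is.na(Repro))) %>%
  # Within a run of NA Repro, rows above the first "Unknown" become "Unknown"
  mutate(Number2 = case_when(is.na(Repro) &
                               row_number() < match("Unknown", Number2) ~ "Unknown",
                             TRUE ~ Number2)) %>%
  ungroup() %>%
  select(-grp, -grp2)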
-output
# A tibble: 16 × 3
Row Repro Number2
<int> <int> <chr>
1 1 1 EWC
2 2 NA LWY
3 3 7 EWS
4 4 NA LWC
5 5 NA EWC
6 6 NA LWC
7 7 3 EWY
8 8 NA Unknown
9 9 NA Unknown
10 10 NA LWC
11 11 1 EWC
12 12 NA Unknown
13 13 NA Unknown
14 14 NA Unknown
15 15 NA Unknown
16 16 NA LWC
df3 <- structure(list(Row = 1:16, Repro = c(1L, NA, 7L, NA, NA, NA,
3L, NA, NA, NA, 1L, NA, NA, NA, NA, NA), Number2 = c("EWC", "LWY",
"EWS", "LWC", "EWC", "LWC", "EWY", "LW2Y", "Unknown", "LWC",
"EWC", "LWY", "EWY", "LWY", "Unknown", "LWC")),
class = "data.frame", row.names = c(NA,
-16L))
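The grouping key rleid(is.na(Repro)) numbers consecutive runs of equal values, so each uninterrupted stretch of missing Repro values gets its own id. A base-R sketch of the same numbering using rle() is shown below; the helper name run_ids() is hypothetical, and it is only meant to match data.table::rleid() for a single vector like this one.

```r
# Hypothetical base-R stand-in for data.table::rleid() on one vector:
# consecutive equal values share an id, and the id increases at each change.
run_ids <- function(x) {
  runs <- rle(x)
  rep(seq_along(runs$lengths), runs$lengths)
}

repro <- c(1, NA, 7, NA, NA, NA, 3, NA, NA, NA, 1, NA, NA, NA, NA, NA)
run_ids(is.na(repro))
# 1 2 3 4 4 4 5 6 6 6 7 8 8 8 8 8
```

Rows 12 to 16 share id 8, which is why, combined with grp, they form one group in which rows 12 to 14 (above the first "Unknown") are relabelled.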
The code fails because index nrow(df3)+1 is out of range, so the for loop needs to be 1:(nrow(df3)-1).
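The error message itself can be reproduced in isolation: indexing one element past the end of a vector returns NA instead of failing, comparing that NA with "Unknown" gives NA, and if() refuses an NA condition. A minimal sketch with a made-up vector x (not from the question); the last lines also show, as a side note, that seq_len() is a safer loop bound than 1:(n - 1) when a data frame has a single row.

```r
x <- c("LWC", "Unknown")

x[length(x) + 1]               # NA: indexing past the end does not fail
x[length(x) + 1] == "Unknown"  # NA: comparing NA gives NA

# if() needs a single TRUE or FALSE, so an NA condition is an error
msg <- tryCatch(
  if (x[length(x) + 1] == "Unknown") "yes" else "no",
  error = function(e) conditionMessage(e)
)
msg  # "missing value where TRUE/FALSE needed"

# Loop bounds when there is only one row
n <- 1
1:(n - 1)        # c(1, 0): the loop still runs and reads x[i + 1] out of range
seq_len(n - 1)   # integer(0): the loop body never runs
```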
To update Number2 iteratively, one easy (although not elegant) way is to use a while loop. The stopping condition is when the new and old Number2 are the same.
library(dplyr)

# Repeat full top-to-bottom passes until a pass changes nothing
while (TRUE) {
  df3$Number2_new <- df3$Number2
  for (i in 1:(nrow(df3) - 1)) {
    # Mark row i "Unknown" if the next row is "Unknown" and Repro is missing
    if (df3$Number2_new[i + 1] == "Unknown" && is.na(df3$Repro[i])) {
      df3$Number2_new[i] <- "Unknown"
    }
  }
  # Stop once a full pass leaves Number2 unchanged
  converged <- all(df3$Number2 == df3$Number2_new)
  df3 <- df3 %>%
    mutate(Number2 = Number2_new) %>%
    select(-Number2_new)
  if (converged) break
}
df3
Row Repro Number2
1 1 1 EWC
2 2 NA LWY
3 3 7 EWS
4 4 NA LWC
5 5 NA EWC
6 6 NA LWC
7 7 3 EWY
8 8 NA Unknown
9 9 NA Unknown
10 10 NA LWC
11 11 1 EWC
12 12 NA Unknown
13 13 NA Unknown
14 14 NA Unknown
15 15 NA Unknown
16 16 NA LWC
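The while loop above is a fixed-point iteration: apply one top-to-bottom pass and stop when a pass changes nothing. A compact sketch of that pattern on plain vectors follows; the helper names one_pass() and until_stable() and the max_iter guard are hypothetical additions, not part of the answer.

```r
# One top-to-bottom pass that reads only values from the previous pass
one_pass <- function(number2, repro) {
  new <- number2
  for (i in seq_len(length(number2) - 1)) {
    if (number2[i + 1] == "Unknown" && is.na(repro[i])) new[i] <- "Unknown"
  }
  new
}

# Apply step() repeatedly until its output equals its input
until_stable <- function(x, step, max_iter = 1000L) {
  for (k in seq_len(max_iter)) {
    nxt <- step(x)
    if (identical(nxt, x)) return(x)
    x <- nxt
  }
  stop("no fixed point reached within max_iter passes")
}

# Data from the question
repro   <- c(1, NA, 7, NA, NA, NA, 3, NA, NA, NA, 1, NA, NA, NA, NA, NA)
number2 <- c("EWC", "LWY", "EWS", "LWC", "EWC", "LWC", "EWY", "LW2Y", "Unknown",
             "LWC", "EWC", "LWY", "EWY", "LWY", "Unknown", "LWC")
until_stable(number2, function(v) one_pass(v, repro))
```

Each pass moves an "Unknown" up by at most one row, so this approach can need as many passes as there are rows, whereas walking the rows from bottom to top needs only one.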