R for loop generating NA's after 1000 iterations
I have a simple for loop that removes any row from my data frame in which two variables share a similar string. When I run the loop, it iterates 1000 times and then starts generating NAs, which breaks the loop.
expiration | quote_datetime
---|---
2021-02-26 | 2021-02-26 10:00:00
2021-02-26 | 2021-02-27 10:00:00
for(row in 1:nrow(df)){
if(grepl(df$expiration[row], df$quote_datetime[row],fixed=TRUE) == TRUE){
df = df[-row,]
}
}
I'm getting the error message:
Error in if (grepl(df$expiration[row], df$quote_datetime[row], : missing value where TRUE/FALSE needed
Each time I run it, it eliminates a few more rows until there is nothing left to eliminate, and then it runs without error. I'd appreciate any help.
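The issue arises because the original data 'df' is subset whenever the if condition is TRUE, i.e. it has one row fewer for every TRUE case, while 1:nrow(df) was evaluated only once, before the loop started. Later values of row therefore point past the end of the shrunken data, where indexing returns NA. It can be resolved by leaving 'df' unmodified inside the loop and building the result in a separate object: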
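df2 <- df
# Track which rows to keep; deleting by position inside the loop would shift the indices
keep <- rep(TRUE, nrow(df))
for(row in seq_len(nrow(df))){
if(grepl(df$expiration[row], df$quote_datetime[row],fixed=TRUE)){
keep[row] <- FALSE
}
}
# Remove all flagged rows in a single step
df2 <- df2[keep,]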
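To see where the NA comes from, here is a minimal sketch that simulates one deletion by hand. The three-row data frame is made up for illustration; only the column names come from the question.

```r
# Hypothetical sample data, for illustration only
df <- data.frame(
  expiration     = c("2021-02-26", "2021-02-26", "2021-02-26"),
  quote_datetime = c("2021-02-26 10:00:00", "2021-02-27 10:00:00",
                     "2021-02-26 11:00:00"),
  stringsAsFactors = FALSE
)

idx <- 1:nrow(df)   # evaluated once, before any row is removed: 1 2 3
df  <- df[-1, ]     # simulate one deletion inside the loop; df now has 2 rows

# The loop would still visit index 3, which no longer exists
out_of_range <- df$expiration[3]
res <- grepl(out_of_range, df$quote_datetime[3], fixed = TRUE)
# Passing res to if() is what raises "missing value where TRUE/FALSE needed"
```

The same shift also explains why rerunning the loop removes a few more rows each time: after a deletion, the next row moves into the position just visited and is skipped.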
Also, grepl is vectorized only over 'x' and not over the pattern: if the pattern has more than one element, only the first is used. So, to do this in a vectorized way, we may need to paste the patterns together. Note that this checks each quote_datetime against every expiration value, not only the one in the same row:
# "|" acts as alternation only in regex mode, so fixed = TRUE is not used here
df <- df[!grepl(paste(unique(df$expiration), collapse="|"),
df$quote_datetime), ]
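If a strictly row-wise comparison is wanted in base R, one possible sketch is to pair each pattern with its own string via mapply. The sample data below is hypothetical, and same_day and result are names chosen only for this example.

```r
# Hypothetical sample data; column names follow the question
df <- data.frame(
  expiration     = c("2021-02-26", "2021-02-26", "2021-02-27"),
  quote_datetime = c("2021-02-26 10:00:00", "2021-02-27 10:00:00",
                     "2021-02-26 10:00:00"),
  stringsAsFactors = FALSE
)

# Row-wise literal match: pattern i is tested only against string i
same_day <- mapply(grepl, df$expiration, df$quote_datetime,
                   MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)

# Keep only the rows where the expiration does not appear in quote_datetime
result <- df[!same_day, ]
```

With this sample, only the first row is dropped. A single pasted pattern would treat the data differently, because every quote_datetime here contains some expiration value from another row.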
Or use a function that is vectorized over both 'x' and 'pattern', i.e. str_detect:
library(dplyr)
library(stringr)
# fixed() makes each expiration a literal pattern, compared row by row
df %>%
filter(!str_detect(quote_datetime, fixed(expiration)))