A regular expression to filter out repetitive numbers for R data frame columns
I have a data frame with many columns and rows, and I need to filter it based on the values of two columns (Lat and Lon). I need a regular expression to do the filtering.
Type <-c("human","camera","ebird","museum", "specimen", "gbif")
Lat <- c(34.67, 34.66,34.6666666, 34.666582, 34.56666, 34.586666)
Lon <- c(9.888,9.88,9.8761,9.888064, 9.78888,9.318888)
x = data.frame(cbind(Type,Lat,Lon))
Here's how each row would fare under the regex:
So the resulting data frame from this regex filter would be:
Type <-c("museum","gbif")
Lat <- c(34.666582, 34.586666)
Lon <- c(9.888064, 9.318888)
x = data.frame(cbind(Type,Lat,Lon))
The function below outputs the data frame you want. It covers all of the requirements you stated above.
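library(stringr)  # str_extract() and str_split()

check.expressions <- function(data){
  data$pass <- FALSE
  for(i in seq_len(nrow(data))){
    # Decimal part of each coordinate (everything after the ".")
    lon.dec <- str_extract(data$Lon[i], "(?<=\\.).*")
    lat.dec <- str_extract(data$Lat[i], "(?<=\\.).*")
    # Reject rows where either coordinate has fewer than 3 decimal digits
    if(nchar(lon.dec) < 3 || nchar(lat.dec) < 3){
      next
    }
    # Split the decimal parts into single digits
    lon <- unlist(str_split(lon.dec, ""))
    lat <- unlist(str_split(lat.dec, ""))
    # Lon starts with a triple repeat: pass only if the last digit differs
    if(lon[1] == lon[2] && lon[2] == lon[3]){
      if(length(lon) > 3 && lon[3] != lon[length(lon)]){
        data$pass[i] <- TRUE
      }
      next
    }
    # Same check for Lat
    if(lat[1] == lat[2] && lat[2] == lat[3]){
      if(length(lat) > 3 && lat[3] != lat[length(lat)]){
        data$pass[i] <- TRUE
      }
      next
    }
    # Triple repeat starting at the second decimal digit of Lon
    if(length(lon) > 4 && lon[2] == lon[3] && lon[3] == lon[4]){
      if(lon[4] != lon[length(lon)]){
        data$pass[i] <- TRUE
      }
      next
    }
    # Same check for Lat (the original fell through to TRUE here when the
    # last digit matched the repeat, unlike the Lon branch)
    if(length(lat) > 4 && lat[2] == lat[3] && lat[3] == lat[4]){
      if(lat[4] != lat[length(lat)]){
        data$pass[i] <- TRUE
      }
      next
    }
    # No repeat pattern found in either coordinate
    data$pass[i] <- TRUE
  }
  # Keep only the rows that passed (the helper column `pass` is retained)
  data[data$pass, ]
}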
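The digit comparisons in the function rely on one step: taking everything after the decimal point and splitting it into single characters. As a rough illustration of that step alone, here is a base-R sketch that avoids the stringr dependency. The helper name `decimal_digits` is hypothetical and not part of the answer; it assumes the coordinate is passed as a string or something `as.character()` renders in plain decimal notation.

```r
# Hypothetical helper (not from the answer): return the decimal digits of a
# coordinate as a character vector, using the same lookbehind pattern.
decimal_digits <- function(value) {
  s <- as.character(value)
  # Everything after the first "." (empty result if there is no ".")
  frac <- regmatches(s, regexpr("(?<=\\.).*", s, perl = TRUE))
  if (length(frac) == 0) {
    return(character(0))
  }
  # Split the fractional part into single characters
  strsplit(frac, "")[[1]]
}

lon <- decimal_digits("9.888064")
print(lon)
# Does the decimal part start with a triple repeat?
print(lon[1] == lon[2] && lon[2] == lon[3])
# Does the last digit differ from the repeated digit?
print(lon[3] != lon[length(lon)])
```

Note that a value with no decimal part yields an empty vector here, whereas `str_extract()` returns `NA` in that case, so the two are not drop-in replacements for each other.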
The function call is just:
check.expressions(x) -> x.out
which would produce:
> check.expressions(x) -> x.out
> x.out
    Type       Lat      Lon pass
4 museum 34.666582 9.888064 TRUE
6   gbif 34.586666 9.318888 TRUE
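Since the question asked for a regular expression, a vectorized sketch may also be useful. The two rules below are assumptions inferred from the example rows, not the asker's stated requirements: (1) both coordinates need at least three decimal digits, and (2) a coordinate is rejected when its decimals are a single digit repeated three or more times through to the end, optionally after one leading digit. The names `enough_digits`, `repeated_tail` and `x_out` are made up for this illustration, and the result is not guaranteed to agree with `check.expressions()` on inputs other than the sample data.

```r
# Example data from the question, kept as character strings
x <- data.frame(
  Type = c("human", "camera", "ebird", "museum", "specimen", "gbif"),
  Lat  = c("34.67", "34.66", "34.6666666", "34.666582", "34.56666", "34.586666"),
  Lon  = c("9.888", "9.88", "9.8761", "9.888064", "9.78888", "9.318888"),
  stringsAsFactors = FALSE
)

# Assumed rule 1: at least three decimal digits
enough_digits <- function(s) {
  grepl("^-?\\d+\\.\\d{3,}$", s, perl = TRUE)
}

# Assumed rule 2: the decimals are one digit repeated to the end, optionally
# after a single leading digit (e.g. 9.888, 34.6666666, 34.56666)
repeated_tail <- function(s) {
  grepl("^-?\\d+\\.\\d?(\\d)\\1{2,}$", s, perl = TRUE)
}

keep <- enough_digits(x$Lat) & enough_digits(x$Lon) &
  !repeated_tail(x$Lat) & !repeated_tail(x$Lon)

x_out <- x[keep, ]
print(x_out)
```

On the sample data this keeps the `museum` and `gbif` rows, the same two rows as the loop-based function, without adding a `pass` column.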