Filter the values in a variable in a dataframe which match a regular expression using grep in R
I have data which looks like this:
data <- data.frame(
ID_num = c("BGR9876", "BNG3421", "GTH4567", "YOP9824", "Child 1", "2JAZZ", "TYH7654"),
date_created = "19/07/1983"
)
I would like to filter the dataframe so that I only keep the rows where ID_num follows the pattern ABC1234. I am new to using regular expressions in grep, and I am getting this wrong. This is what I am trying:
data_clean <- data %>%
filter(grep("[A-Z]{3}[1:9]{4}", ID_num))
Which gives me the error:
Error in filter_impl(.data, quo) : Argument 2 filter condition does not evaluate to a logical vector
This is my desired output:
data_clean <- data.frame(
ID_num = c("BGR9876", "BNG3421", "GTH4567", "YOP9824", "TYH7654"),
date_created = "19/07/1983"
)
Thanks
The 1:9 inside the character class should be 1-9, and the function should be grepl rather than grep. It also helps to add ^ to anchor the start of the string and $ to anchor the end:
library(dplyr)
data %>%
filter(grepl("^[A-Z]{3}[1-9]{4}$", ID_num))
# ID_num date_created
#1 BGR9876 19/07/1983
#2 BNG3421 19/07/1983
#3 GTH4567 19/07/1983
#4 YOP9824 19/07/1983
#5 TYH7654 19/07/1983
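To see why 1:9 misbehaves inside square brackets, here is a minimal base R sketch. The test strings in x are made up for illustration and are not part of the original data. Inside [...] a colon is just a literal character, so [1:9] matches only the characters 1, : and 9, whereas [1-9] is a range covering the digits 1 through 9.

```r
# Hypothetical test strings, chosen only to illustrate the two character classes
x <- c("19", "1:9", "5", "1234")

# [1:9] is the set of three literal characters '1', ':' and '9'
in_colon_class <- grepl("^[1:9]+$", x)

# [1-9] is the range of digits from 1 to 9
in_range_class <- grepl("^[1-9]+$", x)

print(in_colon_class)  # TRUE TRUE FALSE FALSE
print(in_range_class)  # TRUE FALSE TRUE TRUE
```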
filter expects a logical vector. grep returns the numeric indices of the matching elements, while grepl returns a logical vector, which is why grepl works inside filter and grep does not.
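The difference between the two return types can be checked directly in base R. This is a small sketch on a shortened vector of IDs taken from the question; the variable names ids, idx and keep are only illustrative.

```r
# A shortened version of the ID column from the question
ids <- c("BGR9876", "Child 1", "TYH7654")
pattern <- "^[A-Z]{3}[1-9]{4}$"

idx  <- grep(pattern, ids)   # integer positions of the matching elements
keep <- grepl(pattern, ids)  # one TRUE/FALSE per element

print(idx)   # 1 3
print(keep)  # TRUE FALSE TRUE

# Both forms select the same elements when used for subsetting
same_selection <- identical(ids[idx], ids[keep])
print(same_selection)  # TRUE
```

The logical form has the same length as the input, which is the shape filter needs; the index form is what slice or plain [ indexing can consume.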
Or, if we want to use grep, use slice, which expects numeric indices:
data %>%
slice(grep("^[A-Z]{3}[1-9]{4}$", ID_num))
A similar option in the tidyverse would be to use str_detect:
library(stringr)
data %>%
filter(str_detect(ID_num, "^[A-Z]{3}[1-9]{4}$"))
In base R, we can do:
subset(data, grepl("^[A-Z]{3}[1-9]{4}$", ID_num))
Or with Extract, i.e. [ indexing:
data[grepl("^[A-Z]{3}[1-9]{4}$", data$ID_num),]
Note that the anchored pattern matches only strings that consist of exactly 3 upper-case letters followed by 4 digits. Without ^ and $, a longer string that merely contains such a sequence would also match:
grepl("[A-Z]{3}[1-9]{4}", "ABGR9876923")
#[1] TRUE
grepl("^[A-Z]{3}[1-9]{4}$", "ABGR9876923")
#[1] FALSE
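One related caveat: the class [1-9] does not include 0, so an ID containing a zero would be dropped. None of the IDs in the question contain a zero, so the results above are unaffected. The sketch below uses a made-up ID, "ABC1204", purely to illustrate the difference. If zeros can occur in real IDs, [0-9] is probably the safer class.

```r
# "ABC1204" is a hypothetical ID, not taken from the question's data
ids <- c("ABC1234", "ABC1204")

no_zero   <- grepl("^[A-Z]{3}[1-9]{4}$", ids)  # digits 1 through 9 only
with_zero <- grepl("^[A-Z]{3}[0-9]{4}$", ids)  # digits 0 through 9

print(no_zero)    # TRUE FALSE
print(with_zero)  # TRUE TRUE
```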
We can use grepl with the pattern:
data[grepl("[A-Z]{3}\\d{4}", data$ID_num), ]
# ID_num date_created
#1 BGR9876 19/07/1983
#2 BNG3421 19/07/1983
#3 GTH4567 19/07/1983
#4 YOP9824 19/07/1983
#7 TYH7654 19/07/1983
Or in filter:
library(dplyr)
data %>% filter(grepl("[A-Z]{3}\\d{4}", ID_num))