R - Use regex to remove all strings, special characters, and pattern ending element
Say I have a character vector ids as follows:
ids <- c("367025001", "CT_341796001", "M13X01692-01", "13C025050901", "13C00699551")
I want to search each element and remove all letters, all special characters, and "01" when it ends the element. So ids would become:
ids_replaced <- c("3670250", "3417960", "1301692", "130250509", "1300699551")
I'm getting somewhat close, but this attempt hasn't worked as I intended:
gsub("(.*?)(\\d+?)(01$)", "\\2", ids, perl = TRUE)
You could use:
gsub("01$|\\D", "", ids)
# [1] "3670250" "3417960" "1301692" "130250509" "1300699551"
identical(gsub("01$|\\D", "", ids), ids_replaced)
# [1] TRUE
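One subtlety of the single-pass pattern is that "01$" is tested against the original string, so "01" is only dropped when it is literally the last thing in the element. A minimal sketch of that difference follows; clean_ids and clean_ids_two_pass are hypothetical helper names, the two-pass variant is my own assumption rather than part of the answer, and "AB01X" is a made-up input.

```r
# Hypothetical helper wrapping the one-pass pattern from the answer above.
clean_ids <- function(x) {
  # "01$" removes a literal "01" at the very end of the original string;
  # "\\D" removes every non-digit character.
  gsub("01$|\\D", "", x)
}

# Assumed two-pass variant: strip non-digits first, then a trailing "01".
clean_ids_two_pass <- function(x) {
  sub("01$", "", gsub("\\D", "", x))
}

ids <- c("367025001", "CT_341796001", "M13X01692-01", "13C025050901", "13C00699551")

# Both variants agree on the ids from the question.
clean_ids(ids)
# [1] "3670250"    "3417960"    "1301692"    "130250509"  "1300699551"
identical(clean_ids(ids), clean_ids_two_pass(ids))
# [1] TRUE

# They differ when "01" only becomes trailing after the non-digits are removed.
clean_ids("AB01X")
# [1] "01"
clean_ids_two_pass("AB01X")
# [1] ""
```

Which behaviour is wanted depends on whether "ends the element" refers to the raw id or to the digits-only id; for the ids in the question the two readings give the same result.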
Regular Expression Explanation:
01 matches the literal "01"
$ matches before an optional \n, and at the end of the string
| OR
\\D matches non-digits (all but 0-9); this is \D in the regex itself, with the backslash doubled inside the R string
Using rex may make this type of task a little simpler.
library(rex)
ids <- c("367025001", "CT_341796001", "M13X01692-01", "13C025050901", "13C00699551")
# Remove runs of non-digits, or "01" at the end of the string
re_substitutes(ids,
               rex(non_digits %or% list("01", end)),
               '',
               global = TRUE)
#> [1] "3670250" "3417960" "1301692" "130250509" "1300699551"
I'm not sure how to do it in R but you can use this regex:
-\d+$|\D
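To try this pattern in R, the backslashes have to be doubled inside the string literal. A minimal sketch, reusing ids and ids_replaced from the question (alt is a hypothetical variable name): the pattern removes a trailing hyphen-plus-digits group rather than a bare trailing "01", so on this data it only reproduces the desired result for some of the elements.

```r
ids <- c("367025001", "CT_341796001", "M13X01692-01", "13C025050901", "13C00699551")
ids_replaced <- c("3670250", "3417960", "1301692", "130250509", "1300699551")

# "-\\d+$" removes a hyphen followed by digits at the end of the string;
# "\\D" removes every remaining non-digit.
alt <- gsub("-\\d+$|\\D", "", ids)
alt
# [1] "367025001"   "341796001"   "1301692"     "13025050901" "1300699551"

# A trailing "01" without a preceding hyphen is kept, so the results differ.
identical(alt, ids_replaced)
# [1] FALSE
```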