Replace exact string match with regexp in R
I have a vector of strings that needs cleaning. I have been able to clean most of it on my own, but I am having trouble with one thing.
Some strings have a sequence like '@56;' at the beginning (the numbers vary). So a string can be '@56;trousers' or '@897;trousers', and I would like to reduce it to just 'trousers'.
I have written the following code:
gsub("[@[:digit:];]", "", 'mystring')
but it fails in cases like:
gsub("[@[:digit:];]", "", '@34skirt') # returns 'skirt'
I would like it to return '@34skirt' in this case, because the ; is missing after the digits.
I want an exact match. Any ideas about how to do this? I have tried adding \\ and it does not work.
The [@[:digit:];] regex is a character class: it matches a single character that is either a @, a digit, or a ;. Thus, gsub removes those characters at any position in the string, as many times as it finds them.
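To make that concrete, here is a small sketch. The sample strings are invented for illustration and are not from the question.

```r
# Hypothetical sample strings, made up for this illustration
x <- c("@56;trousers", "@34skirt", "size;42@shirt")

# The bracket expression is a set of single characters, so gsub strips
# every @, digit and ; wherever it occurs, regardless of order or position
stripped <- gsub("[@[:digit:];]", "", x)
print(stripped)
## [1] "trousers"  "skirt"     "sizeshirt"
```

Note that the third string loses its digits and punctuation even though it never contains an '@digits;' prefix.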
You may use a regex that defines a sequence of characters to remove, not a character class:
@[0-9]+;
You can even tell the regex engine to remove them only at the beginning of the string:
^@[0-9]+;
Sample demo:
sub("^@[0-9]+;", "", '@34skirt') ## [1] "@34skirt"
sub("^@[0-9]+;", "", '@34;trousers') ## [1] "trousers"
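As a rough sketch of how the anchor and the choice between sub and gsub change the result, assuming some made-up input strings (the vector y below is hypothetical and not from the question):

```r
# Hypothetical input, invented for this illustration
y <- c("@56;trousers", "blue@12;jeans", "@1;@2;socks")

# Anchored: removes the sequence only when it sits at the very start
anchored <- sub("^@[0-9]+;", "", y)

# Unanchored sub: removes the first occurrence, wherever it is
unanchored <- sub("@[0-9]+;", "", y)

# Unanchored gsub: removes every occurrence
all_hits <- gsub("@[0-9]+;", "", y)

print(anchored)    ## [1] "trousers"      "blue@12;jeans" "@2;socks"
print(unanchored)  ## [1] "trousers" "bluejeans" "@2;socks"
print(all_hits)    ## [1] "trousers"  "bluejeans" "socks"
```

Which variant fits depends on whether the sequence can legitimately appear later in a string; for a prefix that only ever occurs at the start, the anchored form is the most conservative.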
We can try
v1 <- c('mystring', '@34skirt', '@56;trousers', '@897;trousers')
sub("@\\d+;", "", v1)
#[1] "mystring" "@34skirt" "trousers" "trousers"