Replace an exactly matching string with a regexp in R

Question

I have a vector of strings that needs cleaning. I have been able to clean most of it on my own, but I am having trouble with one thing.

Some strings have a sequence like '@56;' at the beginning (the numbers vary). So a string can be '@56;trousers' or '@897;trousers', and I would like to be left with just 'trousers'.

I have written the following code:

gsub("[@[:digit:];]", "", 'mystring')

but it fails in cases like:

gsub("[@[:digit:];]", "", '@34skirt') # returns 'skirt'

I would like it to return '@34skirt' in this case, because the ; is missing from the end of the '@34' sequence.

I want an exact match. Any ideas about how to do this? I have tried adding \\ and it does not work.

Answer 1

The [@[:digit:];] regex is a character class: it matches a single character that is either a @, a digit, or a ;. Thus, gsub removes those characters at any position in the string, as many times as it finds them.
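To make the point above concrete, here is a minimal sketch of what the character class does to a few inputs. The vector `x` and its third element are made up for illustration and are not from the question.

```r
# Hypothetical sample inputs (the third one is invented for illustration)
x <- c("@56;trousers", "@34skirt", "size 42; blue")

# The bracket expression matches ONE character at a time: any "@", digit or ";".
# gsub() therefore deletes each such character wherever it occurs.
cleaned <- gsub("[@[:digit:];]", "", x)

print(cleaned)
```

Note that the third string loses its digits and semicolon even though it never contained an '@56;'-style prefix, which is why a character class is the wrong tool here.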

You may use a regex that defines a sequence of characters to remove, not a character class:

@[0-9]+;

See the regex demo.

You can even tell the regex engine to remove the sequence only at the beginning of the string:

^@[0-9]+;

Sample demo:

sub("^@[0-9]+;", "", '@34skirt')     ## [1] "@34skirt"
sub("^@[0-9]+;", "", '@34;trousers') ## [1] "trousers"
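Since sub is vectorized, the same call can be applied to the whole vector at once. The sketch below also contrasts the anchored and unanchored patterns; the vector `v` and its last element are assumptions added for illustration.

```r
# Hypothetical input vector; "red@12;shirt" is invented to show the effect of "^"
v <- c("@56;trousers", "@897;trousers", "@34skirt", "red@12;shirt")

# Anchored: the "@digits;" sequence is removed only when it starts the string
anchored <- sub("^@[0-9]+;", "", v)

# Unanchored: the first "@digits;" sequence is removed wherever it appears
unanchored <- sub("@[0-9]+;", "", v)

print(anchored)
print(unanchored)
```

Both versions leave '@34skirt' untouched because the ; is required by the pattern; they differ only on the last element, where the sequence is not at the start.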

Answer 2

We can try

sub("@\\d+;", "", v1)
#[1] "mystring" "@34skirt" "trousers" "trousers"

data

v1 <- c('mystring', '@34skirt',  '@56;trousers', '@897;trousers')
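As a side note on the \\ mentioned in the question: in an R string literal the backslash itself must be escaped, so "\\d" is the two-character regex \d, which here behaves like [0-9]. The following sketch, using the same v1 plus a made-up string `multi`, compares the two spellings and shows how sub and gsub differ when the pattern is not anchored.

```r
v1 <- c("mystring", "@34skirt", "@56;trousers", "@897;trousers")

# "\\d" in R source is the regex \d (a digit); for this data it matches like [0-9]
with_escape <- sub("@\\d+;", "", v1)
with_class  <- sub("@[0-9]+;", "", v1)
same_result <- identical(with_escape, with_class)

# Hypothetical string with two sequences (not from the question)
multi <- "@1;a@2;b"
first_only <- sub("@\\d+;", "", multi)   # removes only the first match
all_hits   <- gsub("@\\d+;", "", multi)  # removes every match

print(with_escape)
print(c(first_only, all_hits))
```

If only a leading prefix should ever be stripped, the anchored pattern from Answer 1 is the safer choice, since an unanchored gsub would also remove sequences in the middle of a string.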
