
replace exact string match with regexp in R

I have a vector of strings that needs cleaning. I have been able to clean most of it on my own, but I am having problems with one thing.

Some strings have a prefix such as '@56;' at the beginning (the numbers vary). So a string can be '@56;trousers' or '@897;trousers', and I would like to reduce it to just 'trousers'.

I have written the following code:

gsub("[@[:digit:];]", "", 'mystring')   

but it fails in cases like:

gsub("[@[:digit:];]", "", '@34skirt') # returns 'skirt'

I would like it to return '@34skirt' in this case because the ; is missing from the end.

I want an exact match. Any ideas about how to do this? I have tried adding \\ and it does not work.

The [@[:digit:];] regex is a character class: it matches a single character that is either an @, a digit, or a ;. With gsub, it therefore removes each of those characters wherever it appears in the string, as many times as it finds them.
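To make the character-class behaviour concrete, here is a small sketch. The input string and the variable names are made up for illustration and are not from the question:

```r
# Hypothetical input with '@', digits and ';' scattered through the string
x <- "@34skirt;size8"

# The character class matches one character at a time, so gsub strips
# every '@', every digit and every ';' regardless of position or order
class_result <- gsub("[@[:digit:];]", "", x)
class_result
## [1] "skirtsize"
```

Note that the ';' in the middle and the trailing digit are removed as well, which is why this pattern cannot express "an @, then digits, then a ;" as a unit.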

You may use a regex defining a sequence of characters to remove, not a character class:

@[0-9]+;


You can even tell the regex engine to remove the sequence only at the beginning of the string:

^@[0-9]+;

Sample demo:

sub("^@[0-9]+;", "", '@34skirt')     ## [1] "@34skirt"
sub("^@[0-9]+;", "", '@34;trousers') ## [1] "trousers"
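The effect of the ^ anchor shows up when the same sequence occurs in the middle of a string. A rough comparison, using a made-up vector (the 'blue@12;jeans' element is hypothetical and only there to show the difference):

```r
# Hypothetical test vector: prefix present, sequence mid-string, ';' missing
x <- c("@56;trousers", "blue@12;jeans", "@34skirt")

# Without the anchor, the first '@digits;' sequence is removed wherever it is
unanchored <- sub("@[0-9]+;", "", x)
unanchored
## [1] "trousers"  "bluejeans" "@34skirt"

# With '^', only a sequence at the very start of the string is removed
anchored <- sub("^@[0-9]+;", "", x)
anchored
## [1] "trousers"      "blue@12;jeans" "@34skirt"
```

In both cases '@34skirt' is left alone, because the ';' required by the pattern is missing.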

We can try sub with \\d as the digit shorthand:

sub("@\\d+;", "", v1)
#[1] "mystring" "@34skirt" "trousers" "trousers"

data

v1 <- c('mystring', '@34skirt',  '@56;trousers', '@897;trousers') 
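If it helps to see which elements actually carry the prefix before changing anything, grepl can be run with the same anchored pattern. This is only a sketch on the sample vector above; the names has_prefix and cleaned are illustrative:

```r
v1 <- c('mystring', '@34skirt', '@56;trousers', '@897;trousers')

# TRUE only where the string starts with '@', one or more digits, then ';'
has_prefix <- grepl("^@[0-9]+;", v1)
has_prefix
## [1] FALSE FALSE  TRUE  TRUE

# sub is vectorised, so the whole vector can be cleaned in one call
cleaned <- sub("^@[0-9]+;", "", v1)
cleaned
## [1] "mystring" "@34skirt" "trousers" "trousers"
```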
