
The difference between `\\s|*` and `\\s|[*]` in a regular expression in R?

What is the difference between `\\s|*` and `\\s|[*]` in a regular expression in R?

> gsub('\\s|*','','Aug 2013*')
[1] "Aug2013*"
> gsub('\\s|[*]','','Aug 2013*')
[1] "Aug2013"

What is the function of [ ] here?

The first expression is invalid as written, because * is a special character (a quantifier) and here it has nothing to repeat. If you want to use sub or gsub with special characters in this way, you can set the fixed = TRUE argument.

This takes the pattern string exactly as it is and ignores the special meaning of any characters in it.

See Pattern Matching and Replacement in the R documentation.

x <- 'Aug 2013****'
gsub('*', '', x, fixed=TRUE)
#[1] "Aug 2013"
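
One consequence worth noting: with fixed = TRUE the whole pattern is literal, so \\s and | lose their meaning too. The sketch below (variable names are only for illustration) shows that the original pattern then matches nothing in the string, and that removing both spaces and asterisks literally takes two passes.

```r
x <- 'Aug 2013*'

# With fixed = TRUE the pattern is the literal four characters \s|*,
# which do not occur in x, so nothing is replaced.
unchanged <- gsub('\\s|*', '', x, fixed = TRUE)
unchanged
#[1] "Aug 2013*"

# Two literal passes: first remove spaces, then remove asterisks.
cleaned <- gsub('*', '', gsub(' ', '', x, fixed = TRUE), fixed = TRUE)
cleaned
#[1] "Aug2013"
```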

Your second expression just uses a character class [] around the * to avoid escaping it, which is the same as:

x <- 'Aug 2013*'
gsub('\\s|\\*', '', x)
#[1] "Aug2013"
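
As a quick check of that equivalence, the following sketch runs several spellings of the pattern against the same input. The third pattern, which uses the POSIX class [:space:] inside a bracket expression, is an extra variant added here for comparison and is not from the question.

```r
x <- 'Aug 2013*'

# Three ways to say "whitespace or a literal asterisk"
patterns <- c('\\s|[*]', '\\s|\\*', '[[:space:]*]')

results <- vapply(patterns, function(p) gsub(p, '', x),
                  character(1), USE.NAMES = FALSE)
results
#[1] "Aug2013" "Aug2013" "Aug2013"
```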

As for the explanation of your first expression, \\s|*:

\s      whitespace (\n, \r, \t, \f, and " ")
|       OR
*       quantifier ("zero or more") with nothing before it to repeat

And the second expression, \\s|[*]:

\s      whitespace (\n, \r, \t, \f, and " ")
|       OR
[*]     any character of: '*'

The [] here does nothing more than turn the * into a literal asterisk, so that it no longer acts as a quantifier.
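
The same holds for other quantifiers and for the dot. A minimal sketch, using a made-up input string, in which ., +, ? and * are all matched literally because they sit inside []:

```r
x <- 'a.b+c?d*e'

# Inside a character class, . + ? and * stand for themselves
stripped <- gsub('[.+?*]', '', x)
stripped
#[1] "abcde"
```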

The first regex is invalid (* is a special character meaning "zero or more" of the preceding item, and here nothing precedes it).
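
For contrast, here is a small illustration of * used validly as a quantifier, where it follows the item it repeats. The test strings are invented for the example.

```r
# 'b*' means "zero or more b"; the * applies to the b just before it
matches <- grepl('^ab*c$', c('ac', 'abc', 'abbbc', 'adc'))
matches
#[1]  TRUE  TRUE  TRUE FALSE
```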

The second regex is equivalent to

'\\s|\\*'
