
conditionally remove leading or trailing `.` character in R

I have a vector of names where some names have leading and trailing . characters, and some do not. Here is an example:

test <- c('.name.1.','name.2','.name.3.')

I would like to conditionally remove leading and trailing . characters in these names, to return

c('name.1','name.2','name.3')

Use regular expressions:

test <- c('.name.1.','name.2','.name.3.')
gsub('^\\.|\\.$', '', test)
# [1] "name.1" "name.2" "name.3"

The two backslashes, \\, in the regular expression escape the dot, which unescaped would match any character. The caret, ^, marks the beginning of the string, and the dollar sign, $, marks the end. The pipe, |, is a logical "or". So the regular expression matches a dot at the beginning of the string or a dot at the end of the string, and gsub replaces each match with an empty string.
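To make the role of each piece concrete, here is a small sketch that applies the two alternatives separately and then together. The last call, with + added and two made-up names, is an assumption about a related case (runs of several dots) and is not part of the original question.

```r
test <- c('.name.1.', 'name.2', '.name.3.')

# Each alternative on its own: sub() replaces at most one match per string.
leading_only  <- sub('^\\.', '', test)   # strips a leading dot, keeps a trailing one
trailing_only <- sub('\\.$', '', test)   # strips a trailing dot, keeps a leading one

# gsub() replaces every match, so both ends are cleaned in a single call.
cleaned <- gsub('^\\.|\\.$', '', test)

# Hypothetical extension: '+' removes whole runs of dots at either end.
multi <- gsub('^\\.+|\\.+$', '', c('..name.4..', 'name.5.'))
```

Note that sub with the combined pattern would stop after the first match and leave the trailing dot in place, which is why gsub is used.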

More information on regular expressions can be found in R's ?regex help page, and gsub and related functions are documented under ?gsub.

A quick function using the substr function:

fun1 <- function(x) substr(x, 1 + (1 * as.numeric(substr(x,1,1)=='.')), nchar(x) - (1 * as.numeric(substr(x, nchar(x), nchar(x)) == '.')))

We use substr to check for a . in the first and last characters of the string, then we use substr again to extract the part of the text we want to keep. For example, if the first character is a . but the last is not, we extract substr(text, 2, nchar(text)).

fun1(test)
# [1] "name.1" "name.2" "name.3"
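The same idea may be easier to read when the two checks are named. The following is only a sketch: strip_dots is a hypothetical helper, not part of the answer above, and it assumes R 3.3.0 or later for startsWith and endsWith.

```r
# Hypothetical helper: same logic as fun1, with the two checks spelled out.
strip_dots <- function(x) {
  first <- 1L + startsWith(x, '.')      # 2 when x starts with a dot, otherwise 1
  last  <- nchar(x) - endsWith(x, '.')  # one character shorter when x ends with a dot
  substr(x, first, last)
}

test <- c('.name.1.', 'name.2', '.name.3.')
result <- strip_dots(test)
```

Because TRUE and FALSE behave as 1 and 0 in arithmetic, the as.numeric conversion is not needed.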

You can also use str_extract from stringr, which here extracts the "word, dot, digit" part of each name:

library(stringr)

str_extract(test, "\\w+\\.\\d")

or str_replace_all (the stringr equivalent of gsub):

str_replace_all(test, "[.](.+)[.]", "\\1")

# [1] "name.1" "name.2" "name.3"
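Both stringr patterns above are tailored to the example names. As a hedged sketch, the anchored pattern from the gsub answer can be passed to str_replace_all unchanged, which also covers names with a dot at only one end. The name '.name.4' is made up for illustration.

```r
library(stringr)

test <- c('.name.1.', 'name.2', '.name.3.')

# Same anchored pattern as the gsub() call: a dot at the start or at the end.
anchored <- str_replace_all(test, '^\\.|\\.$', '')

# A hypothetical name with only a leading dot shows the difference:
# the capture-group pattern treats the interior dot as the closing one.
greedy <- str_replace_all('.name.4', '[.](.+)[.]', '\\1')
safe   <- str_replace_all('.name.4', '^\\.|\\.$', '')
```

Here greedy loses the interior dot, while safe removes only the leading one.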

Just for fun, here is a method with substring and grepl.

substring(test, 1L + grepl("^\\.", test), nchar(test) - grepl("\\.$", test))
# [1] "name.1" "name.2" "name.3"

This also works if you replace substring with substr. The cool thing about these functions is that they take vectors for their second and third arguments. Here, grepl returns TRUE or FALSE for each string, which is treated as 1 or 0 in arithmetic, so the second argument switches between 1L and 2L and the third switches between the position of the final character and that of the penultimate character.
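To see the vectorized arguments at work, this short sketch computes the start and stop positions separately before passing them to substr. The variable names starts and stops are introduced here for illustration only.

```r
test <- c('.name.1.', 'name.2', '.name.3.')

# grepl() returns a logical vector; adding it to 1L gives one start per string.
starts <- 1L + grepl('^\\.', test)
# Subtracting it from nchar() gives one stop position per string.
stops <- nchar(test) - grepl('\\.$', test)

starts  # 2 1 2
stops   # 7 6 7

result <- substr(test, starts, stops)
```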
