
How to erase all non-letter characters before the first letter (R vector of character strings)

I have a vector of character strings:

 cities <- c("London", "001 London", "Stockholm", "002 Stockholm")

I need to erase everything in each string that precedes the first letter, so that I end up with:

 cities <- c("London", "London", "Stockholm", "Stockholm")

I've tried, for example, this:

 cities <- sub("^.*?[a-zA-Z]", "", cities)

but that erases the first letter too, which I don't want to happen.

Use a negated character class to match all the non-alphabetic characters at the start of the string.

cities <- sub("^[^a-zA-Z]*", "", cities)

or

Use a capturing group to capture the first letter and put it back in the replacement.

cities <- sub("^.*?([a-zA-Z])", "\\1", cities)
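As a quick check, the two patterns above can be run side by side on the sample vector. This is only a minimal sketch: the names `by_class`, `by_group` and `no_letters`, and the all-digit string "12345", are illustrative additions and not part of the original post.

```r
cities <- c("London", "001 London", "Stockholm", "002 Stockholm")

# Negated character class: strip every leading character that is not in a-z or A-Z
by_class <- sub("^[^a-zA-Z]*", "", cities)

# Capturing group: match lazily up to the first letter, then put that letter back
by_group <- sub("^.*?([a-zA-Z])", "\\1", cities)

print(by_class)
print(identical(by_class, by_group))

# The two approaches differ when a string contains no letter at all (hypothetical input)
no_letters <- "12345"
print(sub("^[^a-zA-Z]*", "", no_letters))        # the whole string is removed
print(sub("^.*?([a-zA-Z])", "\\1", no_letters))  # no match, so the string is unchanged
```

On the sample data both give the same result; which behaviour is preferable for letter-free strings depends on the data.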

Use a Unicode property class with perl=TRUE:

cities <- c("London", "001 London", "Stockholm", "002 Stockholm")
gsub("^\\P{L}*", "", cities, perl=TRUE)

See IDEONE demo

The ^\\P{L}* regex means:

  • ^ - Assert the beginning of the string
  • \\P{L}* - 0 or more characters other than a letter.

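This solution is preferable if your city names can start with non-ASCII letters, because \\P{L} treats any Unicode letter as a letter, whereas [a-zA-Z] covers only the ASCII letters.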
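To illustrate the Unicode point, here is a small sketch with a city name that starts with a non-ASCII letter. The vector `cities_intl` and the example "Örebro" are assumptions made for illustration; the name is written with a `\u` escape so the string is UTF-8 regardless of the file encoding.

```r
# "\u00d6rebro" is "Örebro"; the escape keeps the source file pure ASCII
cities_intl <- c("003 \u00d6rebro", "002 Stockholm")

# \P{L} matches any character that is not a Unicode letter (requires perl = TRUE)
cleaned <- sub("^\\P{L}*", "", cities_intl, perl = TRUE)
print(cleaned)
```

Because the pattern is anchored with ^, `sub` and `gsub` behave the same here; `sub` is used only because a single replacement per string is enough.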

Delete the digits (note that this leaves the leading space in place):

 gsub('\\d+','',cities)
 [1] "London"     " London"    "Stockholm"  " Stockholm"
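The output above still carries a leading space. One possible way to finish the job, sketched here as an assumption rather than as part of the original answer, is to wrap the call in base R's `trimws`. Keep in mind that `gsub('\\d+', ...)` removes digits anywhere in the string, not only at the start.

```r
cities <- c("London", "001 London", "Stockholm", "002 Stockholm")

# Remove every run of digits, then trim the whitespace left at either end
cleaned <- trimws(gsub("\\d+", "", cities))
print(cleaned)
```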


 