How to remove specific characters from strings in a column in R?

I've got the following data.

df <- data.frame(Name = c("TOMTom Catch",
                          "BIBill Ronald",
                          "JEFJeffrey Wilson",
                          "GEOGeorge Sic",
                          "DADavid Irris"))

How do I clean the data in the Name column?

I've tried nchar() and substring(), but some names need the first two characters removed whereas others need the first three.

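We can use a regex with a lookahead. ^[A-Z]+ matches the run of capital letters at the start of the string, and the lookahead (?=[A-Z]) requires one more capital letter to follow without consuming it, so the capital that begins the real name is kept. Lookarounds are not supported by R's default regex engine, so perl = TRUE is required.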

gsub("^[A-Z]+(?=[A-Z])", "", df$Name, perl = TRUE)
#> [1] "Tom Catch"      "Bill Ronald"    "Jeffrey Wilson" "George Sic"    
#> [5] "David Irris"   
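A minimal sketch of writing the cleaned values back into the data frame, assuming the goal is to overwrite the Name column in place. Because the pattern is anchored with ^, it can match at most once per string, so sub() is enough here (gsub() gives the same result). The extra call on "Tom Catch" is only an illustration: a name with no prefix has no capital directly after its first letter, so it is left unchanged.

```r
df <- data.frame(Name = c("TOMTom Catch",
                          "BIBill Ronald",
                          "JEFJeffrey Wilson",
                          "GEOGeorge Sic",
                          "DADavid Irris"))

# ^[A-Z]+   : the leading run of capital letters (the unwanted prefix)
# (?=[A-Z]) : lookahead requiring one more capital after the run, so the
#             capital that starts the real name is not removed
# perl = TRUE is needed because the default engine has no lookarounds
df$Name <- sub("^[A-Z]+(?=[A-Z])", "", df$Name, perl = TRUE)
print(df$Name)

# A name without a prefix does not match, so it is returned as-is
print(sub("^[A-Z]+(?=[A-Z])", "", "Tom Catch", perl = TRUE))
```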
