
R: removing selected characters from a string

Sorry if this is a duplicate, but the solutions I have seen do not solve my issue.

I have a data frame (df). One of its variables (df$Year) includes a list of years, such as:

 > df$Year

 Year
 2001–                       
 2013–                     
 2016–                      
 2003–                      
 2012–2013                      
 2013–                      
 1993–2007, 2010–

When a value contains multiple years, I just want to keep the last one (i.e. only '2010' rather than '1993–2007, 2010–') and get rid of the '-'. I have tried:

unlist(str_extract_all(df$Year, "[[:digit:]]4$"))

but this does not seem to work.

Any hint?

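We can use sub for a one-liner. The greedy .* consumes everything up to the last four-digit group, which is captured and used as the replacement: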

df$Year <- sub(".*(\\d{4})\\–?", "\\1", df$Year)
df$Year

[1] "2001" "2013" "2016" "2003" "2013" "2013" "2010"
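
An alternative sketch that stays closer to the extraction approach tried in the question: pull out every four-digit run and keep the last one. In the attempted pattern, 4 without braces matches a literal "4" rather than "four digits", and $ requires the match to sit at the very end of the string, where the dash is. The years vector below is a hypothetical sample copied from the question, not the full data frame.

```r
# Sample values copied from the question (\u2013 is the en dash).
years <- c("2001\u2013", "2012\u20132013", "1993\u20132007, 2010\u2013")

# Find every run of four digits in each element.
matches <- regmatches(years, gregexpr("[[:digit:]]{4}", years))

# Keep the last match per element; NA if an element contains no year.
last_year <- vapply(
  matches,
  function(m) if (length(m)) m[length(m)] else NA_character_,
  character(1)
)
last_year
#> [1] "2001" "2013" "2010"
```

With stringr loaded, str_extract_all(years, "[[:digit:]]{4}") should give a comparable list of matches to feed into the same vapply step.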


Note that the dashes in your year ranges appear to be en dashes (U+2013), not the regular ASCII hyphen (-), so a pattern written with a plain hyphen will not match them.
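
To check which dash a column actually contains, and to make the pattern tolerant of the common variants, something like the following sketch may help. The years vector is a hypothetical sample copied from the question, and the \u2013 and \u2014 escapes assume R is handling the strings as UTF-8.

```r
# Sample values copied from the question; the separator is written as an escape.
years <- c("2001\u2013", "2012\u20132013", "1993\u20132007, 2010\u2013")

# Inspect the code point of the character that follows the first year.
dash <- substr(years[1], 5, 5)
utf8ToInt(dash)
#> [1] 8211   (hex 2013, the en dash)

# Accept an ASCII hyphen, en dash, or em dash after the last year,
# and ignore any trailing whitespace.
last_year <- sub(".*(\\d{4})[-\u2013\u2014]?\\s*$", "\\1", years)
last_year
#> [1] "2001" "2013" "2010"
```

Putting the ASCII hyphen first inside the brackets keeps it literal instead of being read as a range.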
