
R: removing selected characters from a string

Sorry if this is a duplicate, but the solutions I have seen do not solve my issue.

I have a data frame (df). One of its variables (df$Year) holds years or year ranges, such as:

 > df$Year

 Year
 2001–                       
 2013–                     
 2016–                      
 2003–                      
 2012–2013                      
 2013–                      
 1993–2007, 2010–

When an entry contains several years, I want to keep only the last one (i.e. '2010' rather than '1993–2007, 2010–') and get rid of the '–'. I have tried:

unlist(str_extract_all(df$Year, "[[:digit:]]4$"))

but this does not seem to work.

Any hint?
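A likely reason the attempt fails: `[[:digit:]]4$` matches one digit followed by a literal `4` at the end of the string, not four digits, and even `[[:digit:]]{4}$` would miss entries that end in a dash. The sketch below is one way to extract every four-digit run and keep the last. It uses base R (`regmatches`/`gregexpr`) instead of stringr so that it runs without extra packages; the vector `x` is illustrative sample data, and the dash is assumed to be U+2013.

```r
# Illustrative sample values; "\u2013" is assumed to be the dash used in the data.
x <- c("2012\u20132013", "1993\u20132007, 2010\u2013")

# Find every run of four digits in each element (returns a list of character vectors).
all_years <- regmatches(x, gregexpr("[[:digit:]]{4}", x))

# Keep the last match per element; NA when an element contains no year at all.
last_year <- vapply(
  all_years,
  function(m) if (length(m) > 0) m[length(m)] else NA_character_,
  character(1)
)

print(last_year)
```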

We can use sub for a one-liner:

df$Year <- sub(".*(\\d{4})\\–?", "\\1", df$Year)
df$Year

[1] "2001" "2013" "2016" "2003" "2013" "2013" "2010"
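For anyone who wants to reproduce this, here is a self-contained sketch. The data frame is rebuilt from the values shown in the question, with the dash written as the escape `\u2013` (an assumption about which character the data contains). The pattern is a slight variation on the one above, not the same expression: `\\D*$` discards any trailing non-digit characters, so it should also cope with trailing spaces.

```r
# Data reconstructed from the question; "\u2013" is assumed to be the dash in use.
df <- data.frame(
  Year = c("2001\u2013", "2013\u2013", "2016\u2013", "2003\u2013",
           "2012\u20132013", "2013\u2013", "1993\u20132007, 2010\u2013"),
  stringsAsFactors = FALSE
)

# Greedy ".*" swallows everything before the LAST run of four digits;
# "\\D*$" then consumes whatever non-digits follow, up to the end of the string.
df$Year <- sub(".*(\\d{4})\\D*$", "\\1", df$Year)

print(df$Year)
```

Because `.*` is greedy, the regex engine backtracks only as far as needed to find four digits followed by non-digits, which is why the last year in each entry is the one captured.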


Note that the dashes in your year ranges appear to be en dashes (or possibly em dashes), not the regular ASCII hyphen, so a pattern written with a plain "-" will not match them.
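A quick way to check which dash a string contains is to look at its code points. The sketch below assumes the character is the en dash U+2013 and uses a made-up sample value `x`; in real data, `utf8ToInt()` on the actual string would show what is there.

```r
# Hypothetical sample value; the dash is written as an escape to avoid encoding surprises.
x <- "2010\u2013"

# Code points of the two candidate characters.
en_dash_code <- utf8ToInt("\u2013")  # 8211, i.e. 0x2013
hyphen_code  <- utf8ToInt("-")       # 45, the ASCII hyphen

# Removing the ASCII hyphen leaves the string unchanged...
unchanged <- gsub("-", "", x, fixed = TRUE)

# ...while removing the en dash gives the bare year.
cleaned <- gsub("\u2013", "", x, fixed = TRUE)

print(c(en_dash_code, hyphen_code))
print(cleaned)
```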
