
Extracting century and year from a string

I have a large column of strings such as:

20-1843PA-HY-4563-214DF

The "20" is the century and the "18" is the year. What is the simplest way to extract these two with a function in R and get an output of 2018?

We can use sub to capture the digits at the start ( ^ ) of the string as a group ( (\\d+) ), match the - that follows, then capture the next two digits ( (\\d{2}) ), and replace the whole string with the backreferences ( \\1\\2 ) to the two captured groups.

f1 <- function(nm) as.numeric(sub("^(\\d+)-(\\d{2}).*", "\\1\\2", nm))
f1(str1)
#[1] 2018

Data

str1 <- "20-1843PA-HY-4563-214DF"
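
Because sub is vectorized, the same function can be applied to a whole column in one call. The following is only a minimal sketch: the data frame name df, the column name code, and the second sample value are illustrative assumptions, not taken from the question.

```r
# Same function as above, repeated so this snippet runs on its own
f1 <- function(nm) as.numeric(sub("^(\\d+)-(\\d{2}).*", "\\1\\2", nm))

# Hypothetical data frame; `df`, `code`, and the second value are made up
df <- data.frame(
  code = c("20-1843PA-HY-4563-214DF", "19-9912AB-XY-0001-000ZZ"),
  stringsAsFactors = FALSE
)

# sub() works element-wise, so no loop is needed
df$year <- f1(df$code)
df$year
#[1] 2018 1999
```

Note that sub returns a string unchanged when the pattern does not match, so a malformed entry would come out of as.numeric as NA with a coercion warning.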

I would do something like this:

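chr_column <- "20-1843PA-HY-4563-214DF"
# Split on "-" and keep the first two pieces: "20" and "1843PA"
chr_parts <- unlist(strsplit(chr_column, "-"))[1:2]
# Append the first two characters of the second piece to the first piece
chr_year <- paste0(chr_parts[1], strtrim(chr_parts[2], width = 2))
chr_year <- as.numeric(chr_year)
chr_year
#[1] 2018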
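
The code above handles a single string, since unlist(...)[1:2] keeps only the first two pieces of the flattened result. For a whole column, one possible adaptation is to process each split element separately. This is a sketch under assumptions: extract_year and codes are hypothetical names, and the second sample value is made up.

```r
# Hypothetical helper: a vectorized take on the split-and-paste idea
extract_year <- function(x) {
  # strsplit() returns one character vector of pieces per input string
  parts <- strsplit(x, "-", fixed = TRUE)
  vapply(
    parts,
    # First piece plus the first two characters of the second piece
    function(p) as.numeric(paste0(p[1], substr(p[2], 1, 2))),
    numeric(1)
  )
}

codes <- c("20-1843PA-HY-4563-214DF", "19-9912AB-XY-0001-000ZZ")
extract_year(codes)
#[1] 2018 1999
```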
