
Extracting year from unformatted date character vector

I have a character vector that gives the period of coverage as an unformatted date range, and it looks like this:

     Period of coverage
1    1/1/2011 to 31/12/2011
2    1/1/2010 to 31/12/2010
3    1/1/2012 to 31/12/2012
4    1/1/2010 to 31/12/2010
5    1/1/2011 to 31/12/2011
6    1/1/2012 to 31/12/2012
7    1/1/2010 to 31/12/2010
8    1/1/2010 to 31/12/2010
9    1/1/2009 to 31/12/2009

I was wondering how I could convert the column to just the year each observation represents. Every row has the same start day and end day (1/1 and 31/12).

Assuming your data is stored in a variable named period and all the dates keep the same format, as you say:

yr = substr(period, 19, 22)
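The fixed positions 19 to 22 depend on every row having a single-digit start day and month, which holds for the data in the question. If that might not hold elsewhere, counting from the end of the string is one possible variant. A minimal sketch, assuming period is a character vector whose elements all end in a four-digit year (the sample values below are made up for illustration):

```r
# Hypothetical sample data; the second element has a two-digit start day and month
period <- c("1/1/2011 to 31/12/2011", "10/10/2010 to 31/12/2010")

# Take the last four characters of each string, whatever its length.
# substr() is vectorized over its start and stop arguments.
n <- nchar(period)
yr <- as.numeric(substr(period, n - 3, n))
yr
```

This returns the year of the end date, so it only matches the start year when each period stays within one calendar year.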

Assuming DF is as shown reproducibly in the Note at the end, remove everything up to the last slash and convert to numeric:

transform(DF, year = as.numeric(sub(".*/", "", `Period of coverage`)), check.names = FALSE)

giving:

      Period of coverage year
1 1/1/2011 to 31/12/2011 2011
2 1/1/2010 to 31/12/2010 2010
3 1/1/2012 to 31/12/2012 2012
4 1/1/2010 to 31/12/2010 2010
5 1/1/2011 to 31/12/2011 2011
6 1/1/2012 to 31/12/2012 2012
7 1/1/2010 to 31/12/2010 2010
8 1/1/2010 to 31/12/2010 2010
9 1/1/2009 to 31/12/2009 2009

Another possibility is to convert it to Date class first, noting that as.Date ignores junk at the end:

to_year <- function(x, fmt) as.numeric(format(as.Date(x, fmt), "%Y"))
transform(DF, year = to_year(`Period of coverage`, "%d/%m/%Y"), check.names = FALSE)
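To see why this works, it may help to look at a single string. The sketch below assumes one value copied from the question; the variable names x and d are only for illustration.

```r
x <- "1/1/2011 to 31/12/2011"

# Only the leading "1/1/2011" is matched against the format;
# the trailing " to 31/12/2011" is ignored by as.Date()
d <- as.Date(x, format = "%d/%m/%Y")

# Extract the year as a character string
format(d, "%Y")
```

Because only the start of the string is parsed, the year obtained this way is that of the start date.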

Note

Lines <- "     Period of coverage
1/1/2011 to 31/12/2011
1/1/2010 to 31/12/2010
1/1/2012 to 31/12/2012
1/1/2010 to 31/12/2010
1/1/2011 to 31/12/2011
1/1/2012 to 31/12/2012
1/1/2010 to 31/12/2010
1/1/2010 to 31/12/2010
1/1/2009 to 31/12/2009"
DF <- read.csv(text = Lines, check.names = FALSE, as.is = TRUE)

If your string always has the same format, you can simply use substr and convert the result to a date. Note that with only %Y in the format, as.Date fills in the current month and day:

as.Date(substr("1/1/2011 to 31/12/2011", 5, 8), format = "%Y")
as.Date(substr("1/1/2011 to 31/12/2011", 19, 22), format = "%Y")

If the string is more variable but the two dates are always separated by "to", you can split it with strsplit, unlist the result, and then format each date as a year:

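a <- "1/1/2011 to 31/12/2011"
a2 <- strsplit(a, " to ", fixed = TRUE)  # list holding the start and end date strings
a3 <- unlist(a2)                         # flatten to a character vector
a4 <- as.Date(a3, format = "%d/%m/%Y")
year <- format(a4, format = "%Y")        # character vector of years, one per date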
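The snippet above handles one string. For a whole column, the same idea could be applied element by element. A minimal sketch, assuming a character vector named period with two values taken from the question; parts, end_date and end_year are illustrative names:

```r
period <- c("1/1/2011 to 31/12/2011", "1/1/2010 to 31/12/2010")

# Split each element on " to "; the result is a list with one
# character vector (start, end) per element of period
parts <- strsplit(period, " to ", fixed = TRUE)

# Keep the second piece (the end date) of each element and parse it
end_date <- as.Date(vapply(parts, `[`, character(1), 2), format = "%d/%m/%Y")

# Convert to a numeric year
end_year <- as.numeric(format(end_date, "%Y"))
end_year
```

Using vapply rather than unlist keeps one result per row, so the output lines up with the original column.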
