
Calculating the difference between two two-digit years

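Is there a simple way in R to calculate the difference between two columns of two-digit years (years only; months and days aren't needed here) so as to create a column of ages?

I'm quite new to this and have been trying 'if' statements and algebra without success.

The data look like this, but larger: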

dat <- data.frame(year1=c("98","99","00","01","02"),
                  year2=c("03","04","05","06","07"))

You can use strptime() with the format %y:

dat <- data.frame(year1=c("98","99","00","01","02"),
    year2=c("03","04","05","06","07"),
    stringsAsFactors = FALSE) # the default since R 4.0.0; needed in older versions

dat$year1 <- strptime(dat$year1, format = "%y")
dat$year2 <- strptime(dat$year2, format = "%y")

as.vector(difftime(dat$year2,
    dat$year1,
    units = "days"))/365.242
#[1] 4.999311 5.002163 4.999425 4.999425 4.999425
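Because the result above is derived from a day count, it only comes out approximately 5. If you need whole years, one option (a minimal sketch, not part of the original answer) is to subtract the year component of the POSIXlt objects that strptime() returns, which stores years since 1900. Pasting a fixed month and day ("-01-01", an arbitrary choice) before parsing also makes the result independent of the date you run it on, since %y alone fills in today's month and day.

```r
# Sketch: whole-year differences from the POSIXlt $year component.
year1 <- c("98", "99", "00", "01", "02")
year2 <- c("03", "04", "05", "06", "07")

# Pin month and day to 1 January (an arbitrary choice) so the result
# does not depend on the date the code is run.
lt1 <- strptime(paste0(year1, "-01-01"), format = "%y-%m-%d")
lt2 <- strptime(paste0(year2, "-01-01"), format = "%y-%m-%d")

# $year holds years since 1900, so the difference is in whole years.
age <- lt2$year - lt1$year
print(age)
# [1] 5 5 5 5 5
```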

Convert to dates, format them as four-digit numeric years, then take the difference:

# dat[2:1] puts year2 first, so `-` computes year2 - year1
do.call(`-`, lapply(dat[2:1], function(x)
    as.numeric(format(as.Date(x, format = "%y"), "%Y"))))
#[1] 5 5 5 5 5

This can give wrong results for older dates, e.g. from the early 1900s, because of how %y chooses the century. From ?strptime:

 ‘%y’ Year without century (00-99).  On input, values 00 to 68 are
      prefixed by 20 and 69 to 99 by 19 - that is the behaviour
      specified by the 2004 and 2008 POSIX standards, but they do
      also say ‘it is expected that in a future version the default
      century inferred from a 2-digit year will change’.
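To check which century %y picks, and to override it when your data need a different cut-off, here is a minimal sketch. The `expand_year()` helper and its `pivot` argument are hypothetical, not part of base R: two-digit years above `pivot` are placed in the 1900s and the rest in the 2000s.

```r
# Default %y rule from ?strptime: 00-68 -> 20xx, 69-99 -> 19xx.
# Month and day are fixed to 1 January so the result does not depend on today's date.
yy <- c("00", "68", "69", "99")
default_years <- format(as.Date(paste0(yy, "-01-01"), format = "%y-%m-%d"), "%Y")
print(default_years)
# [1] "2000" "2068" "1969" "1999"

# Hypothetical helper with a caller-chosen pivot year.
expand_year <- function(yy, pivot) {
  yy <- as.integer(as.character(yy))  # accepts character or factor input
  ifelse(yy > pivot, 1900L + yy, 2000L + yy)
}

print(expand_year(c("18", "25", "60", "98"), pivot = 25))
# [1] 2018 2025 1960 1998
```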
df$age <- ifelse(df$year2 < df$year1, df$year2 - df$year1 + 100, df$year2 - df$year1)

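This should work assuming year2 is some kind of current year and year1 is the year of birth, and that nobody was born before 1918 (the rule can only produce ages from 0 to 99).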
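The `ifelse()` rule is equivalent to taking the difference modulo 100: a negative difference (-99 to -1) gets 100 added, and a difference from 0 to 99 is left alone. A minimal sketch of that alternative, with the same assumption that every age is below 100 (the `age_mod()` name is just for illustration):

```r
# Age from two-digit years via modular arithmetic; assumes ages 0-99.
age_mod <- function(year1, year2) {
  y1 <- as.integer(as.character(year1))  # works for character or factor columns
  y2 <- as.integer(as.character(year2))
  # With a positive divisor, R's %% never returns a negative value
  (y2 - y1) %% 100L
}

print(age_mod(c("98", "50", "47"), c("03", "17", "41")))
# [1]  5 67 94
```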

Example:

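# no set.seed(), so values differ between runs (the two printouts below come from different draws)
df <- data.frame(year1 = sample(18:99, 1000, replace = TRUE),
                 year2 = sample(1:99, 1000, replace = TRUE))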

> head(df)
  year1 year2
1    27    88
2    41    55
3    90    36
4    81    93
5    56    60
6    27    61

df$age <- ifelse(df$year2 < df$year1, df$year2 - df$year1 + 100, df$year2 - df$year1)

> head(df)
  year1 year2 age
1    73    88  15
2    50    17  67
3    47    41  94
4    54    43  89
5    36    82  46
6    62    85  23

Using your example data:

dat <- data.frame(year1=c("98","99","00","01","02"),
                  year2=c("03","04","05","06","07"))

y1 <- as.numeric(as.character(dat$year1))
y2 <- as.numeric(as.character(dat$year2))
dat$age <- ifelse(y2 < y1, y2 - y1 + 100, y2 - y1)

> dat
  year1 year2 age
1    98    03   5
2    99    04   5
3    00    05   5
4    01    06   5
5    02    07   5

One way is to use as.Date in a dplyr chain:

library(dplyr)

dat %>%
  mutate(year1 = as.Date(year1, format = "%y"),
         year2 = as.Date(year2, format = "%y")) %>%
  mutate(age = year2 - year1)

Returns:

       year1      year2       age
1 1998-10-26 2003-10-26 1826 days
2 1999-10-26 2004-10-26 1827 days
3 2000-10-26 2005-10-26 1826 days
4 2001-10-26 2006-10-26 1826 days
5 2002-10-26 2007-10-26 1826 days

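P.S. This fills in a default month and day for both columns (today's, hence 10-26 above), but because it uses the same values in both columns they cancel out. The only effect is whether a 29 February falls inside the interval, which is why row 2 shows 1827 days instead of 1826.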
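The 1826 vs 1827 day counts can be reproduced directly in base R (no dplyr needed); the dates below are taken from the output above. This minimal sketch also shows one way to get whole years from Date values by comparing calendar years instead of dividing a day count.

```r
start <- as.Date(c("1998-10-26", "1999-10-26"))
end   <- as.Date(c("2003-10-26", "2004-10-26"))

# Day counts: the second interval contains two 29 Februaries (2000 and 2004)
days <- as.numeric(difftime(end, start, units = "days"))
print(days)
# [1] 1826 1827

# Whole years: subtract the four-digit calendar years
years <- as.integer(format(end, "%Y")) - as.integer(format(start, "%Y"))
print(years)
# [1] 5 5
```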


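Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.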

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM