
Convert multiple columns of numeric data to dates in R

My data frame has 4 date columns, but when I load the file from Excel into R, all the date columns become numeric. I did not want to write a separate line to convert each column to a date, as this is something I will have to do quite often with this dataset, so I wrote this loop to change the column type. It hasn't worked as I wanted: the years have all become 2000s instead of the 1900s, as in the original dates. My sample dataset and code are below:

xl <- structure(list(DOB = c(33483, 19213, 18947, 25266, 14581, 22870, 23705, 19592, 15033, 17856, 15551, 33681, 23483, 34619, 29125,31824, 18560, 35009, 16994, 22052, 17111, 28724, NA, 24852, 10980, 34222, 32220,18262, 16141, 28075, 11058, 23102, 26111, 30951, 14429, 25017,28281, 13239, 33977, 17309, 28103, 12115, 21331, 13217, 22898, 31491, 19787, 20160, 12364, 10609, 33846, 22699, 30428, 19421, 33339, 31575, 35187, 25053, 25500, 9291, 19100, 33025, 20040, 22909, 28189, 31909, 34476, 29007, 25575, 24127, 17493, 19572, 29032, 35241, 16353, 17038, 17623, 28056, 16408, 27879, 31161, 25669, 35614, 30573, 21878, 35815, 28826, 24351, 19828, 27159, 22897, 25779, 30880, 30344, 18643, 23748, 24340, 23784, 31276, 25795, 16908, 34277, 22550, 18824, 13795, 34548, 34940, 17395, 22603, 28913, 19478, 16117, 29331, 29557, 16459,32665, 35092, 33810, 13710, 34611, 26339, 33712, 35505, 17427, 29238, 30557, 21994, 23020, 20084, 23647, 21838, 9421, 33657, 14433, 22284, 33857, 31064, 35270, 33380, 21866, 15317, 35466, 29503, 33401, 27059, 19315, 23095, 28487, 35434, 15403, 21563, 22801, 27079, 24511, 18215, 16171, 16601, 29396, 24118, 21030, 24544, 12856, 35721, 11105, 23213, 35322, 15290, 20132, 23691, 30587, 27723, 30233, 28173, 30811, 33259, 12814, 36117, 14638, 34681, 13191, 23205, 14160, 20210, 35569, 31310, 16329, 26409, 20704, 32217, 28347, 21187, 15977, 31470, 28644, 15303, 31341, 18369, 16545, 24221, 19052, 34062, 28375, 33067, 17319, 32124, 15140, 24736, 23447, 12800, 27580, 18167, 34765, 31025, 21441, 16035, 21086, 21330,26485, 16274, 14136, 28513, 28381, 19584, 8446, 20227, 19866, 17269, 22108, 28557, 13340, 13953, 18622), 
D1 = c(40886, 40890, 40944, 40947, 40941, 
40948, 40948, 41199, 40967, 41053, 40974, 40981, 41114, 41094, 
41116, 41123, 41135, 41150, 41194, 41226, 41317, 41212, 41213, 
41297, 41267, 41267, 41295, 41506, 41310, 41310, 41316, 41318, 
41319, 41323, 41502, 41326, 41331, 41339, 41381, 41360, 41372, 
41373, 41382, 41407, 41444, 41450, 41457, 41458, 41459, 41486, 
41488, 41488, 41488, 41488, 41500, 41535, 41533, 41543, 41554, 
41561, 41565, 41582, 41592, 41606, 41624, 41624, 41682, 41682, 
41683, 41690, 41696, 41704, 41711, 41715, 41715, 41701, 41732, 
41739, 41760, 41774, 41792, 41795, 41813, 41815, 41816, 41816, 
41821, 41823, 41824, 41841, 41844, 41850, 41849, 41850, 41852, 
41856, 41858, 41862, 41873, 41873, 41877, 41878, 41879, 41880, 
41880, 41887, 41887, 41887, 41891, 41891, 41893, 41899, 41901, 
41905, 41906, 41907, 41907, 41911, 41887, 41921, 41925, 41928, 
41928, 41934, 41939, 41942, 41943, 41947, 41947, 41953, 41954, 
41955, 41968, 41977, 41978, 41981, 41984, 41991, 41992, 42020, 
42023, 42031, 42032, 42040, 42041, 42047, 42047, 42054, 42065, 
42059, 42061, 42069, 42073, 42075, 42079, 42102, 42123, 42131, 
42135, 42121, 42135, 42138, 42142, 42142, 42146, 42146, 42160, 
42165, 42173, 42174, 42174, 42187, 42195, 42202, 42201, 42142, 
42152, 42255, 42264, 42284, 42291, 42298, 42298, 42298, 42312, 
42174, 41505, 41519, 41638, 41723, 41848, 41862, 41862, 41885, 
41925, 41953, 42107, 42207, 40987, 41331, 41505, 41723, 41892, 
41926, 41960, 41985, 42144, 42188, 40961, 41058, 41108, 41200, 
41254, 41309, 41291, 41331, 41366, 41389, 41401, 41444, 41493, 
41610, 41694, 41718, 41806, 41873, 41956, 42019, 42037, 42164, 
42200, 41562), D2 = c(40695, 31205, 34135, 
40391, 39995, 40725, 40483, 41183, 40817, 39814, 33239, 40909, 
40725, 41030, 40756, 40969, 39326, 39814, 41061, 41061, 40909, 
40483, 36161, 37622, 40544, 40909, 40817, 39448, 40179, 32509, 
40238, 40575, 41030, 38353, 40969, 40787, 41061, 41030, 41214, 
40695, 41000, 41183, 39083, 39934, 40603, 39904, 40940, 41426, 
41214, 40725, 41426, 40695, 39814, 40179, 41183, 41275, 41218, 
41214, 40940, 41426, 40544, 40909, 38047, 41579, 34700, 35746, 
41000, 36161, 41426, 41183, NA, 38718, 41548, 41456, 38536, 
39387, 41548, 41518, 40360, 41699, 41778, 41655, 41030, 41730, 
40909, 40544, 41671, 41214, 41699, 39083, 41214, 41640, 41671, 
36161, 41426, 41821, 39083, 41275, 41000, 41760, 41579, 36526, 
41548, 37987, 40179, 40179, 40787, 41609, 41730, 40544, 38504, 
41334, 41334, 41609, 41275, 41699, 40817, 41214, 41334, 41518, 
35065, 35796, 41170, 41699, 41695, 41365, 41852, 37257, 41579, 
33604, 40909, 41913, 41852, 41564, 41852, 41883, 39448, 39083, 
40544, 41944, 41275, 41852, 41640, 42005, 41548, 39995, 30682, 
41883, 41546, 41640, 41791, 41334, 41944, 40179, 41995, 40179, 
23012, 39814, 41956, 39083, 41609, 39448, 41974, 41275, 40544, 
42125, 41928, 39814, 41944, 41962, 40909, 42095, 41852, 41913, 
41944, 40848, 42096, 40544, 40179, 41913, 40179, 42064, 41395, 
37622, 42156, 31048, 41314, 41377, 41452, 41623, 41813, 41760, 
41705, 41867, 41699, 41942, 41944, 42197, 29221, 40179, 41000, 
41153, 40544, 39448, 41548, 41760, 40179, 41821, 40909, 38353, 
39448, 41000, 40940, 41000, 40909, 41000, 40664, 41091, 41030, 
41395, 41306, 41061, 41518, 41334, 41609, 41852, 41760, 41821, 
41944, 42095, 42095, 41476), D3 = c(40817, 
40817, 40913, 40940, 40940, 40756, 40634, 41183, 40940, 41030, 
40817, 40969, 41091, 41091, 40787, 41122, 39448, 41030, 41183, 
41091, 41091, 40848, 36526, 41365, 41153, 40909, 41030, 41244, 
41122, 35065, 40544, 41122, 41061, 40179, 41183, 41306, 41214, 
41306, 41365, 41334, 41030, 41244, 41091, 41395, 41275, 41426, 
40940, 41456, 41365, 41456, 41456, 41426, 41456, 41395, 41487, 
41275, 41414, 41518, 41275, 41456, 41579, 41153, 41579, 41640, 
41334, 41395, 41487, 41579, 41426, 41671, 41671, 41699, 41699, 
41456, 41456, 39508, 41730, 41609, 41760, 41760, 41791, 41671, 
41365, 41791, 41821, 41699, 41821, 41548, 41821, 41821, 41579, 
41699, 41821, 41821, 41821, 41852, 41640, 41852, 41852, 41791, 
41852, 41852, 41852, 41760, 41852, 41640, 41518, 41852, 41883, 
41852, 41487, 41699, 41883, 41640, 41883, 41730, 41883, 41791, 
41883, 41671, 41699, 41671, 41671, 41883, 41863, 41913, 41852, 
41699, 41944, 41730, 41760, 41944, 41883, 41760, NA, 41974, 
41974, 41974, 41609, 41944, 42005, 41913, 41913, 42005, 42036, 
41913, 42036, 42036, 42064, 41944, 41944, 42064, 42675, 42064, 
42064, 42095, 42095, 42064, 41956, 42121, 41974, 42125, 42005, 
42125, 40544, 42125, 41974, 42156, 41944, 42005, 42005, 42095, 
41852, 42186, 42186, 42036, 42095, 42125, 42186, 42064, 42309, 
42217, 42278, 42278, 42309, 41609, 41487, 41365, 41609, 41699, 
42186, 41852, 41852, 41974, 41913, 41944, 42095, 42186, NA, 
41183, 41183, 41518, 41579, 41791, 41579, 41883, 42186, 42186, 
40940, 40787, 40725, 41030, 40940, 41153, 41153, 41306, 40817, 
41214, 41395, 41426, 41306, 41609, 41671, 41579, 41791, 41852, 
41852, 41821, 41974, 42156, 42156, 42522)), .Names = c("D", "D1", "D2", "D3"), class = "data.frame", row.names = c(NA, 232L))

date_cols <- c(1,2,3,4)
for(j in date_cols)
{class(xl[,j]) = "Date"}

You may need a function that tells R that these integers represent dates, and then you need to apply that function to each column of your dataframe:

myfun <- function(x) as.Date(x, format="%Y-%m-%d", origin="1899-12-30")
xlnew <- data.frame(lapply(xl, myfun))

Instead of defining myfun first, you can also pass the function to lapply anonymously, or use the approach from one of the other answers. Note that options(stringsAsFactors = FALSE) is set in my environment to avoid unwanted conversion to factors.
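As a minimal sketch of the anonymous-function variant mentioned above, the following uses a tiny made-up data frame (xl_demo is a hypothetical name, with two values copied from the sample data) rather than the full xl:

```r
# Hypothetical two-row stand-in for the question's xl data frame
xl_demo <- data.frame(D = c(33483, 19213), D1 = c(40886, 40890))

# Same conversion as myfun, but with the function written inline
xlnew_demo <- data.frame(
  lapply(xl_demo, function(x) as.Date(x, origin = "1899-12-30"))
)

str(xlnew_demo)
```

The result is still a new data frame, so the original xl_demo keeps its numeric columns.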

Logic:

Excel's default (1900) date system starts at 1900-01-01 with an index of 1, whereas R counts dates as days from an origin with an index of 0, and that origin is 1970-01-01 by default. That is why setting the class to "Date" directly lands about 70 years too late: the Excel serial number is read as days since 1970-01-01. There is also a historical bug in Excel: it treats 29-Feb-1900 as a valid date, although 1900 was not a leap year. Hence we subtract 2 days from Excel's nominal origin of 1900-01-01 (1 day for the indexing difference and 1 day for the nonexistent leap day), which gives origin="1899-12-30". This holds for any serial number after 28-Feb-1900, which covers every value in this dataset.
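The reasoning above can be checked on a single value. This is only an illustrative sketch: serial, wrong and right are names made up here, and 33483 is the first D value from the sample data.

```r
serial <- 33483  # first D value from the sample data

# What the question's loop does: the number is read as days since 1970-01-01
wrong <- structure(serial, class = "Date")

# What is intended: the number is read as days since 1899-12-30
right <- as.Date(serial, origin = "1899-12-30")

print(wrong)
print(right)

# The two results differ by the gap between the origins (70 years plus 2 days)
gap_days <- as.numeric(wrong - right)
print(gap_days)

# Serial 61 is the first serial after Excel's nonexistent 29-Feb-1900
print(as.Date(61, origin = "1899-12-30"))
```

The gap is the same for every value, which is why all four columns were shifted into the future by the same amount.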

Output of first 5 rows:

> xlnew
             D         D1         D2         D3
1   1991-09-02 2011-12-09 2011-06-01 2011-10-01
2   1952-08-07 2011-12-13 1985-06-07 2011-10-01
3   1951-11-15 2012-02-05 1993-06-15 2012-01-05
4   1969-03-04 2012-02-08 2010-08-01 2012-02-01
5   1939-12-02 2012-02-02 2009-07-01 2012-02-01

@PKumar shows how to use the as.Date function, but creates a new data frame.

To replace a subset of columns within the original data frame you can do something like:

xl[date_cols] <- lapply(xl[date_cols], as.Date, origin="1899-12-30")
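Since the question mentions having to repeat this conversion often, the one-liner could be wrapped in a small helper. The sketch below is an assumption-laden illustration: excel_cols_to_date and the demo data frame are hypothetical names, not part of any package.

```r
# Hypothetical helper: convert the given columns from Excel serials to Date
excel_cols_to_date <- function(df, cols, origin = "1899-12-30") {
  df[cols] <- lapply(df[cols], as.Date, origin = origin)
  df
}

# Small made-up example with an NA and a column that must stay numeric
demo <- data.frame(D = c(33483, NA), D1 = c(40886, 40890), id = c(1, 2))
demo <- excel_cols_to_date(demo, c("D", "D1"))

str(demo)
```

cols can hold column names or positions, so the question's date_cols <- c(1,2,3,4) would work as well. NA values stay NA after conversion.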

My personal preference is to use the data.table package - it is fast, the syntax is parsimonious and loops are easily implemented to modify by reference. It will be very efficient for large data sets. I would do it like this:

Option 1 - use the column names in the lapply function.

library(data.table)

setDT(xl) # Convert the data.frame to data.table, by reference

xl[ , c("D", "D1", "D2", "D3") := lapply(.SD, as.Date, origin="1899-12-30"), .SDcols = c("D", "D1", "D2", "D3")]

Option 2 - define the column names in a vector and use it in the lapply function

library(data.table)

setDT(xl) # Convert the data.frame to data.table, by reference

my.cols <- c("D", "D1", "D2", "D3")

xl[ , (my.cols) := lapply(.SD, as.Date, origin="1899-12-30"), .SDcols = my.cols]

Note that both options will change the existing data in place, so you don't need to assign it to a new object.
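For the loop-style modification by reference mentioned earlier, data.table also provides set(). The following is a rough sketch on a made-up two-row table (dt_demo is a hypothetical name), assuming the data.table package is installed:

```r
library(data.table)

# Hypothetical small table using two values from the sample data
dt_demo <- data.table(D = c(33483, 19213), D1 = c(40886, 40890))
my.cols <- c("D", "D1")

# Replace each column in place; no copy of dt_demo is made
for (col in my.cols) {
  set(dt_demo, j = col, value = as.Date(dt_demo[[col]], origin = "1899-12-30"))
}

str(dt_demo)
```

As with the := options, dt_demo itself is changed, so nothing needs to be reassigned.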

I hope this helps.

