openxlsx function read.xlsx fails to correctly read dates in R
I'm trying to load a .xlsx file in R using the openxlsx package. Unfortunately, the spreadsheet has some strange formatting in the date column, which is in the format "Month/Day/Year", e.g. 9/21/2014 (Excel recognises this as a date format). When importing with read.xlsx, the month and day are dropped, leaving only the year as a numeric column. I suspect it has something to do with the / character.
library(openxlsx)
df <- read.xlsx("The File.xlsx", sheet = "Sheet 1")
head(df)
Number Type Other.Type Date
1 902 611 2014
2 902 611 2014
3 902 611 2014
4 795 966 2014
...
I've tried including the detectDates = TRUE argument, but that just gives NAs.
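As a possible workaround, the column can be converted by hand after import, provided the raw values survive. The sketch below assumes the date cells reach R either as "Month/Day/Year" text (e.g. "9/21/2014") or as Excel serial day numbers, which is what openxlsx typically returns for date cells when detectDates is not set. The helper name excel_col_to_date is hypothetical and only base R is used. The origin "1899-12-30" is the usual choice for Excel's 1900 date system on dates from March 1900 onwards.

```r
# Hypothetical helper (not part of openxlsx): convert an imported
# Excel date column to Date. Assumes each value is either an
# "M/D/YYYY" string or an Excel serial day number (1900 date system).
excel_col_to_date <- function(x) {
  if (is.numeric(x)) {
    # Serial days; origin 1899-12-30 compensates for Excel's
    # non-existent 29 Feb 1900 (valid for dates after Feb 1900)
    return(as.Date(x, origin = "1899-12-30"))
  }
  x <- as.character(x)
  # First try the "Month/Day/Year" text form
  out <- as.Date(x, format = "%m/%d/%Y")
  # Fall back for serial numbers that were stored as text
  serial <- is.na(out) & grepl("^[0-9]+(\\.[0-9]+)?$", x)
  out[serial] <- as.Date(as.numeric(x[serial]), origin = "1899-12-30")
  out
}

excel_col_to_date(c("9/21/2014", "41903", "not a date"))
```

It would be applied as df$Date <- excel_col_to_date(df$Date). Note that it cannot recover the month and day once read.xlsx has already reduced the column to the year alone; it only helps if the import keeps the full text or serial values.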
I can't edit the spreadsheet, as the data belongs to someone else and I have just been given access to it. Is there an equivalent of the colClasses argument from the xlsx package, or any other way of getting the data into R?
Many thanks
This answer is just for completeness, in case anyone else ends up here with a similar problem. All thanks are due to @StéphaneLaurent, who provided the suggestion in the comments.
Switching to the readxl package resolved the problems. Be sure to check the help file for the read_xlsx call, particularly for the col_types argument. This package will attempt to set each vector's data type on import, and if there are any inconsistencies it will produce warnings such as
In read_fun(path = path, sheet_i = sheet, limits = limits, ... :
Expecting numeric in F1107 / R1107C6: got '?'
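If these warnings get in the way, one option (a sketch, not the only approach) is to read the affected columns as text, e.g. with col_types = "text" in read_xlsx, and convert them yourself, treating placeholders such as '?' as missing. The helper text_to_numeric and its na_tokens argument are hypothetical; the conversion is plain base R, and the readxl call is shown only as a comment because it needs the real file.

```r
# Assumed import step (not run here; requires readxl and the file):
# raw <- readxl::read_xlsx("The File.xlsx", sheet = "Sheet 1",
#                          col_types = "text")

# Hypothetical helper: convert a text column to numeric, mapping
# placeholder tokens such as "?" to NA without emitting warnings.
text_to_numeric <- function(x, na_tokens = c("?", "", "NA")) {
  x <- trimws(as.character(x))
  x[x %in% na_tokens] <- NA
  # Any other unparseable value also becomes NA; warnings are suppressed
  suppressWarnings(as.numeric(x))
}

text_to_numeric(c("902", "?", " 611 ", "795"))
```

Alternatively, readxl's own na argument (e.g. na = c("", "?")) tells it to treat those strings as missing at import time, which may avoid this particular warning without any post-processing.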
These are not a serious issue, so don't let them put you off using the package. Thanks, Stéphane!
EDIT ~ 1 week later
After using readxl in another script with different data, I have switched back to openxlsx as my basic go-to package. Although readxl worked well as a workaround for my original issue, the number of warnings() it throws up is really irritating. In this second case, it was becoming unusable: each time I ran a line of code (whether readxl was involved or not), it would trigger warnings such as Unknown or uninitialised column. This was only resolved by closing down the R session and starting again. I'm sure it is only because I'm making a small mistake with readxl, but unless I'm faced with the same situation as above with mis-formatted dates, I will stick with openxlsx, which I have generally found straightforward to use.
This worked for me after changing the date format in Excel to the form "2012-03-14":
library(openxlsx)
read.xlsx(xlsxFile = "The File.xlsx", sheet = "sheet 1" , detectDates = TRUE)