openxlsx function read.xlsx fails to correctly read dates in R

I'm trying to load a .xlsx file in R, using the openxlsx package. Unfortunately, the spreadsheet has some strange formatting in the date column, which is in the format "Month/Day/Year", e.g. 9/21/2014 (Excel recognises this as a date format). When importing with read.xlsx, the month and day are dropped, leaving only the year as a numeric column. I suspect it has something to do with the / character.

df <- read.xlsx("The File.xlsx", sheet = "Sheet 1")

head(df)
  Number   Type   Other.Type   Date
       1    902          611   2014
       2    902          611   2014
       3    902          611   2014
       4    795          966   2014
 ...

I've tried including the detectDates = TRUE argument, but that just gives NAs.

I can't edit the spreadsheet, as the data belongs to someone else and I have only just been given access to it. Is there an equivalent of the colClasses argument from the xlsx package, or any other way of getting the data into R?

Many thanks

This answer is just for completeness, in case anyone else ends up here with a similar problem. All thanks are due to @StéphaneLaurent, who provided the suggestion in the comments.

Switching to the readxl package resolved the problem. Be sure to check the help file for the read_xlsx call, particularly the col_types argument. This package will attempt to set each column's data type on import, and if there are any inconsistencies it will produce warnings such as:

In read_fun(path = path, sheet_i = sheet, limits = limits,  ... :
Expecting numeric in F1107 / R1107C6: got '?'

These are not a serious issue, so don't be put off using the package. Thanks, Stéphane!
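
As a rough illustration of the col_types approach, the sketch below reads the date column as text and converts it afterwards. The helper parse_excel_date is hypothetical (it is not part of readxl or openxlsx), the column layout is assumed from the head(df) output in the question, and the serial-number branch assumes the workbook uses Excel's 1900 date system. Depending on how a cell is stored, the text that comes back may be the literal "9/21/2014" or the underlying serial number, so the helper tries to handle both.

```r
# Hypothetical helper: convert a character vector holding either
# "Month/Day/Year" text ("9/21/2014") or Excel serial numbers ("41903")
# into Date values. Anything else (e.g. "?") becomes NA.
parse_excel_date <- function(x) {
  x <- trimws(as.character(x))
  out <- rep(as.Date(NA), length(x))
  is_serial <- grepl("^[0-9]+(\\.[0-9]+)?$", x)
  # Assumption: 1900 date system, where day 0 corresponds to 1899-12-30.
  out[is_serial] <- as.Date(floor(as.numeric(x[is_serial])),
                            origin = "1899-12-30")
  out[!is_serial] <- as.Date(x[!is_serial], format = "%m/%d/%Y")
  out
}

path <- "The File.xlsx"  # file name taken from the question
if (requireNamespace("readxl", quietly = TRUE) && file.exists(path)) {
  df <- readxl::read_xlsx(
    path,
    sheet = "Sheet 1",
    # Assumed layout: three numeric columns followed by the date column.
    col_types = c("numeric", "numeric", "numeric", "text")
  )
  df$Date <- parse_excel_date(df$Date)
}

# Trying the helper on its own, without any workbook:
parsed <- parse_excel_date(c("9/21/2014", "41903", "?"))
print(parsed)
```

Reading the column as text sidesteps the type guessing that triggers the warnings above, at the cost of doing the conversion by hand.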


EDIT ~ 1 week later

After using readxl in another script with different data, I have switched back to openxlsx as my go-to package. Although readxl worked well as a workaround for my original issue, the number of warnings() it throws up is really irritating. In this second case, it was becoming unusable: each time I ran a line of code (whether readxl was involved or not), it would trigger warnings such as Unknown or uninitialised column. This was only resolved by closing down the R session and starting again. I'm sure it is only because I'm making a small mistake with readxl, but unless I'm faced with the same situation as above, with mis-formatted dates, I will stick with openxlsx, which I have generally found straightforward to use.

This worked for me after changing the cell format in Excel to Date, "2012-03-14":

library(openxlsx)
read.xlsx(xlsxFile = "The File.xlsx", sheet = "sheet 1", detectDates = TRUE)
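
For anyone wanting to try detectDates = TRUE without touching the original spreadsheet, here is a minimal round-trip sketch. It assumes openxlsx is installed; the demo data frame and the temporary file are made up for illustration and have nothing to do with the file in the question. It writes a proper Date column to a workbook and reads it back.

```r
# Sketch only: runs the round trip when openxlsx is available.
if (requireNamespace("openxlsx", quietly = TRUE)) {
  # Made-up data with a real Date column.
  demo <- data.frame(
    Number = 1:2,
    Date = as.Date(c("2014-09-21", "2014-09-22"))
  )
  tmp <- tempfile(fileext = ".xlsx")
  openxlsx::write.xlsx(demo, tmp)

  # Read it back, asking openxlsx to recognise date-formatted cells.
  back <- openxlsx::read.xlsx(tmp, sheet = 1, detectDates = TRUE)
  print(class(back$Date))
}
```

If the column only comes back as dates when the cells carry a genuine date format, that would be consistent with the NAs reported in the question.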
