
Is there a way to read in a large document as a data.frame in R?

I'm trying to use ggplot2 on a large data set stored in a csv file. I used to read it with Excel.

I don't know how to convert this data into a data.frame. In particular, I have a date column with the following format: "2020/04/12:12:00". How can I get R to understand this format?

If it's a csv, you can use:

  • the fread function from data.table. This will be the fastest way to read your csv.
  • read_csv or read_csv2 (for ;-delimited documents) from the readr package

If it's an .xls (or .xlsx) document, have a look at the readxl package.

All these functions import your data as data.frames (with additional classes, such as data.table for fread or tibble for read_csv).
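As a rough sketch of the csv route, the snippet below writes a tiny made-up file to a temporary path and reads it back with fread. The file contents and the column names date and value are assumptions for illustration only, and data.table has to be installed.

```r
library(data.table)

# Hypothetical sample file, written to a temp path just for illustration
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("date,value",
    "2020/04/12:12:00,1.5",
    "2020/04/13:08:30,2.0"),
  csv_path
)

# fread() guesses the delimiter and the column types
df <- fread(csv_path)

# The result carries the data.table class on top of data.frame
class(df)
is.data.frame(df)
```

Because the result is still a data.frame, it can be handed to ggplot2 without further conversion.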

Edit

Given your comment, it looks like your file is not an Excel file but a csv. If you want to convert a column to a date type, assuming your data was read with fread (so it is a data.table) and is called df:

df[, dates := as.POSIXct(get(colnames(df)[1]), format = "%Y/%m/%d:%H:%M")]

Note that you don't need to use cbind or even reassign the data.table, because the := operator modifies it by reference.

As the message tells you, you don't need the extra precision of POSIXlt.
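A minimal sketch of that conversion on two made-up rows, assuming data.table is installed. The column name date_raw, the sample values, and tz = "UTC" are assumptions for illustration; pick the time zone your data was actually recorded in.

```r
library(data.table)

# Made-up rows using the date format from the question
df <- data.table(
  date_raw = c("2020/04/12:12:00", "2020/04/13:08:30"),
  value    = c(1.5, 2.0)
)

# := adds the new column in place, so no cbind() or reassignment is needed.
# tz = "UTC" is an assumption made here so the result is reproducible.
df[, dates := as.POSIXct(date_raw, format = "%Y/%m/%d:%H:%M", tz = "UTC")]

class(df$dates)
format(df$dates, "%Y-%m-%d %H:%M")
```

The new dates column is a POSIXct, which ggplot2 can place on a date-time axis.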

Going by the question alone, I would suggest the openxlsx package; it has helped me significantly reduce the time spent reading large datasets. Three points you may find helpful, based on your question and the comments:

  • The read command stays the same as in the xlsx package, but I would suggest calling it as openxlsx::read.xlsx(file_path)
  • The arguments are again the same, except that sheetIndex is replaced by sheet, which takes a sheet index (current versions of openxlsx also accept a sheet name)
  • If the existing columns are converted to character, then a simple as.Date would work
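For the last point, here is a small base R sketch on made-up character values in the question's format. Note that as.Date keeps only the calendar date; if the time of day matters, as.POSIXct with the full format string is the better fit.

```r
# Made-up character values in the format from the question
x <- c("2020/04/12:12:00", "2020/04/13:08:30")

# as.Date() reads only as far as the format requires,
# so the trailing time part (":12:00") is ignored
d <- as.Date(x, format = "%Y/%m/%d")

class(d)
format(d)
```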
