Is there a way to read in a large document as a data.frame in R?
I'm trying to use ggplot2 on a large data set stored in a csv file. I used to read it with Excel. I don't know how to convert this data into a data.frame. In particular, I have a date column that has the following format: "2020/04/12:12:00". How can I get R to understand this format?
If it's a csv, you can use:
- the fread function from data.table. This will be the fastest way to read your csv.
- read_csv or read_csv2 (for ;-delimited documents) in the readr package.
If it's an .xls (or .xlsx) document, have a look at the readxl package.
All these functions import your data as data.frames (with additional classes like data.table for fread or tibble for read_csv).
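As a minimal sketch of the csv case, assuming the data.table package is installed. The sample file and its column names (date, value) are made up for illustration and are written to a temporary file so the example runs on its own:

```r
library(data.table)

# Hypothetical sample data, written to a temporary csv so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(c("date,value",
             "2020/04/12:12:00,1.5",
             "2020/04/13:08:30,2.0"),
           csv_path)

# fread guesses the separator and the column types
df <- fread(csv_path)

# The result is a data.table, which is also a data.frame
class(df)
is.data.frame(df)
```

The date column comes back as plain character here, since this format is not one that gets converted automatically. With readr, read_csv(csv_path) would return a tibble instead.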
Given your comment, it looks like your file is not an Excel file but a csv. If you want to convert a column type to date, assuming your data.table (as returned by fread) is called df:
df[, dates := as.POSIXct(get(colnames(df)[1]), format = "%Y/%m/%d:%H:%M")]
Note that you don't need to use cbind or even reassign the data.table, because the := operator modifies it by reference.
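A small self-contained sketch of that in-place conversion, assuming data.table is installed. The table contents and the tz = "UTC" argument are assumptions for illustration; use the time zone your data was recorded in:

```r
library(data.table)

# Hypothetical data: a character date column in the format from the question
df <- data.table(date  = c("2020/04/12:12:00", "2020/04/13:08:30"),
                 value = c(1.5, 2.0))

# Parse the column in place; := updates df by reference, so no reassignment is needed
# tz = "UTC" is an assumption, not something stated in the question
df[, date := as.POSIXct(date, format = "%Y/%m/%d:%H:%M", tz = "UTC")]

class(df$date)
```

Once the column is POSIXct, ggplot2 can treat it as a date-time axis.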
As the message tells you, you don't need the extra precision of POSIXlt.
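To illustrate the difference between the two classes, here is a base R sketch (no packages needed). The tz = "UTC" argument is again an assumption:

```r
x <- "2020/04/12:12:00"

# strptime returns POSIXlt, a list of components (year, month, day, hour, ...)
lt <- strptime(x, format = "%Y/%m/%d:%H:%M", tz = "UTC")
class(lt)

# as.POSIXct stores the same instant as a single number of seconds since 1970-01-01 UTC
ct <- as.POSIXct(lt)
class(ct)
unclass(ct)
```

The single-number POSIXct form is the one that fits naturally in a data.frame or data.table column.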
Going by the question alone, I would suggest the openxlsx package; it has helped me significantly reduce the time spent reading large datasets. Three points you may find helpful, based on your question and the comments: