R bigmemory won't read large csv file
I'm trying to load a 689.4 MB CSV file using read.big.matrix from the R bigmemory package (with biganalytics loaded as well), in a similar way to the bigmemory vignette.
Vignette code:
library(bigmemory)
library(biganalytics)
x <- read.big.matrix("airline.csv", type="integer", header=TRUE,
backingfile="airline.bin",
descriptorfile="airline.desc",
extraCols="Age")
Per the comment from 42-, I removed the factor variables using cut on the command line:
cut -d, -f9,11,17,18,23 --complement 2008.csv > 2008cut.csv
I then replaced any NA values found in the data with 0 using sed:
sed -i 's/NA/0/g' 2008cut.csv
Even with those pre-processing steps, I receive the same error.
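A blanket `s/NA/0/g` rewrites every occurrence of the substring `NA`, not only fields that are exactly `NA`, so it can also alter header names (the ASA airline data, for example, has a delay column named `NASDelay`). It is not established that this causes the error; it is simply a side effect worth ruling out. The base-R sketch below contrasts the two behaviours on sample strings; `replace_na_fields` is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: replace only comma-separated fields that are exactly "NA".
# (^|,) captures the preceding delimiter (or start of line); the lookahead (?=,|$)
# requires the field to end right there, so substrings such as "NASDelay" are kept.
replace_na_fields <- function(lines) {
  # R backreferences are single-digit, so "\\10" means group 1 followed by "0".
  gsub("(^|,)NA(?=,|$)", "\\10", lines, perl = TRUE)
}

header <- "CarrierDelay,WeatherDelay,NASDelay"
row    <- "NA,15,NA"

# What sed 's/NA/0/g' does: every "NA" substring is replaced, header included.
gsub("NA", "0", header)     # "CarrierDelay,WeatherDelay,0SDelay"

# Field-exact replacement leaves the header alone and still fixes the data row.
replace_na_fields(header)   # "CarrierDelay,WeatherDelay,NASDelay"
replace_na_fields(row)      # "0,15,0"
```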
My code:
#This works
x <- read.csv("~/Downloads/2008cut.csv",header=T)
dim(x)
#[1] 7009728 29
length(complete.cases(x))  # length() returns the row count; sum(complete.cases(x)) would count complete rows
#[1] 7009728
library(bigmemory)
library(biganalytics)
#This errors out
data <- read.big.matrix("~/Downloads/2008cut.csv",
type="integer", header=TRUE)
I receive the following error when trying to run read.big.matrix:
Warning: stack imbalance in '.Call', 31 then 32
Warning: stack imbalance in '{', 28 then 29
Warning: stack imbalance in '-', 23 then 24
Warning: stack imbalance in '-', 22 then 23
Warning: stack imbalance in '<-', 20 then 21
Error in big.matrix(nrow = numRows, ncol = createCols, type = type,
dimnames = list(rowNames, :
A big.matrix must have at least one row and one column
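Since `type="integer"` asks `read.big.matrix` to store every field as an integer, one thing worth confirming is that every column `read.csv` produced is numeric. A minimal base-R sketch; `non_numeric_columns` is a hypothetical helper and the sample file is invented for illustration.

```r
# Hypothetical helper: read the first n rows of a CSV with read.csv and
# return the names of any columns that did not parse as numeric.
non_numeric_columns <- function(path, n = 1000) {
  sample_rows <- read.csv(path, header = TRUE, nrows = n)
  is_num <- vapply(sample_rows, is.numeric, logical(1))
  names(sample_rows)[!is_num]
}

# Small illustrative file: "Origin" is a text column, the others are integers.
tmp <- tempfile(fileext = ".csv")
writeLines(c("Year,Month,Origin,ArrDelay",
             "2008,1,IAD,-14",
             "2008,1,IND,2"), tmp)

non_numeric_columns(tmp)   # "Origin"

# With the real data this would be:
# non_numeric_columns("~/Downloads/2008cut.csv")
```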
I have found others with this problem, but they either had mixed data or had a similar problem that got no response. At some point in my search, someone on a mailing list asked whether the user could run something like
x <- big.matrix(nrow=1000,ncol=10)
to make sure bigmemory was working in general. I am able to run that code and generate a big.matrix.
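The `big.matrix(nrow=1000,ncol=10)` check only exercises matrix creation. A slightly stronger sanity check is to round-trip a tiny, known-good integer CSV through `read.big.matrix`, which helps separate a problem with the package from a problem with the specific file or path. This is a sketch assuming bigmemory is installed; the file contents are made up.

```r
library(bigmemory)

# Write a small all-integer CSV with an unquoted header and no row names.
df <- data.frame(Year     = c(2008L, 2008L, 2008L),
                 Month    = c(1L, 2L, 3L),
                 ArrDelay = c(-14L, 2L, 0L))
tmp <- tempfile(fileext = ".csv")
write.table(df, tmp, sep = ",", row.names = FALSE, col.names = TRUE,
            quote = FALSE)

# Read it back into an in-memory big.matrix (no backing file).
m <- read.big.matrix(tmp, sep = ",", type = "integer", header = TRUE)

dim(m)       # 3 3
colnames(m)
m[, ]        # copy into an ordinary R matrix to inspect the values
```

If this works but the real file still fails, the problem is more likely in the file or its path than in the bigmemory installation.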
Any guidance would be greatly appreciated!
Software Details:
For reading large files, I would suggest using fread from the R data.table package.
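A sketch of that suggestion, assuming the data.table and bigmemory packages are installed. `fread` reads the whole file into RAM, which should be feasible here because `read.csv` already succeeded; if a big.matrix is still wanted afterwards, bigmemory's `as.big.matrix` can copy an ordinary matrix into one. `load_with_fread` is a hypothetical helper name.

```r
library(data.table)
library(bigmemory)

# Hypothetical helper: read a CSV with data.table::fread, then copy it into
# an in-memory big.matrix of integers via as.big.matrix.
load_with_fread <- function(path) {
  dt  <- fread(path)        # header row is auto-detected
  mat <- as.matrix(dt)      # assumes every column is numeric
  as.big.matrix(mat, type = "integer")
}

# Usage with the file from the question:
# x <- load_with_fread(normalizePath("~/Downloads/2008cut.csv"))
# dim(x)
```

Going through an ordinary matrix doubles peak memory use for a moment, so this trades some RAM for avoiding the `read.big.matrix` parser.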
Use an absolute path:
absolutePath <- normalizePath("~/Downloads/2008cut.csv")
x <- read.big.matrix(absolutePath, type="integer", header=TRUE,
backingfile="airline.bin",
descriptorfile="airline.desc",
extraCols="Age")
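The suggestion above presumably helps because `read.big.matrix` may not expand `~` the way `read.csv` does. Under that assumption, a quick base-R check before calling it: `normalizePath` resolves `~`, and `file.exists` plus `readLines(n = 1)` confirm the file is reachable and show how many header fields it has. `check_csv_path` is a hypothetical helper.

```r
# Hypothetical helper: expand "~", confirm the file exists, and report the
# number of comma-separated fields in the header line.
check_csv_path <- function(path) {
  full <- normalizePath(path, mustWork = FALSE)  # expands "~" to the home directory
  if (!file.exists(full)) {
    stop("File not found: ", full)
  }
  header <- readLines(full, n = 1)
  list(path     = full,
       n_fields = length(strsplit(header, ",", fixed = TRUE)[[1]]))
}

# Usage before calling read.big.matrix:
# info <- check_csv_path("~/Downloads/2008cut.csv")
# info$n_fields
# x <- read.big.matrix(info$path, type = "integer", header = TRUE)
```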