R bigmemory won't read large csv file

I'm trying to load a 689.4 MB CSV file using read.big.matrix from the R bigmemory package, in a similar way to the bigmemory vignette.

Vignette Code:

library(bigmemory)
library(biganalytics)
x <- read.big.matrix("airline.csv", type="integer", header=TRUE,
        backingfile="airline.bin",
        descriptorfile="airline.desc",
        extraCols="Age")

Per the comment from 42-, I removed the factor (character) columns on the command line with cut:

cut -d, -f9,11,17,18,23 --complement 2008.csv > 2008cut.csv

I then replaced the NA values in the data with 0 using sed:

sed -i 's/NA/0/g' 2008cut.csv

Even with these preprocessing steps, I receive the same error.
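One caveat on the sed step: s/NA/0/g replaces every occurrence of the substring NA, not only fields that are exactly NA. With only numeric columns left this is probably harmless, but a field-aware replacement is safer. The sketch below is a hypothetical R helper, replace_na_fields() (not part of any package), that only rewrites whole NA fields in comma-separated lines. Note that turning NA into 0 also changes the meaning of the data.

```r
# Hypothetical helper: replace only fields that are exactly "NA",
# leaving substrings such as "NAN" or "DNA" untouched.
replace_na_fields <- function(lines, replacement = "0") {
  # (?<=^|,) : the field starts at the beginning of the line or after a comma
  # (?=,|$)  : the field ends at a comma or at the end of the line
  # perl = TRUE is required for the lookbehind/lookahead assertions
  gsub("(?<=^|,)NA(?=,|$)", replacement, lines, perl = TRUE)
}

rows <- c("2008,1,NA,1200", "NA,NA,5,DNA", "7,NAN,NA")
fixed <- replace_na_fields(rows)
print(fixed)
```

On the real file the lines could be read with readLines() and written back with writeLines(), although for about 7 million rows the sed pipeline is probably the faster route.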

My code:

# This works
x <- read.csv("~/Downloads/2008cut.csv", header = TRUE)
dim(x)
#[1] 7009728      29
length(complete.cases(x))
#[1] 7009728

library(bigmemory)
library(biganalytics)
# This errors out
data <- read.big.matrix("~/Downloads/2008cut.csv",
                        type = "integer", header = TRUE)

I receive the following error when trying to run read.big.matrix:

Warning: stack imbalance in '.Call', 31 then 32
Warning: stack imbalance in '{', 28 then 29
Warning: stack imbalance in '-', 23 then 24
Warning: stack imbalance in '-', 22 then 23
Warning: stack imbalance in '<-', 20 then 21
Error in big.matrix(nrow = numRows, ncol = createCols, type = type,
 dimnames = list(rowNames,  : 
    A big.matrix must have at least one row and one column
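A side note on the length(complete.cases(x)) check above: complete.cases() returns one TRUE/FALSE per row, so its length always equals nrow(x) and says nothing about missing values. A small illustration with a toy data frame (not the airline data):

```r
# Toy data frame with one NA in each column
df <- data.frame(a = c(1L, NA, 3L), b = c(4L, 5L, NA))

n_len      <- length(complete.cases(df))  # always nrow(df), here 3
n_complete <- sum(complete.cases(df))     # rows with no NA at all, here 1
n_missing  <- sum(is.na(df))              # total NA cells, here 2
per_column <- colSums(is.na(df))          # NA count per column

print(c(n_len = n_len, n_complete = n_complete, n_missing = n_missing))
print(per_column)
```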

I have found others with this problem, but they either had mixed data types or got no response. During my search, someone on a mailing list asked a user to run something like x <- big.matrix(nrow=1000, ncol=10) to confirm that bigmemory works in general. That code runs fine for me and creates a big.matrix.

Any guidance would be greatly appreciated!

Software Details:

  • Data: 2008 File
  • R: 3.2.3
  • OS: x86_64-pc-linux-gnu
  • bigmemory: 4.5.19
  • biganalytics: 1.1.14

For reading large files, I'd suggest using fread from the R data.table package.
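A minimal sketch of that suggestion, assuming the data.table package is installed. fread()'s drop argument skips columns by name or number while reading, which could replace the separate cut pass. The tiny file below is a stand-in for 2008.csv that uses only a few of its column names.

```r
library(data.table)

# Tiny stand-in CSV so the example is self-contained
tmp <- tempfile(fileext = ".csv")
writeLines(c("Year,Month,UniqueCarrier,ArrDelay",
             "2008,1,WN,-14",
             "2008,1,AA,NA"), tmp)

# drop= skips the character column while reading; by default
# fread() reads "NA" fields as missing values (na.strings = "NA")
dt <- fread(tmp, drop = "UniqueCarrier")

str(dt)
```

This reads the data into RAM, so it sidesteps rather than fixes the read.big.matrix problem. If the result fits in memory, as.big.matrix(as.matrix(dt)) from bigmemory could then convert it.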

Use an absolute path:

absolutePath <- normalizePath("~/Downloads/2008cut.csv")

x <- read.big.matrix(absolutePath, type="integer", header=TRUE,
        backingfile="airline.bin",
        descriptorfile="airline.desc",
        extraCols="Age")
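The answer does not say why the absolute path helps. A plausible reading is that read.big.matrix passes the file name to its own reader without expanding ~, so it finds no rows and big.matrix() then fails; this is an assumption, not something confirmed here. Under that assumption, a hypothetical helper, resolve_csv_path() (not part of bigmemory), that expands the path and checks that the file exists before calling read.big.matrix could look like this:

```r
# Hypothetical helper (not part of bigmemory): expand "~" and fail early
# with a clear message if the file is missing.
resolve_csv_path <- function(path) {
  # normalizePath() performs tilde expansion before resolving the path;
  # mustWork = FALSE lets us raise our own, clearer error below
  full <- normalizePath(path, mustWork = FALSE)
  if (!file.exists(full)) {
    stop("CSV file not found: ", full)
  }
  full
}

# Self-contained check: a temporary file stands in for 2008cut.csv
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b", "1,2"), tmp)
resolved <- resolve_csv_path(tmp)
print(file.exists(resolved))
```

Usage would then be read.big.matrix(resolve_csv_path("~/Downloads/2008cut.csv"), type = "integer", header = TRUE), with the backing file arguments as in the answer.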
