
Reading large csv file with missing data using bigmemory package in R

I am using large datasets for my research (4.72 GB), and I discovered the bigmemory package in R, which supposedly handles large datasets (up to around 10 GB). However, when I use read.big.matrix to read a csv file, I get the following error:

> x <- read.big.matrix("x.csv", type = "integer", header=TRUE, backingfile="file.bin", descriptorfile="file.desc")

Error in read.big.matrix("x.csv", type = "integer", header = TRUE,  :
  Dimension mismatch between header row and first data row.

I think the issue is that the csv file is not complete, i.e., it is missing values in several cells. I tried removing header = TRUE, but then R aborts and restarts the session.

Does anyone have experience with reading large csv files with missing data using read.big.matrix?

It may not solve your problem directly, but you might find a package of mine, filematrix, useful. The relevant function is fm.create.from.text.file.

Please let me know if it works for your data file.
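A minimal sketch of what the conversion might look like; the argument names beyond the input file (textfilename, filenamebase, delimiter, omitCharacters, type) are based on my reading of the filematrix documentation and should be checked against ?fm.create.from.text.file:

library(filematrix)

# Convert the csv into a file-backed matrix on disk rather than loading it into RAM.
# The delimiter defaults to tab as far as I recall, so set it explicitly for csv;
# omitCharacters (assumed argument name) controls which token is treated as missing.
fm <- fm.create.from.text.file(
  textfilename   = "x.csv",
  filenamebase   = "x_fm",    # base name for the backing files written to disk
  delimiter      = ",",
  omitCharacters = "NA",
  type           = "integer"
)

dim(fm)        # dimensions, read from the file-backed object
fm[1:5, 1:5]   # inspect a small corner of the matrix
close(fm)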

Did you check the bigmemory PDF at https://cran.r-project.org/web/packages/bigmemory/bigmemory.pdf?

It was clearly described right there.

write.big.matrix(x, 'IrisData.txt', col.names=TRUE, row.names=TRUE)
y <- read.big.matrix("IrisData.txt", header=TRUE, has.row.names=TRUE)

# The following would fail with a dimension mismatch:
if (FALSE) y <- read.big.matrix("IrisData.txt", header=TRUE)

Basically, the error means there is a column in the CSV file that contains row names. If you don't pass has.row.names=TRUE, bigmemory treats the row-name column as a regular data column, so the first data row has one more field than the header row, which produces the dimension mismatch.
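Applied to the call from the question, the fix might look like this (assuming the extra column in x.csv really is row names):

x <- read.big.matrix("x.csv", type = "integer", header = TRUE,
                     has.row.names = TRUE,          # consume the row-name column
                     backingfile = "file.bin",
                     descriptorfile = "file.desc")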

I personally found the data.table package more useful for dealing with large datasets, YMMV.
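For instance, a minimal sketch using data.table::fread, which reads empty cells as NA by default (the na.strings argument shown here just adds extra tokens to treat as missing):

library(data.table)

# fread is fast on multi-GB files and tolerates missing cells.
DT <- fread("x.csv", header = TRUE, na.strings = c("", "NA"))

dim(DT)       # rows and columns read
summary(DT)   # quick check of ranges and NA counts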
