
Merging and appending a list of ffdf dataframes

I would like to read a vector of CSV file names as ffdf data frames and combine them into one big ffdf data frame. I have found solutions using other R packages; however, my combined data can reach 40 GB, so it needs to be stored on disk, as the ff package does, rather than in RAM. There are great solutions on this site, but as far as I know they all keep the data in RAM.

library(ff)
library(ffbase)
library(data.table)  # fread()
library(stringr)     # str_split()

# Create list of csv files
csv_files <- list.files(path = input_path,
                        pattern = "\\.csv$",
                        full.names = TRUE)

# my approach so far
# this uses fread, and it appears to be consuming RAM

# Read the files in, assuming comma separator
csv_files_df <- lapply(csv_files, function(x) {
  y <- unlist(str_split(x, "[.]"))[1]
  assign(y,
         as.ffdf(fread(x, stringsAsFactors = TRUE)))
})

# Combine them
combined_df <- do.call("ffdfappend", lapply(csv_files_df, as.ffdf))

When I try to combine them, I get this error:

> combined_df <- do.call("ffdfappend", lapply(csv_files_df, as.ffdf))
Error in ffdfappend(list(virtual = list(VirtualVmode = c("double", "integer",  : 
  'list' object cannot be coerced to type 'logical'

Summary: I would like to read and merge the CSV files using only the ff package, without relying on another package, to avoid running out of memory (OOM).

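ffdfappend() accepts only two data arguments: the ffdf to append to (x) and the data to append (dat in the ffbase documentation). do.call() passes every element of the list as a separate argument, so the third and later data frames are matched to the function's other parameters, which is most likely why R complains that a 'list' cannot be coerced to 'logical'. To combine more than two data frames, call ffdfappend() repeatedly, for example in a loop, something like this: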

library(ff)
library(ffbase)
library(data.table)  # fread()

csv_files <- list.files(path = input_path,
                        pattern = "\\.csv$",
                        full.names = TRUE)

# Read one CSV file and convert it to an ffdf.
# Note: fread() loads the whole file into RAM before the conversion.
read_ffdf <- function(x) {
  as.ffdf(fread(x, stringsAsFactors = TRUE))
}

# Start with the first file, then append the rest one at a time
out <- read_ffdf(csv_files[1])

for (i in seq_along(csv_files)[-1]) {
  out <- ffdfappend(out, read_ffdf(csv_files[i]))
}
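
The argument matching behind the original error can be reproduced without ff. The sketch below uses a hypothetical stand-in, append_two(), which only mimics the shape of a function with two data arguments followed by a logical option; it is not ffdfappend() itself. It shows why do.call() fails on a list of three data frames and how Reduce() expresses the same pairwise appending as the loop.

```r
# Hypothetical stand-in (not part of ff or ffbase): two data arguments
# followed by a logical option.
append_two <- function(x, dat, recode = TRUE) {
  stopifnot(is.logical(recode))
  rbind(x, dat)
}

pieces <- list(
  data.frame(id = 1:2),
  data.frame(id = 3:4),
  data.frame(id = 5:6)
)

# do.call() spreads the list over the parameters by position, so the third
# data frame lands in `recode` and the call fails.
failed <- inherits(try(do.call(append_two, pieces), silent = TRUE), "try-error")

# Reduce() applies the function pairwise, which is what the loop does.
combined <- Reduce(append_two, pieces)
```

If ffdfappend() behaves like the stand-in, Reduce(ffdfappend, list_of_ffdf) should be equivalent to the loop, though that is an assumption to check on real data.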

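Since the loop still passes each file through fread(), one file at a time is held in RAM. A possible ff-only alternative, sketched here under assumptions, is read.csv.ffdf(), whose x argument is documented as an existing ffdf to which the rows read are appended. The helper name read_csvs_to_ffdf() is hypothetical, and the sketch assumes all files share the same columns and a header row. Text columns may need colClasses set to "factor", because ff has no character storage mode; please check this against the ff documentation.

```r
# Hypothetical helper (not part of ff): append every CSV file to one ffdf.
# Assumes all files have identical columns and a header row.
read_csvs_to_ffdf <- function(files, ...) {
  combined <- NULL
  for (f in files) {
    # With x = NULL a new ffdf is created; otherwise the rows are appended to x.
    combined <- ff::read.csv.ffdf(x = combined, file = f, header = TRUE, ...)
  }
  combined
}
```

With the file vector from the question, usage would look like combined_df <- read_csvs_to_ffdf(csv_files), after loading ff with library(ff).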

 