
Splitting an ffdf object

I'm using the ff and ffbase libraries to manage a big CSV file (~40 GB and 275e6 observations). I'd like to split/partition this file according to one of its columns (which is a factor column).

With a normal data frame, I would do something like this:

a <- data.frame(rnorm(10000,0,1),
                sample(1:100,10000,replace=T),
                sample(letters,10000,replace = T))
names(a) <- c('V1','V2','V3')
a_partition <- split(a,a$V3)
names(a_partition) <- paste("df",names(a_partition),sep = "_")
list2env(a_partition,globalenv())

But ff and ffbase don't have a split function. So, looking in the ffbase documentation, I found ffdfdply and tried to use it as follows:

ffa <- as.ffdf(a)
ffa_partition <- ffdfdply(x = ffa, split = ffa$V3)

Alas, I get this log output:

calculating split sizes
building up split locations
working on split 1/1, extracting data in RAM of 26 split elements,
totalling, 0.00015 GB, while max specified
data specified using BATCHBYTES is 0.01999 GB
... applying FUN to selected data
Error: argument "FUN" is missing, with no default

I tried FUN = as.data.frame (since the result of the function must be a data frame), with no luck: doing so makes ffa_partition a copy of ffa.

How can I partition my ffdf?

Two years late, but I believe this does what you needed:

result_list <- list()
for(letter in letters){
    result_list[[letter]] <- subset(ffa, V3 == letter)
}
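
As for why FUN = as.data.frame gave back a copy: as far as I understand the ffbase documentation, ffdfdply is a split-apply-combine helper. It applies FUN to chunks of rows and binds the results back into a single ffdf, so an identity-like FUN simply rebuilds the original data. Here is a rough base R analogy of that pattern. It uses plain data frames only, not ff objects, and the names pieces, applied and combined are just illustrative:

```r
# Base R stand-in for the split-apply-combine pattern; no ff objects involved.
set.seed(1)
a <- data.frame(V1 = rnorm(1000),
                V2 = sample(1:100, 1000, replace = TRUE),
                V3 = factor(sample(letters, 1000, replace = TRUE)))

# Split by V3, apply an identity-like FUN to each piece,
# then bind the pieces back together.
pieces   <- split(a, a$V3)
applied  <- lapply(pieces, as.data.frame)
combined <- do.call(rbind, applied)

# combined holds the same rows as a, only reordered by group:
# effectively a copy of the input.
same_row_count <- nrow(combined) == nrow(a)

# The per-group partitions only ever exist in the intermediate list.
n_pieces <- length(pieces)
```

So the combine step is what undoes the partition, which is why the loop above keeps each subset as its own list element instead.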
