
Doing calculations on dataframe from ffdf object

I'm working with a large dataset (3.5M lines and 40 columns) and I need to clean out some values so that I can calculate other parameters that will be necessary when I start formulating a model around the data.

The problem is that the for loops I have been using take forever to run, so I wanted to try the ff package. The data frame is called data and consists of customer information for a bank. It was imported from a .csv file. What I need to do is remove every customer (identified by Serial) whose AverageStanding variable is ever negative.

> ffd<-as.ffdf(data)
> lastserial = tail(ffd$Serial,1)
> for(k in 1:lastserial){
+   tempvecWith <- vector()
+   tempvecWith <- ffd[ffd$Serial==k, ]$AverageStanding
+   if(any(tempvecWith < 0)){
+     ffd_clean<- ffd[!ffd$Serial ==k, ]
+   }
+ }

This is the error that I am receiving:

Error in as.hi.integer(x, maxindex = maxindex, dim = dim, vw = vw, pack = pack) : 
NAs in as.hi.integer

Any ideas on how I can avoid these errors?

The error comes from this part of your code: ffd[ffd$Serial==k, ]. Namely, ffd$Serial==k returns an ff logical vector, but to index or subset an ff vector or ffdf you need to supply index numbers, not a vector of logicals. You can turn an ff vector of logicals into an ff vector of index numbers with ffwhich from the ffbase package.

So for your question, I believe you are looking for this kind of code (not tested, as you did not supply any data).
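As a rough in-memory analogy only (this uses base R's which on an ordinary vector, not ff or ffbase, and the vector x is made up for illustration), the sketch below shows the difference between a logical filter and the index positions derived from it. Note that the comparison yields NA wherever the data is NA, while which drops those entries from the positions it returns.

```r
# Hypothetical toy vector; not taken from the question's data.
x <- c(5, -2, NA, 7, -1)

# A logical filter: one TRUE/FALSE/NA per element.
is_neg <- x < 0            # FALSE TRUE NA FALSE TRUE

# Convert the filter to integer positions; NA entries are dropped.
pos <- which(is_neg)       # 2 5

# Subset by position.
x[pos]                     # -2 -1
```
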

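require(ffbase)
# Positions of all rows with a negative AverageStanding
idx <- ffd$AverageStanding < 0
idx <- ffwhich(idx, idx==TRUE)
open(ffd)
# Serials of the customers that have at least one negative value
serials.with.negative <- ffd$Serial[idx]
serials.with.negative <- unique(serials.with.negative)
# Flag every row that belongs to one of those customers
ffd$is.customer.with.negative.avgstanding <- ffd$Serial %in% serials.with.negative

# Keep only the rows of customers that were not flagged
idx <- ffd$is.customer.with.negative.avgstanding == FALSE
idx <- ffwhich(idx, idx==TRUE)
open(ffd)
ffd_clean <- ffd[idx, ]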
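To check the logic before running it on 3.5M rows, the same three steps (find the negative rows, collect their serials, drop every row of those customers) can be tried on a small ordinary data frame. This is only a sketch in base R, without ff; the data frame toy and its values are invented for illustration, and only the column names Serial and AverageStanding come from the question.

```r
# Hypothetical stand-in for the bank data; values are made up.
toy <- data.frame(
  Serial          = c(1, 1, 2, 2, 3, 3),
  AverageStanding = c(100, -5, 20, 30, 0, 15)
)

# Step 1: positions of rows with a negative AverageStanding.
neg_rows <- which(toy$AverageStanding < 0)

# Step 2: the distinct serials those rows belong to.
bad_serials <- unique(toy$Serial[neg_rows])

# Step 3: keep only rows of customers that never went negative.
toy_clean <- toy[!(toy$Serial %in% bad_serials), ]
print(toy_clean)
```

Here customer 1 has one negative value, so both of its rows are removed and only the rows for customers 2 and 3 remain. No loop over serials is needed, which is also what makes the ff version above avoid the slow per-customer iteration.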
