I've had a look around and can't quite seem to get a grasp of is going on with this. I'm using R in Eclipse. The file I'm trying to import is 700mb with around 15mil rows and 6 columns. As I was having problems loading in I have started using the ff
package.
library(ff)
FDF = read.csv.ffdf(file='C:\\Users\\William\\Desktop\\R Data\\GBPUSD.1986.2014.txt', header = FALSE, colClasses=c('factor','factor','numeric','numeric','numeric','numeric'), sep=',')
names(FDF)= c('Date','Time','Open','High','Low','Close')
#names the columns in the ffdf file
dim(FDF)
# produces dimensions of the file
I then want to create a POSIXct sequence which will later be joined against the imported file. I had tried;
tm1 = seq(as.POSIXct("1986/12/1 00:00"), as.POSIXct("2014/09/04 23:59"),"mins"))
tm1 = data.frame (DateTime=strftime(tm1,format='%Y.%m.%d %H:%M'))
However R kept of crashing. I then tested this is RStudio and saw that their where constraints on the vector. It did, however, produce the correct
dim(tm1)
names(tm1)
So I went back into Eclipse thinking this was something to do with memory allocation. I've attempted the following;
library(ff)
tm1 = as.ffdf(seq(as.POSIXct("1986/12/1 00:00"), as.POSIXct("2014/09/04 23:59"),"mins"))
tm1 = as.ffdf(DateTime=strftime(tm1,format='%Y.%m.%d %H:%M'))
names(tm1) = c('DateTime')
dim(tm1)
names(tm1)
This gives an error of
no applicable method for 'as.ffdf' applied to an object of class "c('POSIXct', 'POSIXt')"
I can't seem to work around this. I then tried ...
library(ff)
tm1 = as.ff(seq(as.POSIXct("1986/12/1 00:00"), as.POSIXct("2014/09/04 23:59"),"mins"))
tm1 = as.ff(DateTime=strftime(tm1,format='%Y.%m.%d %H:%M'))
Which produce the output dates, however not in the correct format. In addition to this, when ...
dim(tm1)
names(tm1)
where executed they both returned null.
Question
We'll we got there in the end.
I believe the problem was the available RAM during the creation of the full vector. As this was the case I broke the vector into 3, converted them into ffdf format to free up RAM and then used rbind
to bind them together.
The problem with formatting the vector once created, I believe, was due to accessing RAM. Every time I tried this R crashed.
Even with the work around below my machine is slowing (4gb). I've ordered some more RAM and hope this will smooth future operations.
Below is the working code;
library(ff)
library(ffbase)
tm1 = seq(from = as.POSIXct('1986-12-01 00:00'), to = as.POSIXct('2000-12-01 23:59'), by = 'min')
tm1 = data.frame(DateTime=strftime(tm1, format='%Y.%m.%d %H:%M'))
# create data frame within memory contrainst
tm1 = as.ffdf(tm1)
# converts to ffdf format
memory.size()
tm2 = seq(from = as.POSIXct('2000-12-02 00:00'), to = as.POSIXct('2010-12-01 23:59'), by = 'min')
tm2 = data.frame(DateTime=strftime(tm2, format='%Y.%m.%d %H:%M'))
# create data frame within memory contrainst
tm2 = as.ffdf(tm2)
memory.size()
tm3 = seq(from = as.POSIXct('2010-12-2 00:00'), to = as.POSIXct('2014-09-04 23:59'), by = 'min')
tm3 = data.frame(DateTime=strftime(tm3, format='%Y.%m.%d %H:%M'))
memory.size()
tm3 = as.ffdf(tm3)
# converts to ffdf format
memory.size()
tm4 = rbind(tm1, tm2, tm3)
# binds ffdf objects into one
dim(tm4)
# checks the row numbers
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.