
R code optimization with a data.frame

I have a large SpatialPointsDataFrame with 10570 elements, in which each row is a point with an associated date (some rows share the same date). The object has 4760 columns (it is the output of the extract() function applied to a RasterStack and the points), and each temperature column corresponds to a date and holds the value for that date.

Simplified example:

DATE2       BICHO   X2000.01.01   X2000.01.02   (...)   X2012.12.31
2009-04-08  Woody      20.7          19.2        ...         9.5
2009-04-09  Woody      20.7          19.2        ...         9.5
2009-04-10  Woody      20.7          19.2        ...         9.5
2004-11-30  Woody      20.7          19.2        ...         9.5
2004-12-01  Buzz       20.7          19.2        ...         9.5
2004-12-02  Buzz       20.7          19.2        ...         9.5

I want to create a new column (TP) in this data.frame that contains, for each row, the temperature from the column whose date matches that row's DATE2.

for(i in 11:4760){
  datas<-str_sub(colnames(pts@data[i]), start=2,end=11L)
  datas<-format(as.Date(datas, "%Y.%m.%d"),"%Y-%m-%d")
  for(j in seq_along(pts@data$TP)){
    print(c(i,j))   #just a print to see how fast is the code

The code works, but it is very slow. Can anyone help me optimize it?

In your example, none of the column dates match any of the DATE2 dates, but I hope this works for you:

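library(data.table)   # as.data.table(), melt(), :=
library(lubridate)    # ymd()

df <- as.data.table(df)
# Reshape wide to long: one row per (point, temperature column) pair
dfm <- melt(df,
            id.vars = c("DATE2", "BICHO"),
            variable.name = "date",
            value.name = "TP")
dfm[, date := ymd(sub("^X", "", date))]   # "X2000.01.01" -> Date
dfm[, DATE2 := ymd(DATE2)]
# Keep only the rows where the column date equals the point's own date
dfm[DATE2 == date]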
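
Melting a table of this size produces roughly 10570 x 4750 rows before filtering, so as an alternative, here is a minimal base R sketch that looks each value up directly with matrix indexing. It assumes the temperature columns are named exactly like "X2000.01.01" and that DATE2 is in "YYYY-MM-DD" form; the toy df below (with made-up values and dates chosen so that some rows match) stands in for pts@data, and target_col, temp_cols, temps and col_idx are names introduced only for this illustration.

```r
# Toy stand-in for pts@data; values and dates are invented for illustration.
df <- data.frame(
  DATE2 = c("2000-01-02", "2000-01-01", "2000-01-03", "2001-05-05"),
  BICHO = c("Woody", "Woody", "Buzz", "Buzz"),
  X2000.01.01 = c(20.7, 21.0, 19.8, 18.4),
  X2000.01.02 = c(19.2, 19.5, 18.9, 17.7),
  X2000.01.03 = c(9.5, 9.9, 10.1, 8.8),
  stringsAsFactors = FALSE
)

# Column name each row should read from, e.g. "2000-01-02" -> "X2000.01.02".
target_col <- format(as.Date(df$DATE2), "X%Y.%m.%d")

# Pull the temperature columns out once as a numeric matrix.
temp_cols <- grep("^X[0-9]{4}\\.[0-9]{2}\\.[0-9]{2}$", names(df))
temps <- as.matrix(df[temp_cols])

# Column position for every row (NA when the date has no matching column).
col_idx <- match(target_col, colnames(temps))

# Index the matrix with (row, column) pairs; rows with an NA column give NA.
df$TP <- temps[cbind(seq_len(nrow(df)), col_idx)]

print(df[, c("DATE2", "BICHO", "TP")])
```

On the real object, the same steps would presumably be applied to pts@data and the result assigned to pts@data$TP. This replaces the double loop with one match() call and one indexing step, at the cost of holding the temperature columns in memory once as a matrix.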

