
R: Efficient row-wise loop with data.table

I am using data.table in R and looping over the rows of my table; it is really slow because of the table's size. I wonder if someone has an idea for a faster approach.

I have a set of values that I want to "cluster". Each row has a position, a positive integer. You can build a simple toy version of the data like this:

    library(data.table)
    # Here is a toy example
    fulltable = c(seq(1, 4)) * c(seq(1, 1000, 10))
    fulltable = data.table(pos = fulltable[order(fulltable)])
    fulltable$id = 1

So I loop over the rows, and when there is a gap of more than 50 between two consecutive positions I start a new group:

    # Here is the main loop
    lastposition = fulltable[1]$pos
    lastid = fulltable[1]$id
    for (i in 2:nrow(fulltable)) {
        if (fulltable[i]$pos - 50 > lastposition) {
            lastid = lastid + 1
            print(lastid)
        }
        fulltable[i]$id = lastid
        lastposition = fulltable[i]$pos
    }

Any idea for an efficient way to do this?
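Much of the cost comes from `fulltable[i]$id = lastid`, which goes through `[.data.table` and copies on every iteration. If you want to keep the loop shape, `data.table::set()` does the same per-row assignment by reference without that overhead; a sketch of the same logic (not from the original post):

```r
library(data.table)

# Same toy data as above
fulltable = c(seq(1, 4)) * c(seq(1, 1000, 10))
fulltable = data.table(pos = fulltable[order(fulltable)])
fulltable$id = 1L

# Same loop, but set() writes in place instead of copying the table
lastposition = fulltable$pos[1]
lastid = 1L
for (i in 2:nrow(fulltable)) {
  if (fulltable$pos[i] - 50 > lastposition) lastid = lastid + 1L
  set(fulltable, i, "id", lastid)   # assignment by reference
  lastposition = fulltable$pos[i]
}
```

This keeps the algorithm identical, so the result should match the original loop while avoiding repeated copies.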

One answer: flag each row whose gap from the previous position exceeds 50, number those break rows, then carry the group ids forward with cummax:

    fulltable[which((c(fulltable$pos[-1], NA) - fulltable$pos) > 50) + 1, new_group := 2:(.N + 1)]
    fulltable[is.na(new_group), new_group := 1]
    fulltable[, c("lastid_new", "new_group") := list(cummax(new_group), NULL)]
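The same grouping can also be written as a single vectorized expression: a gap of more than 50 marks the start of a new group, and `cumsum` over those break flags yields the group id directly (an equivalent common idiom, not from the original answer):

```r
library(data.table)

# Toy data as in the question
fulltable = c(seq(1, 4)) * c(seq(1, 1000, 10))
fulltable = data.table(pos = fulltable[order(fulltable)])

# diff(pos) > 50 is TRUE wherever a new group starts; the first row
# always opens group 1, so prepend a 1 before the cumulative sum
fulltable[, id := cumsum(c(1L, diff(pos) > 50))]
```

This replaces the row-by-row loop with one pass over the column, which is typically orders of magnitude faster on large tables.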

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 