
R: optimizing grep over a data.table

I've got a series of empty, largish data tables. They look like this (but much larger: ~6000 columns, between 1 and 100,000 rows):

library(data.table)

apple = c(NA, NA, NA)
orange = c(NA, NA, NA)
pear = c(NA, NA, NA)
demo <- data.table(apple, orange, pear)
row.names(demo) <- c("pineapples", "blood oranges", "grapes")

I am pattern-matching to see whether each row name contains a given column name, and then marking the corresponding cell as TRUE/FALSE. I have a loop that works, but it is extremely slow.

for (i in 1:ncol(demo)) {
    demo[, i] <- ifelse(grepl(colnames(demo)[i], rownames(demo)),
                        TRUE,
                        FALSE)
}

Does anyone have ideas on how to do this faster? Using Java would be possible, but I would prefer to solve it in pure R.

Since row names are not allowed in a data.table, we can create the dataset pre-filled with FALSE values (see the data section below), keep the row names in a separate character vector, and use set() to flip the matching cells to TRUE:

rn <- c("pineapples", "blood oranges", "grapes")
for (j in seq_along(demo)) {
    # grep() returns the indices of rn matching the j-th column name;
    # set() updates only those cells, by reference, without copying
    set(demo, i = grep(names(demo)[j], rn), j = j, value = TRUE)
}

data

demo <- as.data.table(matrix(FALSE, 3, 3,
    dimnames = list(NULL, c('apple', 'orange', 'pear'))))
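
For reference, here is the answer assembled end to end; the printed result below is what I'd expect for the three example rows (only "pineapples" contains "apple" and "blood oranges" contains "orange"; "grapes" matches no column):

library(data.table)

# row labels kept in a plain vector, since data.table has no row names
rn <- c("pineapples", "blood oranges", "grapes")

# all-FALSE table, one column per pattern
demo <- as.data.table(matrix(FALSE, 3, 3,
    dimnames = list(NULL, c('apple', 'orange', 'pear'))))

# flip only the matching cells, by reference
for (j in seq_along(demo)) {
    set(demo, i = grep(names(demo)[j], rn), j = j, value = TRUE)
}

demo
#     apple orange  pear
# 1:   TRUE  FALSE FALSE
# 2:  FALSE   TRUE FALSE
# 3:  FALSE  FALSE FALSE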
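
As an alternative sketch (not from the answer above): since the whole table is overwritten anyway, the logical matrix can also be built in a single vapply()/grepl() pass and converted once. fixed = TRUE treats the column names as literal substrings rather than regular expressions, which is typically faster; demo2 is a hypothetical name.

library(data.table)

rn   <- c("pineapples", "blood oranges", "grapes")
cols <- c("apple", "orange", "pear")

# one grepl() call per column name; vapply() returns a logical matrix
# with one row per element of rn and one named column per pattern
m <- vapply(cols, function(p) grepl(p, rn, fixed = TRUE),
            logical(length(rn)))

demo2 <- as.data.table(m)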
