
R: optimizing grep over a data.table

I've got a series of empty, largish data tables. They look like this (but much larger: ~6000 columns, between 1 and 100,000 rows):

library(data.table)

apple = c(NA, NA, NA)
orange = c(NA, NA, NA)
pear = c(NA, NA, NA)
demo <- data.table(apple, orange, pear)
row.names(demo) <- c("pineapples", "blood oranges", "grapes")

I am pattern-matching to see whether each row name contains a given column name, and then marking the corresponding cell as TRUE/FALSE. I have a loop that works, but it is extremely slow.

for (i in 1:ncol(demo)) {
    demo[, i] <- ifelse(grepl(colnames(demo)[i], rownames(demo)),
                        TRUE,
                        FALSE)
}

Does anyone have ideas on how to do this faster? Using Java would be possible, but I would prefer to solve it in pure R.

Since row names are not allowed in a data.table, we can create the dataset pre-filled with FALSE values (see the data section below), keep the row names in a separate character vector, and use set() to flip the matching cells to TRUE:

rn <- c("pineapples", "blood oranges", "grapes")
for (j in seq_along(demo)) {
    # grep() returns the indices of rn matching the j-th column name;
    # set() updates only those cells, by reference, without copying
    set(demo, i = grep(names(demo)[j], rn), j = j, value = TRUE)
}

data

demo <- as.data.table(matrix(FALSE, 3, 3,
    dimnames = list(NULL, c('apple', 'orange', 'pear'))))
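
For reference, here is the answer assembled end to end; the printed result below is what I'd expect for the three example rows (only "pineapples" contains "apple" and "blood oranges" contains "orange"; "grapes" matches no column):

library(data.table)

# row labels kept in a plain vector, since data.table has no row names
rn <- c("pineapples", "blood oranges", "grapes")

# all-FALSE table, one column per pattern
demo <- as.data.table(matrix(FALSE, 3, 3,
    dimnames = list(NULL, c('apple', 'orange', 'pear'))))

# flip only the matching cells, by reference
for (j in seq_along(demo)) {
    set(demo, i = grep(names(demo)[j], rn), j = j, value = TRUE)
}

demo
#     apple orange  pear
# 1:   TRUE  FALSE FALSE
# 2:  FALSE   TRUE FALSE
# 3:  FALSE  FALSE FALSE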
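
As an alternative sketch (not from the answer above): since the whole table is overwritten anyway, the logical matrix can also be built in a single vapply()/grepl() pass and converted once. fixed = TRUE treats the column names as literal substrings rather than regular expressions, which is typically faster; demo2 is a hypothetical name.

library(data.table)

rn   <- c("pineapples", "blood oranges", "grapes")
cols <- c("apple", "orange", "pear")

# one grepl() call per column name; vapply() returns a logical matrix
# with one row per element of rn and one named column per pattern
m <- vapply(cols, function(p) grepl(p, rn, fixed = TRUE),
            logical(length(rn)))

demo2 <- as.data.table(m)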
