Replace every occurrence of factor variable in one column with row from another dataframe R

Question

Say I have two dataframes. One is my 'main' df and the other is the one I'm using to replace values in the main df.

In column cd of dfMain, every time the factor level orange comes up, I want to replace it with the corresponding row from dfReplace (the row whose rowname is orange), and likewise for every other level.

This will result in dfMain gaining 3 columns in width, because the cd column goes away and the columns X1, X2, X3 and X4 are added.

The key here is that I need this to be as efficient as possible, because my actual data has far more rows.

Reproducible example:

set.seed(42)
dfMain <- data.frame('av' = sample.int(10, 100, replace = TRUE), 
                     'ba' = sample.int(10, 100, replace = TRUE), 
                     'cd' = sample(c('orange', 'apple', 'banana', 'strawberry', 'blueberry', 'blackberry'), 100, replace = TRUE))

dfReplace <- data.frame('X1' = runif(6),
                        'X2' = runif(6),
                        'X3' = runif(6),
                        'X4' = runif(6))
rownames(dfReplace) <- c('orange', 'apple', 'banana', 'strawberry', 'blueberry', 'blackberry')

Answer 1

I'd suggest first converting the rownames to an explicit table field and converting the cd factor to character, and then doing a database join, which should be very fast.

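library(dplyr)

# Move the row names into an explicit key column named "cd".
# add_rownames() is deprecated; tibble::rownames_to_column() replaces it.
dfReplace2 <- dfReplace %>%
  tibble::rownames_to_column(var = "cd")

# Join on the key; as.character() avoids a factor/character mismatch.
dfMain %>%
  mutate(cd = as.character(cd)) %>%
  left_join(dfReplace2, by = "cd")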

I left the original cd column in the result, but it can be removed by appending %>% select(-cd) to the pipeline.
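
If you would rather avoid the dplyr dependency, the same lookup can be sketched in base R. This is an alternative technique, not the join above: it matches cd against the row names of dfReplace and indexes the replacement rows directly. The names fruits, idx and result are introduced here for illustration only, and I haven't benchmarked it against the join on large data.

```r
# Rebuild the question's example data (same seed and column names).
set.seed(42)
fruits <- c("orange", "apple", "banana", "strawberry", "blueberry", "blackberry")
dfMain <- data.frame(av = sample.int(10, 100, replace = TRUE),
                     ba = sample.int(10, 100, replace = TRUE),
                     cd = factor(sample(fruits, 100, replace = TRUE)))
dfReplace <- data.frame(X1 = runif(6), X2 = runif(6),
                        X3 = runif(6), X4 = runif(6),
                        row.names = fruits)

# For each row of dfMain, find the position of its cd value among the
# row names of dfReplace. as.character() matters here: indexing with a
# factor directly would use its integer codes, not its labels.
idx <- match(as.character(dfMain$cd), rownames(dfReplace))

# Keep every column except cd, then bind the looked-up replacement rows.
result <- cbind(dfMain[setdiff(names(dfMain), "cd")],
                dfReplace[idx, , drop = FALSE])
rownames(result) <- NULL  # discard the repeated fruit row names

dim(result)  # 100 rows, 6 columns: av, ba, X1, X2, X3, X4
```

Any cd value that has no matching row name would give NA in idx and therefore a row of NAs in X1 to X4, which mirrors what left_join does for unmatched keys.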

solution1
2 2018-09-10 19:53:28
