Replace every occurrence of a factor level in a column with the corresponding row of another data frame in R

Question

Say I have two data frames. One is my 'main' df and the other is the one I'm using to replace values in the main df.

So in column cd of dfMain, every time the factor level orange comes up I want to replace it with the corresponding row of dfReplace (the row whose row name is orange), and the same for every other level.

As a result dfMain grows by 3 columns: the cd column goes away and the columns X1, X2, X3, X4 are added.
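The core operation here is a row lookup by name. A minimal base R sketch of that idea on toy data (the names `lookup`, `keys`, `idx` and `expanded` are illustrative, not from the question): `match()` turns each label into a row position, and indexing with those positions repeats a row as often as its label occurs.

```r
# Toy lookup table: row names are the lookup keys (illustrative data)
lookup <- data.frame(X1 = c(0.1, 0.2), X2 = c(0.3, 0.4),
                     row.names = c("orange", "apple"))
# A factor column like dfMain$cd (illustrative)
keys <- factor(c("apple", "orange", "apple"))

# Convert to character so we match on the labels, then find each label's row
idx <- match(as.character(keys), rownames(lookup))   # 2 1 2

# One output row per key, in key order; drop the de-duplicated row names
expanded <- lookup[idx, , drop = FALSE]
rownames(expanded) <- NULL
print(expanded)
```

Using `match()` rather than `lookup[as.character(keys), ]` keeps the lookup exact, since character row indexing on a data frame can fall back to partial matching.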

The key here is that I need this to be as efficient as possible, because my actual data has many, many more rows.

Reproducible example:

set.seed(42)
# stringsAsFactors = TRUE keeps cd a factor (the default became FALSE in R 4.0.0)
dfMain <- data.frame('av' = sample.int(10, 100, replace = TRUE), 
                     'ba' = sample.int(10, 100, replace = TRUE), 
                     'cd' = sample(c('orange', 'apple', 'banana', 'strawberry', 'blueberry', 'blackberry'), 100, replace = TRUE),
                     stringsAsFactors = TRUE)

dfReplace <- data.frame('X1' = runif(6),
                        'X2' = runif(6),
                        'X3' = runif(6),
                        'X4' = runif(6))
rownames(dfReplace) <- c('orange', 'apple', 'banana', 'strawberry', 'blueberry', 'blackberry')
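A possible base R alternative on the question's own data, assuming no packages are wanted. This is vectorised row indexing via `match()`, not the join used in the answer; `fruits`, `idx` and `result` are names introduced here for illustration. Each step is a single vectorised call, so it should scale with the number of rows, though its speed relative to a join has not been measured here.

```r
set.seed(42)
fruits <- c('orange', 'apple', 'banana', 'strawberry', 'blueberry', 'blackberry')
dfMain <- data.frame(av = sample.int(10, 100, replace = TRUE),
                     ba = sample.int(10, 100, replace = TRUE),
                     cd = factor(sample(fruits, 100, replace = TRUE)))
dfReplace <- data.frame(X1 = runif(6), X2 = runif(6),
                        X3 = runif(6), X4 = runif(6),
                        row.names = fruits)

# Row position in dfReplace for every cd value (exact match on the labels)
idx <- match(as.character(dfMain$cd), rownames(dfReplace))

# Keep every column except cd, then append the looked-up X1..X4 columns
result <- dfMain[setdiff(names(dfMain), "cd")]
result[names(dfReplace)] <- dfReplace[idx, , drop = FALSE]
head(result)
```

A label with no matching row name gives `NA` in `idx`, which produces an all-`NA` row, the same behaviour as a left join.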

Answer 1

I'd suggest first converting the row names to an explicit column and converting the cd factor to character, and then doing a database-style join, which should be very fast.

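library(dplyr)
library(tibble)

# Move the row names (fruit labels) into an explicit key column called cd.
# add_rownames() is deprecated; tibble::rownames_to_column() replaces it.
dfReplace2 <- dfReplace %>%
  rownames_to_column(var = "cd")

# Convert the factor to character so both key columns have the same type
dfMain %>%
  mutate(cd = as.character(cd)) %>%
  left_join(dfReplace2, by = "cd")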

I left the original cd column in the result, but it can be removed by appending %>% select(-cd).
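For comparison, a base R sketch of the same join idea using `merge()` on small illustrative data (the data and the helper column `row_id` are hypothetical). `merge()` with `sort = FALSE` leaves the row order unspecified, so the sketch records the original order and restores it afterwards, then drops `cd` as `select(-cd)` would.

```r
dfMain <- data.frame(av = 1:5,
                     cd = factor(c("apple", "orange", "banana", "apple", "orange")))
dfReplace <- data.frame(X1 = c(0.1, 0.2, 0.3), X2 = c(0.4, 0.5, 0.6),
                        row.names = c("orange", "apple", "banana"))

# Turn the row names into an explicit key column named cd
dfReplace2 <- data.frame(cd = rownames(dfReplace), dfReplace,
                         row.names = NULL, stringsAsFactors = FALSE)

# Record the original row order, since merge() may reorder rows
dfMain$row_id <- seq_len(nrow(dfMain))
dfMain$cd <- as.character(dfMain$cd)

# Left join: keep every row of dfMain, even if its key has no match
joined <- merge(dfMain, dfReplace2, by = "cd", all.x = TRUE, sort = FALSE)
joined <- joined[order(joined$row_id), ]

# Drop the key and the helper column, then reset row names
joined$cd <- NULL
joined$row_id <- NULL
rownames(joined) <- NULL
print(joined)
```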

Answer 1 posted 2018-09-10 19:53:28
