Replace contents of factor column in R dataframe

I need to replace the levels of a factor column in a dataframe. Using the iris dataset as an example, how would I replace any cells which contain virginica with setosa in the Species column?

I expected the following to work, but it generates a warning message and simply inserts NAs:

iris$Species[iris$Species == 'virginica'] <- 'setosa'

I bet the problem arises when you try to replace values with a new one that is not currently among the existing factor's levels:

levels(iris$Species)
# [1] "setosa"     "versicolor" "virginica" 
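To make the reasoning concrete: a factor stores integer codes that index into its levels, so a value that is not among the levels has no code to be stored as. A minimal sketch on a toy factor (the names f and g are only for illustration):

```r
# A toy factor with two levels.
f <- factor(c("a", "b", "a"))

levels(f)      # the only values the factor can hold
as.integer(f)  # the integer codes actually stored

# "c" is not a level, so it cannot be stored yet.
"c" %in% levels(f)

# Assigning it anyway yields NA (the warning is suppressed here on purpose).
g <- f
suppressWarnings(g[1] <- "c")
is.na(g[1])
```

Checking with %in% before assigning is a cheap way to catch this case in your own data.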

Your example was bad, this works (setosa is already one of the levels):

iris$Species[iris$Species == 'virginica'] <- 'setosa'

This is what more likely creates the problem you were seeing with your own data:

iris$Species[iris$Species == 'virginica'] <- 'new.species'
# Warning message:
# In `[<-.factor`(`*tmp*`, iris$Species == "virginica", value = c(1L,  :
#   invalid factor level, NAs generated

It will work if you first extend your factor levels:

levels(iris$Species) <- c(levels(iris$Species), "new.species")
iris$Species[iris$Species == 'virginica'] <- 'new.species'
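One thing to be aware of with this approach: the old level stays attached to the factor as an empty level. A short sketch, working on a copy named df (an illustrative name) so the built-in iris stays untouched, that uses droplevels() to clean up afterwards:

```r
# Work on a copy so the built-in iris data set is left untouched.
df <- iris

# Add the new level first, then assign it.
levels(df$Species) <- c(levels(df$Species), "new.species")
df$Species[df$Species == "virginica"] <- "new.species"

# "virginica" is now an unused level; table() still lists it with a zero count.
table(df$Species)

# droplevels() removes levels that no longer occur in the data.
df$Species <- droplevels(df$Species)
levels(df$Species)
```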

If you want to rename "species A" to "species B", you'd be better off with:

levels(iris$Species)[match("oldspecies",levels(iris$Species))] <- "newspecies"
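Applied to the original question, the same pattern also covers the case where the new name is an existing level: assigning a duplicate name through levels<- merges the two levels. A minimal sketch on a copy of iris (df is an illustrative name):

```r
# Work on a copy so the built-in iris data set is left untouched.
df <- iris

# Rename "virginica" to "setosa"; because "setosa" already exists,
# the two levels are merged into one.
levels(df$Species)[match("virginica", levels(df$Species))] <- "setosa"

nlevels(df$Species)
table(df$Species)
```

Because only the levels attribute is changed, no row-by-row assignment is needed and no NAs can be introduced.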

For what you are suggesting, you can just change the levels using levels():

levels(iris$Species)[3] <- 'new'

You can use the function revalue from the package plyr to replace values in a factor vector.

In your example, to replace the factor level virginica by setosa:

 data(iris)
 library(plyr)
 revalue(iris$Species, c("virginica" = "setosa")) -> iris$Species
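If you would rather not load plyr, base R offers a similar mapping style: levels<- accepts a named list in which each name is a new level and each element lists the old levels to fold into it. A sketch on a copy of iris (df is an illustrative name), with every old level listed explicitly:

```r
# Work on a copy so the built-in iris data set is left untouched.
df <- iris

# Each list name is a new level; its value names the old levels it absorbs.
levels(df$Species) <- list(
  setosa     = c("setosa", "virginica"),
  versicolor = "versicolor"
)

table(df$Species)
```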

I had the same problem. This worked better:

Identify which level you want to modify: levels(iris$Species)

    "setosa" "versicolor" "virginica" 

So, setosa is the first.

Then, write this:

     levels(iris$Species)[1] <-"new name"

Using dplyr::mutate and forcats::fct_recode:

library(dplyr)
library(forcats)

iris <- iris %>%  
  mutate(Species = fct_recode(Species,
    "Virginica" = "virginica",
    "Versicolor" = "versicolor"
  )) 

iris %>% 
  count(Species)

# A tibble: 3 x 2
     Species     n
      <fctr> <int>
1     setosa    50
2 Versicolor    50
3  Virginica    50   

A more general solution that works on the whole data frame at once, and where you don't have to add new factor levels, is:

data.mtx <- as.matrix(data.df)
data.mtx[which(data.mtx == "old.value.to.replace")] <- "new.value"
data.df <- as.data.frame(data.mtx)

A nice feature of this code is that the right-hand side can be a vector with one value per matched cell, not only a single "new.value", and the new values can even be random. Thus you can create a completely new random data frame with the same size as the original. Note that as.matrix() turns a data frame with any non-numeric column into a character matrix, so numeric columns do not come back as numeric and have to be converted again.
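Since the round trip through as.matrix() converts every column to character when the data frame has non-numeric columns, one alternative is to touch only the factor columns. A minimal sketch; replace_in_factors is a hypothetical helper written for this example, not a function from any package:

```r
# Hypothetical helper: rename the level `old` to `new` in every factor
# column of `df`, leaving all other columns exactly as they are.
replace_in_factors <- function(df, old, new) {
  is_fac <- vapply(df, is.factor, logical(1))
  if (any(is_fac)) {
    df[is_fac] <- lapply(df[is_fac], function(f) {
      # Renaming via the levels avoids the "invalid factor level" warning.
      levels(f)[levels(f) == old] <- new
      f
    })
  }
  df
}

out <- replace_in_factors(iris, "virginica", "new.value")
levels(out$Species)
class(out$Sepal.Length)
```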

You want to replace the values in a dataset column, but you're getting a warning like this:

invalid factor level, NA generated

Try this instead:

levels(dataframe$column)[levels(dataframe$column)=='old_value'] <- 'new_value'

In case you have to replace multiple values, and if you don't mind "refactoring" your variable with as.factor(as.character(...)), you could try the following:

replace.values <- function(search, replace, x){
  stopifnot(length(search) == length(replace))
  xnew <- replace[ match(x, search) ]
  takeOld <- is.na(xnew) & !is.na(x)
  xnew[takeOld] <- x[takeOld]
  return(xnew)
}

# Call replace.values() on the character version, then convert back to a factor.
iris$Species <- as.factor(replace.values(search=c("oldValue1","oldValue2"),
                                         replace=c("newValue1","newValue2"),
                                         x=as.character(iris$Species)))
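If you prefer to keep the variable a factor throughout, several levels can also be renamed in one step by editing the levels vector. A sketch under the assumption that old and new are parallel vectors (both names, and the capitalised replacement values, are made up for this example); old values that are not found are simply skipped:

```r
# Work on a copy so the built-in iris data set is left untouched.
df <- iris

old <- c("setosa", "virginica", "not.present")
new <- c("Setosa", "Virginica", "Unused")
stopifnot(length(old) == length(new))

lv    <- levels(df$Species)
hit   <- match(old, lv)   # position of each old value among the levels
found <- !is.na(hit)      # ignore old values that are not levels

lv[hit[found]] <- new[found]
levels(df$Species) <- lv

levels(df$Species)
```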
