
Replace blank cell with "no" in R

I would like to replace blank cells (" ") in a column with "no". The missing entries are meaningful to me (no score has been determined yet), and I would like to use the factor variable in a regression tree later.

I found a similar question here (Replace blank cells with character) and tried the following, but the blank cells are then converted to NA instead of the text:

> Test$SCORE[Test$SCORE==" "]<- "no"

Warning message:
In `[<-.factor`(`*tmp*`, Test$SCORE == " ", value = c(NA, NA, 8L,  :
  invalid factor level, NA generated

Is there a way to avoid NA and use my own text?

Please see the example data "Test":

ID  Score  
 1. A
 2. " "
 3. B
 4. " "
 5. C

This is the result I would like to achieve:

ID  Score
1   A
2   "no"
3   B 
4   "no"
5   C

The dataset is very large, so a manual solution that indexes specific rows would be very time-consuming. I appreciate your help because I am quite new to R.

Thank you very much in advance.

Additional info:

str(Test$SCORE) Factor w/ 13 levels " ","A","B","C",..

Please excuse the format of the example table; this is my first question.

Work on the factor levels:

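# stringsAsFactors = TRUE is needed in R >= 4.0.0, where read.table()
# no longer converts strings to factors by default
DF <- read.table(text = 'ID  Score  
                 1. A
                 2. " "
                 3. B
                 4. " "
                 5. C', header = TRUE, stringsAsFactors = TRUE)
# Rename the blank level instead of assigning to individual elements
levels(DF$Score)[levels(DF$Score) == " "] <- "no"
DF
#  ID Score
#1  1     A
#2  2    no
#3  3     B
#4  4    no
#5  5     C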

This is very efficient because there are usually far fewer factor levels than elements in the vector.
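To see why renaming a level avoids the "invalid factor level" warning, here is a minimal sketch. The vector `x` is made up for illustration and only mimics the asker's column. Renaming changes the levels attribute, while the integer codes stored for each element stay as they were, so no element is ever assigned a value that is not a level.

```r
# Hypothetical factor standing in for Test$SCORE; not the asker's real data
x <- factor(c("A", " ", "B", " ", "C"))

# Remember the integer codes that back the factor
codes_before <- as.integer(x)

# Rename the blank level; only the levels attribute is modified
levels(x)[levels(x) == " "] <- "no"

levels(x)        # the blank level is now called "no"
as.character(x)  # elements that were blank now read "no"

# The underlying codes are unchanged, and no NA was introduced
identical(codes_before, as.integer(x))
anyNA(x)
```

The same one-line rename should apply to the real column, assuming `Test$SCORE` is a factor with a `" "` level as the `str()` output in the question suggests.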

> df <- data.frame(Test=1:5,Score=c("A"," ","B"," "," "))
> df
   Test Score
 1    1     A
 2    2      
 3    3     B
 4    4      
 5    5      

> df[,2] <- as.character(df$Score)
> is.character(df[,2])
[1] TRUE

> df$Score[df$Score==" "] <- "No"
> df
  Test Score
1    1     A
2    2    No
3    3     B
4    4    No
5    5    No
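Since the question mentions using the column as a factor in a regression tree later, the character column can be turned back into a factor after the replacement. This is a sketch that rebuilds the example `df` from above so that it runs on its own; the final `factor()` call is an addition, not part of the original answer.

```r
# Rebuild the example data; stringsAsFactors = FALSE makes the
# character type explicit regardless of R version
df <- data.frame(Test = 1:5, Score = c("A", " ", "B", " ", " "),
                 stringsAsFactors = FALSE)

# Replace blanks while the column is still character
df$Score[df$Score == " "] <- "No"

# Convert back to a factor for modelling
df$Score <- factor(df$Score)
str(df$Score)
```

Converting to character and back rebuilds the levels from the values that are actually present, so the blank level disappears and "No" becomes a regular level.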
