
Replace NA in Factor type data in R

Data frame X looks like this:

State      code
New Jersey  1
New York    2
Califronia  NA

All columns are factors. I want to replace the NA values with text or 0 so that I can transpose them later.

When I try to run this command:

X[is.na(X)] <- "0"

I get the following warnings:

Warning messages:
1: In `[<-.factor`(`*tmp*`, thisvar, value = "0") :
  invalid factor level, NA generated
2: In `[<-.factor`(`*tmp*`, thisvar, value = "0") :
  invalid factor level, NA generated
3: In `[<-.factor`(`*tmp*`, thisvar, value = "0") :
  invalid factor level, NA generated
4: In `[<-.factor`(`*tmp*`, thisvar, value = "0") :
  invalid factor level, NA generated

The NA values are unchanged.
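The warning arises because a factor only accepts values that are already among its levels; "0" is not a level of code, so the assignment produces NA again. A minimal sketch reproducing this on a small vector (the names f and g are hypothetical, not from the original post):

```r
# Hypothetical factor mirroring the `code` column
f <- factor(c("1", "2", NA))

levels(f)               # "1" "2"
"0" %in% levels(f)      # FALSE: "0" is not a valid value for this factor

# Assigning a non-level stores NA; suppressWarnings() only hides the
# "invalid factor level, NA generated" warning for this demonstration
g <- f
suppressWarnings(g[is.na(g)] <- "0")
anyNA(g)                # TRUE: the NA is still there
```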

Another option, using the built-in factor():

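# stringsAsFactors = TRUE makes the columns factors (the default is FALSE since R 4.0.0)
df <- data.frame(a=letters[1:3], b=c("d", "e", NA), stringsAsFactors = TRUE)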
df
  a    b
1 a    d
2 b    e
3 c <NA>

Now recode the factor with factor():

df$b <- factor(df$b, exclude = NULL, 
               levels = c("d", "e", NA), 
               labels = c("d", "e", "f"))
df
  a b
1 a d
2 b e
3 c f

For many factor columns, the following may be useful:

df[] <- lapply(df, function(x){
  # check if you have a factor first:
  if(!is.factor(x)) return(x)
  # otherwise include NAs into factor levels and change factor levels:
  x <- factor(x, exclude=NULL)
  levels(x)[is.na(levels(x))] <- "0"
  return(x)
  })
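As a usage sketch, the same logic can be wrapped in a function and applied to a small data frame. The names dat and na_to_level and the column values are hypothetical, chosen only for illustration:

```r
# Hypothetical data frame; stringsAsFactors = TRUE forces factor columns
dat <- data.frame(State = c("New Jersey", "New York", "California"),
                  code  = c("1", "2", NA),
                  n     = c(10, NA, 30),
                  stringsAsFactors = TRUE)

na_to_level <- function(x, value = "0") {
  if (!is.factor(x)) return(x)          # leave non-factor columns alone
  x <- factor(x, exclude = NULL)        # keep NA as an explicit level
  levels(x)[is.na(levels(x))] <- value  # rename that level
  x
}

dat[] <- lapply(dat, na_to_level)       # dat[] keeps the data.frame structure
str(dat)
```

The numeric column n keeps its NA, because only factor columns are touched.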

Simply:

# Convert via character: as.numeric() on a factor returns the internal level codes, not the labels
X$code <- as.character(X$code)
X$code[is.na(X$code)] <- "0"
X$code <- as.factor(as.numeric(X$code))

In a loop over the columns (starting at column 2, which skips State), it looks like this:

for (i in 2:ncol(X)) {
  X[, i] <- as.character(X[, i])
  X[is.na(X[, i]), i] <- "0"
  X[, i] <- as.factor(as.numeric(X[, i]))
}

For character values, like this:

for (i in 2:ncol(X)) {
  X[, i] <- as.character(X[, i])
  X[is.na(X[, i]), i] <- "Not Assigned"
  X[, i] <- as.factor(X[, i])
}

Or, if you don't want to convert to character first, add a new level to each column:

for (i in 2:ncol(X)) {
  levels(X[, i]) <- c(levels(X[, i]), "Not Assigned")
  X[is.na(X[, i]), i] <- "Not Assigned"
}

If you don't mind converting back and forth, the code you wrote will work on a matrix.

> X
       State code code2
1  NewJersey    1    NA
2    NewYork    2     0
3 Califronia   NA     4

> X<-as.matrix(X)
> X[is.na(X)] <- "0"
> X<-as.data.frame(X, stringsAsFactors = TRUE)
> X
       State code code2
1  NewJersey    1     0
2    NewYork    2     0
3 Califronia    0     4

> str(X)
'data.frame':   3 obs. of  3 variables:
 $ State: Factor w/ 3 levels "Califronia","NewJersey",..: 2 3 1
 $ code : Factor w/ 3 levels " 1"," 2","0": 1 2 3
 $ code2: Factor w/ 3 levels " 0"," 4","0": 3 1 2
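Note that in the str() output above, code2 ends up with both " 0" and "0" as separate levels, apparently because the matrix conversion pads numbers with spaces. A possible sketch of the same round trip with the padding stripped, assuming the code columns were numeric to begin with (the data frame is reconstructed here for illustration, and M and X2 are hypothetical names):

```r
# Assumed reconstruction of X with numeric code columns
X <- data.frame(State = c("NewJersey", "NewYork", "Califronia"),
                code  = c(1, 2, NA),
                code2 = c(NA, 0, 4))

M <- as.matrix(X)        # everything becomes character
M[is.na(M)] <- "0"
M[] <- trimws(M)         # strip padding; M[] keeps the matrix shape

X2 <- as.data.frame(M, stringsAsFactors = TRUE)
levels(X2$code2)
```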

Let's create a random df with factor levels:

df <- data.frame(a=sample(0:10, size=10, replace=TRUE),
                 b=sample(20:30, size=10, replace=TRUE))
df[df$a==0,'a'] <- NA
df$a <- as.factor(df$a)

Another approach is:

#check levels
levels(df$a)
#[1] "3"  "4"  "7"  "9"  "10"

#add new factor level. i.e 88 in our example
df$a = factor(df$a, levels=c(levels(df$a), 88))

#convert all NA's to 88
df$a[is.na(df$a)] = 88

#check levels again
levels(df$a)
#[1] "3"  "4"  "7"  "9"  "10" "88"
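The two steps (add the level, then assign) could be combined into a small helper. This is only a sketch; replace_na_level is a hypothetical name, not a base R function:

```r
# Hypothetical helper: replace NA in a factor with a (possibly new) level
replace_na_level <- function(f, value) {
  stopifnot(is.factor(f))
  value <- as.character(value)
  if (!value %in% levels(f)) {
    levels(f) <- c(levels(f), value)   # register the new level first
  }
  f[is.na(f)] <- value
  f
}

a <- factor(c(3, NA, 7, 10, NA))
a <- replace_na_level(a, 88)
levels(a)
#[1] "3"  "7"  "10" "88"
```

Checking for the level before appending it avoids creating a duplicate when the replacement value is already present.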

