Create dummy for missing values in numeric variable in R

I have the following data:

  PassengerId Survived Pclass    Sex Age SibSp Parch    Fare Embarked
1           1        0      3   male  22     1     0  7.2500        S
2           2        1      1 female  38     1     0 71.2833        C
3           3        1      3 female  26     0     0  7.9250        S
4           4        1      1 female  35     1     0 53.1000        S
5           5        0      3   male  35     0     0  8.0500        S
6           6        0      3   male  NA     0     0  8.4583        Q

Now, when I use dummy.data.frame from the dummies package, I can successfully convert the factors (here Sex and Embarked) into dummy variables, as shown below:

  PassengerId Survived Pclass Sexfemale Sexmale Age SibSp Parch    Fare Embarked EmbarkedC EmbarkedQ EmbarkedS
1           1        0      3         0       1  22     1     0  7.2500        0         0         0         1
2           2        1      1         1       0  38     1     0 71.2833        0         1         0         0
3           3        1      3         1       0  26     0     0  7.9250        0         0         0         1
4           4        1      1         1       0  35     1     0 53.1000        0         0         0         1
5           5        0      3         0       1  35     0     0  8.0500        0         0         0         1
6           6        0      3         0       1  NA     0     0  8.4583        0         0         1         0

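However, if I apply the same approach to the Age column, it creates 100+ dummy variables: one for each unique age value, plus one for NA. I would like the output to look like this: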

Age   Age.NA
22    0 
38    0
......
35    0
0     1

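For factors, it automatically treats missing values as a separate entry and creates a variable for them. For a numeric variable, I want to achieve the same effect without disturbing the values already in that column. Please help.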

You can use:

df$Age.NA <- ifelse(is.na(df$Age), 1, 0)

Then:

library(dummies)
dummy.data.frame(df)

Output:

  PassengerId Survived Pclass Sexfemale Sexmale Age SibSp Parch    Fare EmbarkedC EmbarkedQ EmbarkedS Age.NA
1           1        0      3         0       1  22     1     0  7.2500         0         0         1      0
2           2        1      1         1       0  38     1     0 71.2833         1         0         0      0
3           3        1      3         1       0  26     0     0  7.9250         0         0         1      0
4           4        1      1         1       0  35     1     0 53.1000         0         0         1      0
5           5        0      3         0       1  35     0     0  8.0500         0         0         1      0
6           6        0      3         0       1  NA     0     0  8.4583         0         1         0      1
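
Note that the question's desired output shows 0 in Age where the value was missing, while the output above keeps NA. A minimal sketch of one way to get that, using a small made-up data frame (`toy` is a hypothetical name, not from the original post). The indicator has to be created before the NAs are overwritten, otherwise the information is lost:

```r
# Illustrative data only; `toy` is a hypothetical stand-in for df
toy <- data.frame(Age = c(22, 38, NA, 35))

# Step 1: record which rows were missing (1 = missing, 0 = present)
toy$Age.NA <- as.integer(is.na(toy$Age))

# Step 2: replace the missing ages with 0, as in the desired output
toy$Age[is.na(toy$Age)] <- 0

print(toy)
```

Whether 0 is a sensible fill value depends on the model being fitted; it is used here only because the question's example does so.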

Data:

df <- structure(list(PassengerId = 1:6, Survived = c(0L, 1L, 1L, 1L, 
0L, 0L), Pclass = c(3L, 1L, 3L, 1L, 3L, 3L), Sex = structure(c(2L, 
1L, 1L, 1L, 2L, 2L), .Label = c("female", "male"), class = "factor"), 
    Age = c(22L, 38L, 26L, 35L, 35L, NA), SibSp = c(1L, 1L, 0L, 
    1L, 0L, 0L), Parch = c(0L, 0L, 0L, 0L, 0L, 0L), Fare = c(7.25, 
    71.2833, 7.925, 53.1, 8.05, 8.4583), Embarked = structure(c(3L, 
    1L, 3L, 3L, 3L, 2L), .Label = c("C", "Q", "S"), class = "factor"), 
    Age.NA = c(0, 0, 0, 0, 0, 1)), .Names = c("PassengerId", 
"Survived", "Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", 
"Embarked", "Age.NA"), row.names = c("1", "2", "3", "4", "5", 
"6"), class = "data.frame")

Use an ifelse() statement to check for NA (here Age is the Age column of your data frame, e.g. df$Age):

Age.NA <- ifelse(is.na(Age), 1, 0)
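
If several numeric columns contain missing values, the same check can be applied to each of them in a loop. The following is only a sketch: `add_na_indicators` and `toy` are hypothetical names introduced for illustration, and the `.NA` suffix simply follows the naming used in the question.

```r
# Hypothetical helper: add a 0/1 "<column>.NA" indicator for every
# numeric column that contains at least one missing value.
add_na_indicators <- function(data) {
  for (col in names(data)) {
    if (is.numeric(data[[col]]) && anyNA(data[[col]])) {
      data[[paste0(col, ".NA")]] <- as.integer(is.na(data[[col]]))
    }
  }
  data
}

# Illustrative data: only Age has a missing value
toy <- data.frame(Age  = c(22, NA, 35),
                  Fare = c(7.25, 8.05, 8.46),
                  Sex  = factor(c("male", "female", "male")))

result <- add_na_indicators(toy)
print(result)
```

Factor columns are left untouched, so they can still be passed to dummy.data.frame afterwards.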
