
Transform variables using dplyr in R

I have the Titanic dataset, and I want to make its variables suitable for SVM analysis.

> str(train)
'data.frame':   891 obs. of  12 variables:
 $ PassengerId: int  1 2 3 4 5 6 7 8 9 10 ...
 $ Survived   : int  0 1 1 1 0 0 0 0 1 1 ...
 $ Pclass     : int  3 1 3 1 3 3 1 3 3 2 ...
 $ Name       : chr  "Braund, Mr. Owen Harris" "Cumings, Mrs. John Bradley (Florence Briggs Thayer)" "Heikkinen, Miss. Laina" "Futrelle, Mrs. Jacques Heath (Lily May Peel)" ...
 $ Sex        : chr  "male" "female" "female" "female" ...
 $ Age        : num  22 38 26 35 35 NA 54 2 27 14 ...
 $ SibSp      : int  1 1 0 1 0 0 0 3 0 1 ...
 $ Parch      : int  0 0 0 0 0 0 0 1 2 0 ...
 $ Ticket     : chr  "A/5 21171" "PC 17599" "STON/O2. 3101282" "113803" ...
 $ Fare       : num  7.25 71.28 7.92 53.1 8.05 ...
 $ Cabin      : chr  "" "C85" "" "C123" ...
 $ Embarked   : chr  "S" "C" "S" "S" ...

I want to remove some of the variables and convert chr variables such as Sex and Embarked to factors.

This is what I have so far.

train <- train %>%
  dplyr::select(-1,-4,-9,-11) %>%
  mutate(Sex=recode(Sex, "male"=1, "female"=0)) %>%
  mutate(Embarked=recode(Embarked, "C"=1, "S"=0)) %>%
  na.omit() 

Do you mean something like this, converting the character columns to factors and then recoding them?

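library(dplyr)
library(titanic)  # provides the titanic_train dataset
View(titanic_train)  # optional: inspect the data interactively

train <- titanic_train %>%
  mutate_if(is.character, as.factor) %>%  # convert all chr columns to factors
  dplyr::select(-1, -4, -9, -11) %>%  # drop PassengerId, Name, Ticket, Cabin
  mutate(Sex = recode(Sex, "male" = "1", "female" = "0")) %>%  # recode factor levels
  # caution: Embarked has 4 levels ("", "C", "Q", "S"); "" and "Q" are left unchanged here
  mutate(Embarked = recode(Embarked, "C" = "1", "S" = "0")) %>%
  na.omit()  # drop rows with missing values (e.g. NA in Age)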
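
If the goal is only to have Sex and Embarked as factors, recoding the levels to "1"/"0" may not be needed at all. Below is a minimal sketch of that variant, assuming dplyr 1.0 or later (for across()). The data frame toy and its values are made up purely to mimic the column layout from str(train), and the name train_svm is hypothetical. Treating the empty string in Embarked as missing and turning Survived into a factor are assumptions about what the SVM step needs, not something stated in the question.

```r
library(dplyr)

# Toy rows that only mimic the column layout shown in str(train); the values are made up.
toy <- data.frame(
  PassengerId = 1:6,
  Survived = c(0L, 1L, 1L, 0L, 1L, 0L),
  Pclass = c(3L, 1L, 3L, 2L, 1L, 3L),
  Name = paste("Passenger", 1:6),
  Sex = c("male", "female", "female", "male", "female", "male"),
  Age = c(22, 38, 26, NA, 54, 2),
  SibSp = c(1L, 1L, 0L, 0L, 0L, 3L),
  Parch = c(0L, 0L, 0L, 0L, 0L, 1L),
  Ticket = paste0("T", 1:6),
  Fare = c(7.25, 71.28, 7.92, 8.05, 51.86, 21.08),
  Cabin = c("", "C85", "", "", "E46", ""),
  Embarked = c("S", "C", "Q", "S", "", "S"),
  stringsAsFactors = FALSE
)

train_svm <- toy %>%
  # drop columns by name rather than by position
  dplyr::select(-PassengerId, -Name, -Ticket, -Cabin) %>%
  # assumption: an empty Embarked value means "missing"
  mutate(Embarked = na_if(Embarked, "")) %>%
  # convert the categorical columns to factors, keeping their original labels
  mutate(across(c(Survived, Sex, Embarked), as.factor)) %>%
  # drop rows with any missing value (Age or Embarked)
  na.omit()

str(train_svm)
```

Selecting by name keeps the pipeline working if the column order ever changes, and keeping the original labels avoids having to remember which level "1" stood for. With the real data, toy would be replaced by titanic_train or train.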
