How to convert ordered factor variables to numeric

I have a dataset that contains 79 explanatory variables, 43 of which are factors.

Some of the factor variables are just generic labels. For those I intend to use dummy variables as the numeric representation.

Some other subset of the factor variables contain ordered levels, for example:

BsmtQual: Evaluates the height of the basement

       Ex   Excellent (100+ inches) 
       Gd   Good (90-99 inches)
       TA   Typical (80-89 inches)
       Fa   Fair (70-79 inches)
       Po   Poor (<70 inches)
       NA   No Basement

I want to convert each such factor variable to a numeric value that preserves the order of the levels from lowest to highest. After the operation I want to get something like:

BsmtQual: Evaluates the height of the basement

       Ex records will be replaced with: 6  
       Gd records will be replaced with: 5
       TA records will be replaced with: 4
       Fa records will be replaced with: 3
       Po records will be replaced with: 2
       NA records will be replaced with: 1

(Not sure if I can replace NA with 0, since NA doesn't actually refer to missing data for this variable; it marks a record with the lowest basement score, i.e. no basement.)

How to code this replacement?

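library(plyr)  # revalue() only relabels the levels, so convert via as.character() and as.numeric()
req_var$ExterQual <- as.numeric(as.character(revalue(req_var$ExterQual, c("Ex"="5", "Gd"="4", "TA"="3", "Fa"="2", "Po"="1"))))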

I do not handle NA here. If you want NA to map to 0, add an entry for "NA" with the value 0 to the command above. This works only if NA is stored as a literal "NA" level rather than as a missing value.
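An alternative that needs no extra package is to rebuild the factor with its levels listed from lowest to highest and take the integer codes. The following is only a sketch: the data frame `df`, the vector `quality_levels`, and the new column `BsmtQual_num` are made-up names for illustration, and it assumes NA was read in as a real missing value that should count as the lowest score.

```r
# Hypothetical toy data standing in for the real dataset; NA is a true missing value here
df <- data.frame(
  BsmtQual = c("Ex", "TA", NA, "Po", "Gd", "Fa"),
  stringsAsFactors = TRUE
)

# Levels listed from lowest to highest, so the integer codes follow the quality order
quality_levels <- c("Po", "Fa", "TA", "Gd", "Ex")

# Rebuild the factor with an explicit level order, then take the integer codes (Po = 1 ... Ex = 5)
codes <- as.integer(factor(as.character(df$BsmtQual), levels = quality_levels))

# Assumption: NA means "No Basement" and should rank below Po, so it becomes 0
codes[is.na(codes)] <- 0L

df$BsmtQual_num <- codes
```

Adding 1 to `BsmtQual_num` gives the 1 to 6 scale described in the question. Note that any value not listed in `quality_levels` also becomes NA and therefore 0, so it is worth checking `table(df$BsmtQual, useNA = "ifany")` first.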
