Converting factor / nominal variables into numeric in R

My question seems to be related to this thread.

However, the method given there does not work for me.

I define a vector from a dataset as: eduyears1994 <- year1994$q131ed and receive a vector that looks like:

[1] 17 lat/9   1O lat/3,4 1O lat/3,4 17 lat/9   17 lat/9   12 lat/5,6
                                        1O lat/3,4 1O lat/3,4 12 lat/5,6
   9 Levels: Brak formal wykształcenia 4 lata/1 8 lat/2 1O lat/3,4 12 lat/5,6 
     14 lat/7,8 ... BRAK DANYCH

where, e.g., "10 lat" stands for 10 years (of education) and "/3,4" is most likely the factor label.

I would simply like to have a numeric variable where the column holds, e.g., "10" instead of "10 years".

I have tried the following and received this warning:

eduyears1994n <- as.numeric(as.character(eduyears1994))
Warning message:
NAs introduced by coercion

I also tried to do it manually:

eduyears1994[eduyears1994== "4 lata/1"] <- 4
eduyears1994[eduyears1994== "2"] <- 8
eduyears1994[eduyears1994== "17 lat"] <- 17

but then I get this warning:

In `[<-.factor`(`*tmp*`, eduyears1994 == "9", value = 17) :
invalid factor level, NA generated

When I open the file in SPSS I see numbers, not labels, but the variable is somehow declared as nominal there, which might be the cause of the problem.

dput(eduyears1994)
c("17 lat/9", "1O lat/3,4", "1O lat/3,4", "17 lat/9", "17 lat/9", 
"12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "12 lat/5,6", "14 lat/7,8", 
"1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "17 lat/9", "12 lat/5,6", 
"12 lat/5,6", "17 lat/9", "1O lat/3,4", "12 lat/5,6", "12 lat/5,6", 
"12 lat/5,6", "12 lat/5,6", "12 lat/5,6", "17 lat/9", "1O lat/3,4", 
"1O lat/3,4", "14 lat/7,8", "17 lat/9", "1O lat/3,4", "1O lat/3,4", 
"12 lat/5,6", "12 lat/5,6", "17 lat/9", "17 lat/9", "17 lat/9", 
"17 lat/9", "12 lat/5,6", "12 lat/5,6", "14 lat/7,8", "12 lat/5,6", 
"8 lat/2", "1O lat/3,4", "12 lat/5,6", "8 lat/2", "17 lat/9", 
"1O lat/3,4", "12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4", 
"17 lat/9", "8 lat/2", "8 lat/2", "1O lat/3,4", "1O lat/3,4", 
"12 lat/5,6", "12 lat/5,6", "17 lat/9", "1O lat/3,4", "14 lat/7,8", 
"1O lat/3,4", "14 lat/7,8", "1O lat/3,4", "1O lat/3,4", "17 lat/9", 
"12 lat/5,6", "1O lat/3,4", "14 lat/7,8", "1O lat/3,4", "12 lat/5,6", 
"12 lat/5,6", "1O lat/3,4", "8 lat/2", "12 lat/5,6", "1O lat/3,4", 
"17 lat/9", "8 lat/2", "17 lat/9", "17 lat/9", "12 lat/5,6", 
"1O lat/3,4", "8 lat/2", "1O lat/3,4", "12 lat/5,6", "1O lat/3,4", 
"17 lat/9", "1O lat/3,4", "12 lat/5,6", "12 lat/5,6", "1O lat/3,4", 
"8 lat/2", "8 lat/2", "1O lat/3,4", "1O lat/3,4", "12 lat/5,6", 
"4 lata/1", "12 lat/5,6", "1O lat/3,4", "14 lat/7,8", "12 lat/5,6", 
"17 lat/9", "12 lat/5,6", "1O lat/3,4", "8 lat/2", "12 lat/5,6", 
"17 lat/9", "17 lat/9", "17 lat/9", "1O lat/3,4", "17 lat/9", 
"17 lat/9", "12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4", 
"1O lat/3,4", "14 lat/7,8", "1O lat/3,4", "12 lat/5,6", "12 lat/5,6", 
"12 lat/5,6", "8 lat/2", "17 lat/9", "1O lat/3,4", "1O lat/3,4", 
"1O lat/3,4", "12 lat/5,6", "12 lat/5,6", "17 lat/9", "1O lat/3,4", 
"8 lat/2", "14 lat/7,8", "1O lat/3,4", "8 lat/2", "1O lat/3,4", 
"12 lat/5,6", "12 lat/5,6", "8 lat/2", "17 lat/9", "12 lat/5,6", 
"12 lat/5,6", "12 lat/5,6", "1O lat/3,4", "12 lat/5,6", "12 lat/5,6", 
"12 lat/5,6", "12 lat/5,6", "17 lat/9", "1O lat/3,4", "1O lat/3,4", 
"1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "12 lat/5,6", "12 lat/5,6", 
"1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "14 lat/7,8", 
"8 lat/2", "8 lat/2", "1O lat/3,4", "1O lat/3,4", "8 lat/2", 
"4 lata/1", "8 lat/2", "1O lat/3,4", "12 lat/5,6", "12 lat/5,6", 
"1O lat/3,4", "8 lat/2", "8 lat/2", "14 lat/7,8", "12 lat/5,6", 
"8 lat/2", "8 lat/2", "14 lat/7,8", "8 lat/2", "14 lat/7,8", 
"17 lat/9", "12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4", 
"1O lat/3,4", "12 lat/5,6", "12 lat/5,6", "1O lat/3,4", "17 lat/9", 
"8 lat/2", "14 lat/7,8", "1O lat/3,4", "17 lat/9", "1O lat/3,4", 
"8 lat/2", "12 lat/5,6", "12 lat/5,6", "1O lat/3,4", "1O lat/3,4", 
"4 lata/1", "12 lat/5,6", "12 lat/5,6", "12 lat/5,6", "17 lat/9", 
"17 lat/9", "17 lat/9", "8 lat/2", "12 lat/5,6", "1O lat/3,4", 
"1O lat/3,4", "8 lat/2", "8 lat/2", "12 lat/5,6", "1O lat/3,4", 
"12 lat/5,6", "17 lat/9", "12 lat/5,6", "12 lat/5,6", "12 lat/5,6", 
"1O lat/3,4", "17 lat/9", "17 lat/9", "8 lat/2", "12 lat/5,6", 
"12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "12 lat/5,6", 
"1O lat/3,4", "1O lat/3,4", "12 lat/5,6", "1O lat/3,4", "17 lat/9", 
"12 lat/5,6", "1O lat/3,4", "8 lat/2", "1O lat/3,4", "1O lat/3,4", 
"17 lat/9", "12 lat/5,6", "1O lat/3,4", "12 lat/5,6", "1O lat/3,4", 
"1O lat/3,4", "1O lat/3,4", "8 lat/2", "8 lat/2", "17 lat/9", 
"1O lat/3,4", "1O lat/3,4", "14 lat/7,8", "1O lat/3,4", "1O lat/3,4", 
"1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "8 lat/2", 
"1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "14 lat/7,8", "12 lat/5,6", 
"12 lat/5,6", "14 lat/7,8", "1O lat/3,4", "17 lat/9", "17 lat/9", 
"12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "12 lat/5,6", "17 lat/9", 
"12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "17 lat/9", "17 lat/9", 
"1O lat/3,4", "12 lat/5,6", "12 lat/5,6", "1O lat/3,4", "17 lat/9", 
"1O lat/3,4", "12 lat/5,6", "12 lat/5,6", "8 lat/2", "12 lat/5,6", 
"12 lat/5,6", "14 lat/7,8", "8 lat/2", "14 lat/7,8", "1O lat/3,4", 
"1O lat/3,4", "8 lat/2", "1O lat/3,4", "8 lat/2", "12 lat/5,6", 
"1O lat/3,4", "12 lat/5,6", "1O lat/3,4", "14 lat/7,8", "12 lat/5,6", 
"1O lat/3,4", "1O lat/3,4", "12 lat/5,6", "8 lat/2", "12 lat/5,6", 
"12 lat/5,6", "14 lat/7,8", "12 lat/5,6", "14 lat/7,8", "17 lat/9", 
"17 lat/9", "1O lat/3,4", "12 lat/5,6", "12 lat/5,6", "14 lat/7,8", 
"1O lat/3,4", "8 lat/2", "1O lat/3,4", "12 lat/5,6", "14 lat/7,8", 
"12 lat/5,6", "1O lat/3,4", "8 lat/2", "12 lat/5,6", "1O lat/3,4", 
"8 lat/2", "8 lat/2", "1O lat/3,4", "12 lat/5,6", "1O lat/3,4", 
"14 lat/7,8", "12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "8 lat/2", 
"17 lat/9", "17 lat/9", "8 lat/2", "14 lat/7,8", "1O lat/3,4", 
"8 lat/2", "17 lat/9", "17 lat/9", "17 lat/9", "12 lat/5,6", 
"17 lat/9", "12 lat/5,6", "12 lat/5,6", "17 lat/9", "1O lat/3,4", 
"12 lat/5,6", "12 lat/5,6", "8 lat/2", "1O lat/3,4", "8 lat/2", 
"8 lat/2", "8 lat/2", "12 lat/5,6", "1O lat/3,4", "1O lat/3,4", 
"12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "8 lat/2", "17 lat/9", 
"1O lat/3,4", "1O lat/3,4", "8 lat/2", "1O lat/3,4", "8 lat/2", 
"17 lat/9", "17 lat/9", "14 lat/7,8", "17 lat/9", "1O lat/3,4", 
"17 lat/9", "17 lat/9", "8 lat/2", "1O lat/3,4", "17 lat/9", 
"1O lat/3,4", "12 lat/5,6", "8 lat/2", "12 lat/5,6", "12 lat/5,6", 
"12 lat/5,6", "17 lat/9", "17 lat/9", "1O lat/3,4", "1O lat/3,4", 
"1O lat/3,4", "12 lat/5,6", "8 lat/2", "12 lat/5,6", "8 lat/2", 
"8 lat/2", "14 lat/7,8", "8 lat/2", "17 lat/9", "12 lat/5,6", 
"1O lat/3,4", "14 lat/7,8", "17 lat/9", "1O lat/3,4", "17 lat/9", 
"17 lat/9", "1O lat/3,4", "12 lat/5,6", "1O lat/3,4", "8 lat/2", 
"17 lat/9", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "12 lat/5,6", 
"1O lat/3,4", "12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4", 
"1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "12 lat/5,6", "1O lat/3,4", 
"8 lat/2", "8 lat/2", "1O lat/3,4", "14 lat/7,8", "1O lat/3,4", 
"12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "17 lat/9", "1O lat/3,4", 
"1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "12 lat/5,6", "1O lat/3,4", 
"14 lat/7,8", "12 lat/5,6", "8 lat/2", "1O lat/3,4", "12 lat/5,6", 
"1O lat/3,4", "8 lat/2", "1O lat/3,4", "1O lat/3,4", "12 lat/5,6", 
"12 lat/5,6", "12 lat/5,6", "12 lat/5,6", "14 lat/7,8", "12 lat/5,6", 
"1O lat/3,4", "1O lat/3,4", "12 lat/5,6", "17 lat/9", "12 lat/5,6", 
"8 lat/2", "17 lat/9", "8 lat/2", "12 lat/5,6", "1O lat/3,4", 
"17 lat/9", "12 lat/5,6", "14 lat/7,8", "1O lat/3,4", "1O lat/3,4", 
"1O lat/3,4", "12 lat/5,6", "17 lat/9", "1O lat/3,4", "17 lat/9", 
"17 lat/9", "12 lat/5,6", "8 lat/2", "1O lat/3,4", "1O lat/3,4", 
"17 lat/9", "14 lat/7,8", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4", 
"1O lat/3,4", "1O lat/3,4", "12 lat/5,6", "14 lat/7,8", "8 lat/2", 
"12 lat/5,6", "12 lat/5,6", "8 lat/2", "8 lat/2", "1O lat/3,4", 
"12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4", 
"1O lat/3,4", "17 lat/9", "8 lat/2", "1O lat/3,4", "17 lat/9", 
"17 lat/9", "12 lat/5,6", "12 lat/5,6", "12 lat/5,6", "8 lat/2", 
"1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "12 lat/5,6", 
"8 lat/2", "12 lat/5,6", "14 lat/7,8", "1O lat/3,4", "1O lat/3,4", 
"12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "17 lat/9", "17 lat/9", 
"1O lat/3,4", "1O lat/3,4", "8 lat/2", "12 lat/5,6", "8 lat/2", 
"1O lat/3,4", "1O lat/3,4", "8 lat/2", "12 lat/5,6", "1O lat/3,4", 
"1O lat/3,4", "8 lat/2", "1O lat/3,4", "12 lat/5,6", "12 lat/5,6", 
"12 lat/5,6", "14 lat/7,8", "1O lat/3,4", "12 lat/5,6", "12 lat/5,6", 
"1O lat/3,4", "1O lat/3,4", "8 lat/2", "17 lat/9", "12 lat/5,6", 
"1O lat/3,4", "12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "17 lat/9", 
"12 lat/5,6", "12 lat/5,6", "8 lat/2", "1O lat/3,4", "8 lat/2", 
"12 lat/5,6", "8 lat/2", "17 lat/9", "8 lat/2", "12 lat/5,6", 
"1O lat/3,4", "17 lat/9", "1O lat/3,4", "17 lat/9", "12 lat/5,6", 
"14 lat/7,8", "17 lat/9", "17 lat/9", "12 lat/5,6", "1O lat/3,4", 
"8 lat/2", "8 lat/2", "8 lat/2", "4 lata/1", "12 lat/5,6", "17 lat/9", 
"12 lat/5,6", "17 lat/9", "14 lat/7,8", "14 lat/7,8", "1O lat/3,4", 
"12 lat/5,6", "1O lat/3,4", "12 lat/5,6", "1O lat/3,4", "1O lat/3,4", 
"8 lat/2", "12 lat/5,6", "12 lat/5,6", "12 lat/5,6", "12 lat/5,6", 
"12 lat/5,6", "8 lat/2", "12 lat/5,6", "1O lat/3,4", "8 lat/2", 
"8 lat/2", "1O lat/3,4", "8 lat/2", "1O lat/3,4", "14 lat/7,8", 
"12 lat/5,6", "1O lat/3,4", "12 lat/5,6", "12 lat/5,6", "1O lat/3,4", 
"12 lat/5,6", "17 lat/9", "17 lat/9", "12 lat/5,6", "1O lat/3,4", 
"17 lat/9", "1O lat/3,4", "12 lat/5,6", "12 lat/5,6", "12 lat/5,6", 
"1O lat/3,4", "1O lat/3,4", "8 lat/2", "1O lat/3,4", "17 lat/9", 
"1O lat/3,4", "12 lat/5,6", "1O lat/3,4", "12 lat/5,6", "1O lat/3,4", 
"1O lat/3,4", "1O lat/3,4", "12 lat/5,6", "8 lat/2", "1O lat/3,4", 
"1O lat/3,4", "12 lat/5,6", "1O lat/3,4", "14 lat/7,8", "12 lat/5,6", 
"14 lat/7,8", "12 lat/5,6", "12 lat/5,6", "1O lat/3,4", "1O lat/3,4", 
"1O lat/3,4", "1O lat/3,4", "17 lat/9", "17 lat/9", "1O lat/3,4", 
"8 lat/2", "1O lat/3,4", "1O lat/3,4", "8 lat/2", "8 lat/2", 
"12 lat/5,6", "12 lat/5,6", "14 lat/7,8", "14 lat/7,8", "1O lat/3,4", 
"17 lat/9", "17 lat/9", "12 lat/5,6", "12 lat/5,6", "1O lat/3,4", 
"1O lat/3,4", "8 lat/2", "1O lat/3,4", "12 lat/5,6", "8 lat/2", 
"1O lat/3,4", "12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4", 
"12 lat/5,6", "12 lat/5,6", "12 lat/5,6", "8 lat/2", "1O lat/3,4", 
"1O lat/3,4", "8 lat/2", "12 lat/5,6", "8 lat/2", "1O lat/3,4", 
"12 lat/5,6", "8 lat/2", "1O lat/3,4", "1O lat/3,4", "12 lat/5,6", 
"14 lat/7,8", "1O lat/3,4", "17 lat/9", "1O lat/3,4", "1O lat/3,4"
)

Using your actual data, it appears that you have a character vector of the general format

n lat/a,b

where n is the years, and "a,b" is some kind of label. This will extract the years.

vec <- c("17 lat/9", "10 lat/3,4", "10 lat/3,4", "17 lat/9", "17 lat/9",
         "12 lat/5,6", "10 lat/3,4", "10 lat/3,4", "12 lat/5,6")
# Split each label at the literal text " lat/" and keep the part before it
parts <- strsplit(vec, split = " lat/", fixed = TRUE)
sapply(parts, function(p) as.integer(p[1]))
# [1] 17 10 10 17 17 12 10 10 12
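One caveat when running this against the posted dput() output: the label printed as "1O lat/3,4" appears to be spelled with a capital letter O rather than a zero, and "4 lata/1" does not contain the text " lat/", so both would come back as NA. Below is a minimal sketch that works around this. It assumes the letter O is just a data-entry quirk that should be read as zero, and the helper name years_from_label is made up for illustration.

```r
# Hypothetical helper: pull the leading "years" token out of labels such as
# "17 lat/9", "4 lata/1" or "1O lat/3,4".
years_from_label <- function(labels) {
  labels <- as.character(labels)        # accepts factors and character vectors
  token  <- sub(" .*$", "", labels)     # keep everything before the first space
  token  <- chartr("O", "0", token)     # assumption: a capital O stands for zero
  # Labels without a purely numeric first token (e.g. "BRAK DANYCH") become NA
  ifelse(grepl("^[0-9]+$", token),
         suppressWarnings(as.integer(token)),
         NA_integer_)
}

labels <- c("17 lat/9", "1O lat/3,4", "4 lata/1", "8 lat/2", "BRAK DANYCH")
years_from_label(labels)
# [1] 17 10  4  8 NA
```

Because the helper calls as.character() first, it can be applied directly to the factor column, e.g. years_from_label(year1994$q131ed).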

You could try

c(17,8,4)[as.numeric(eduyears1994)]
#[1] 17  4 17  4 17 17  4  4 17 17 17 17  4  8  4  4  8  4  8  8

or

 unname(c('4 lata/1'=4, '2'=8, '17 lat' =17)[as.character(eduyears1994)])
 #[1] 17  4 17  4 17 17  4  4 17 17 17 17  4  8  4  4  8  4  8  8
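Both variants are lookups. The first indexes by the factor's integer codes, so it depends on the order of the levels (here presumably "17 lat", "2", "4 lata/1" after sorting). The second indexes by label and does not depend on that order. As a sketch, the label-based lookup could be written for the labels visible in the question like this; the names years_lookup and edu are illustrative, and only the six labels that actually appear in the posted output are mapped.

```r
# Lookup table keyed by the labels visible in the question. Labels that are
# not listed here (such as "BRAK DANYCH") simply come back as NA.
years_lookup <- c("4 lata/1"   = 4,
                  "8 lat/2"    = 8,
                  "1O lat/3,4" = 10,   # spelled with a capital O, as in the dput() output
                  "12 lat/5,6" = 12,
                  "14 lat/7,8" = 14,
                  "17 lat/9"   = 17)

edu <- factor(c("17 lat/9", "1O lat/3,4", "12 lat/5,6", "BRAK DANYCH"))
unname(years_lookup[as.character(edu)])
# [1] 17 10 12 NA
```

An explicit table is more typing than a regular expression, but it makes every recoding decision visible and leaves unexpected labels as NA instead of guessing.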

If 8 was in fact a typo, you could use

 library(stringi)
 as.numeric(unlist(stri_extract_all_regex(eduyears1994, '^\\d+')))
 #[1] 17  4 17  4 17 17  4  4 17 17 17 17  4  2  4  4  2  4  2  2
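If installing stringi is not an option, the same "leading digits" extraction can be sketched in base R with regexpr() and regmatches(). The variable names are illustrative; a factor would need as.character() first, since regmatches() expects a character vector.

```r
labels <- c("17 lat/9", "4 lata/1", "8 lat/2", "BRAK DANYCH")

m <- regexpr("^[0-9]+", labels)            # start of the leading digits, -1 if none
years <- rep(NA_real_, length(labels))     # default to NA for labels without a number
years[m > 0] <- as.numeric(regmatches(labels, m))
years
# [1] 17  4  8 NA
```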

data

set.seed(21)
eduyears1994 <- factor(sample(c('4 lata/1', 2, '17 lat'), 20, replace=TRUE))

Using @akrun's example:

set.seed(21)
eduyears1994 <- factor(sample(c('4 lata/1', 2, '17 lat'), 20, replace=TRUE))

Using gsub and an (apparently) appropriate regular expression (* denotes "0 or more of the preceding character or pattern", so e.g. "lata*" matches "lat" or "lata"):

as.numeric(gsub(" lata*[/0-9,]*","",eduyears1994))

Warning: this converts "2" into 2, not 8, which is not what you asked for. I'm not quite sure by what logic you convert "4 lata/1" to 4, "17 lat" to 17, and "2" to 8. Perhaps you could explain? Maybe that was a typo?
