
How can I make a factor in a data frame numeric permanently?

Let me begin by saying that I am very new to programming and R, so this might be a stupid question. But here it goes.

I am working with a large data frame containing metadata from a corpus. One column contains the proficiency level of each text (i.e. "B1", "B2", "C1", "C2"). I have been able to rename these factor levels to "1", "2", "3" and "4", but I need to make them numeric so that I can use this column as a dependent variable in linear modeling. I have tried some suggested methods, but they are not working and I don't know why.

I have tried the following code, but when I check the structure it is still a factor with four levels and is not numeric:

> as.numeric(as.character(df$proficiency))
  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2
 [42] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 ...
[452] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4

> str(proficiency)
 Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 1 1 ...

# I have also tried this, but it does not work either.
> df$proficiency<-as.numeric(as.character(df$proficiency))

> str(proficiency)
 Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 1 1 ...

Why is this happening? What am I doing wrong?

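The problem is that you are assigning the numeric vector to the data frame column df$proficiency, but then calling str() on a different object: a standalone variable named proficiency, which is still a factor. As @joran says in the comments, if you run str(df$proficiency) instead, you should see that the conversion worked. Note also that your first attempt never assigns its result, so it only prints the converted values and leaves df unchanged. The same conversion can also be achieved with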

df$proficiency<-as.numeric(levels(df$proficiency))[df$proficiency]

which is slightly faster for large data frames.
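To make the difference between the column and the standalone variable concrete, here is a minimal sketch. The six-row df is a made-up stand-in for your data, and the separate proficiency object is an assumption about how your workspace ended up with a variable of that name (for example, from an earlier extraction of the column):

```r
# Hypothetical stand-in for the real data: a factor column with levels "1".."4".
df <- data.frame(proficiency = factor(c("1", "1", "2", "3", "4", "4")))

# Assumed: a standalone copy of the column exists in the workspace.
proficiency <- df$proficiency

# Without an assignment, the converted values are only printed; df is untouched.
as.numeric(as.character(df$proficiency))
is.factor(df$proficiency)   # the column is still a factor at this point

# Assigning the result back changes the column inside df ...
df$proficiency <- as.numeric(levels(df$proficiency))[df$proficiency]

# ... but not the standalone copy, which is what str(proficiency) inspects.
str(df$proficiency)   # the data frame column, now numeric
str(proficiency)      # the separate object, still a factor
```

If a standalone proficiency object like this is lying around, it is probably safest to remove it with rm(proficiency) and always refer to the column as df$proficiency.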
