Let me begin by saying that I am very new to programming and R, so this might be a stupid question. But here it goes.
I am working with a large data frame containing metadata from a corpus. One column contains the proficiency level of each text (i.e. "B1", "B2", "C1", "C2"). I have been able to rename these factor levels to "1", "2", "3" and "4", but I need to make them numeric so that I can use this column as a dependent variable in linear modeling. I have tried some suggested methods, but they are not working and I don't know why.
I have tried the following code, but when I check the structure it is still a factor with four levels and is not numeric:
> as.numeric(as.character(df$proficiency))
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2
[42] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
...
[452] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
> str(proficiency)
Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 1 1 ...
#I have also tried this, but it does not work either.
> df$proficiency<-as.numeric(as.character(df$proficiency))
> str(proficiency)
Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 1 1 ...
Why is this happening? What am I doing wrong?
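The problem is that you are assigning the numeric result to the data frame column df$proficiency, but str(proficiency) inspects a separate variable named proficiency in your workspace, which is still a factor. (Your first attempt also only printed the converted values without assigning them back to the column.) As @joran says in the comments, if you check str(df$proficiency) instead, you should see that the conversion worked. The same conversion can also be achieved with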
df$proficiency<-as.numeric(levels(df$proficiency))[df$proficiency]
which is slightly more efficient than as.numeric(as.character(...)), something that may matter for large data frames.
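To make the difference between the column and the standalone variable concrete, here is a minimal sketch. The six-row data frame is a made-up stand-in for the real corpus metadata, and the standalone proficiency variable is an assumption about how the workspace in the question came to contain one (for example from an earlier assignment or attach()).

```r
# Hypothetical stand-in for the real data: a factor column with levels "1".."4"
df <- data.frame(proficiency = factor(c("1", "1", "2", "3", "4", "4")))

# Assumed: a separate workspace variable with the same name as the column
proficiency <- df$proficiency

# Convert the column via its level labels and assign the result back
df$proficiency <- as.numeric(levels(df$proficiency))[df$proficiency]

str(df$proficiency)  # inspects the column, which is now numeric
str(proficiency)     # inspects the standalone copy, which is still a factor
```

Checking str(df) after the assignment is another way to confirm the column's type. Going through the level labels matters in general: calling as.numeric() directly on a factor returns its internal integer codes, which only happen to match the labels when the levels are "1", "2", "3", "4" in that order.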