
Converting numeric values to factor levels with factor levels assigned on the basis of the numeric ordering

Consider the data frame:

a = c(0, 1, 3, 5, 6, 0, 1, 3, 6, 12)
b = c(letters[5:9], letters[2:6])
c = data.frame(var1 = a, var2 = b)

I want to convert all values in the data frame to consecutive integer factor levels starting from 1, and then use these as numeric values in further computations. (In reality I don't do this for the letters; I only added them to illustrate the problem.)

With some help (see "Converting numeric values of multiple columns to factor levels that are consecutive integers in (descending) order"), I did this with:

c[] = lapply(c, function(x) {levels(x) <- 1:length(unique(x)); x})

Unfortunately, this replaces the values with their respective factor levels only for the character column var2, not for the numeric column var1 (notice the 0 still present in var1):

> c
   var1 var2
1     0    4
2     1    5
3     3    6
4     5    7
...
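A likely explanation for this output, assuming an R version older than 4.0.0 (where data.frame() converted character vectors to factors by default, so var2 would be a factor): levels<- relabels the levels of a factor, but on a plain numeric vector such as var1 it only attaches a "levels" attribute and leaves the values untouched. A minimal sketch with small illustrative vectors (x_num and x_fac are made-up names, not from the question):

```r
x_num <- c(0, 1, 3, 12)
x_fac <- factor(c("e", "b", "c", "e"))   # levels are "b" "c" "e"

# On a numeric vector, levels<- only attaches an attribute; the values stay the same
levels(x_num) <- seq_along(unique(x_num))   # same as 1:length(unique(x_num))
print(as.vector(x_num))                     # still 0 1 3 12

# On a factor, levels<- renames the existing levels in order: b -> 1, c -> 2, e -> 3
levels(x_fac) <- seq_along(unique(x_fac))
print(as.character(x_fac))                  # "3" "1" "2" "3"
```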

To work around this, I converted all columns to character when creating c:

c = as.data.frame(sapply(data.frame(var1 = a, var2 = b), as.character))

This yields

   var1 var2
1     1    4
2     2    5
3     4    6
4     5    7
5     6    8
6     1    1
7     2    2
8     4    3
9     6    4
10    3    5

The problem, however, is that the value 12 (c[10, 'var1']) in column var1 is treated as the 3rd value (it gets factor level 3, right after levels 1 and 2 for the values 0 and 1) rather than as the last value (factor level 6, since it is the largest numeric value in var1). This happens because the levels are now sorted as strings, and alphabetically "12" comes before "3".
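The jump of 12 to level 3 is what alphabetical sorting of strings produces: compared character by character, "12" comes after "1" but before "3". A small sketch contrasting factor levels built from the character versions of the values with levels built from the numbers themselves (vals is an illustrative copy of a; string order can in principle depend on the locale's collation, though digit strings like these sort the same way in common locales):

```r
vals <- c(0, 1, 3, 5, 6, 0, 1, 3, 6, 12)

# Levels built from character strings are sorted alphabetically
print(levels(factor(as.character(vals))))   # "0"  "1"  "12" "3"  "5"  "6"

# Levels built from the numbers themselves are sorted numerically
print(levels(factor(vals)))                 # "0"  "1"  "3"  "5"  "6"  "12"

# Integer code of the value 12 (the 10th element) under each ordering
print(as.integer(factor(as.character(vals)))[10])   # 3
print(as.integer(factor(vals))[10])                 # 6
```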

Is there a way to assign factor levels based on the numeric ordering while at the same time replacing the numeric values with their factor levels?

Based on the description, it seems the OP wants to replace each value with its position among the sorted unique values of its column, i.e. consecutive integers starting from 1. This can be done with match:

c[] <- lapply(c, function(x) factor(match(x, sort(unique(x)))))
c
#    var1 var2
#1     1    4
#2     2    5
#3     3    6
#4     4    7
#5     5    8
#6     1    1
#7     2    2
#8     3    3
#9     5    4
#10    6    5

data

a <- c(0, 1, 3, 5, 6, 0, 1, 3, 6, 12)
b <- c(letters[5:9], letters[2:6])
c <- data.frame(var1 = a, var2 = b)
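Since the question mentions using the new values as numbers in further computations, it may be more convenient to keep plain integers rather than factors. A sketch under that assumption: dropping the factor() wrapper lets match() return integer vectors directly. For a numeric column, as.integer(factor(x)) gives the same codes, because factor() sorts numeric levels numerically. The helper name to_rank and the data frame name dat are hypothetical (dat also avoids masking base::c).

```r
a <- c(0, 1, 3, 5, 6, 0, 1, 3, 6, 12)
b <- c(letters[5:9], letters[2:6])
dat <- data.frame(var1 = a, var2 = b, stringsAsFactors = FALSE)

# Hypothetical helper: position of each value among the sorted unique values (1, 2, ...)
to_rank <- function(x) match(x, sort(unique(x)))

dat[] <- lapply(dat, to_rank)
str(dat)   # both columns are now integer vectors

# The codes are plain integers, so arithmetic works directly
print(dat$var1 + dat$var2)

# For numeric input, factor codes give the same result
print(identical(to_rank(a), as.integer(factor(a))))   # TRUE
```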

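Based on the code in the comments, another option instead of str_pad is to zero-pad the numbers with sprintf, so that their alphabetical order matches their numeric order: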

c <- data.frame(var1 = sprintf("%02d", a), var2=b, stringsAsFactors=FALSE)
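The "%02d" format pads to two digits, which is enough here because the largest value is 12; with a three-digit value the alphabetical order would break again. A sketch that derives the pad width from the data, assuming non-negative whole numbers (vals, pad_width and padded are illustrative names, and the extra value 105 is made up to show the wider padding):

```r
# Illustrative data: the original values plus a three-digit number
vals <- c(0, 1, 3, 5, 6, 0, 1, 3, 6, 12, 105)

# Pad width = number of digits of the largest value
# (assumes non-negative whole numbers)
pad_width <- nchar(format(max(vals), scientific = FALSE))
padded <- sprintf(paste0("%0", pad_width, "d"), vals)   # uses "%03d" here

# Zero-padded strings sort alphabetically in the same order as the numbers
print(levels(factor(padded)))   # "000" "001" "003" "005" "006" "012" "105"
print(as.integer(factor(padded)))
```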
