
Mapping Letters to Numbers in R

I have a vector of strings, each consisting of n letters, for example "ABCDEF".

I need to map each string to a unique number. Of course, the intuitive approach is to extract the single letters and then match them one by one to the corresponding number via

match(letter,LETTERS)

But that produces very large numbers for large n, because I need two digits for every letter (from 01 to 26).

My idea is now to map each string to a unique number between 1 and 26^n, making use of the fact that 26^n has fewer than 2n digits for large n.

For example for n=4 we get "AAAA" -> 1 and "ZZZZ" -> 26^4
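To make the digit-count argument concrete, here is a small sketch comparing the two schemes. The variable names are only illustrative; the count for 26^n uses the standard formula floor(n * log10(26)) + 1.

```r
n <- 1:10

# Two decimal digits per letter (01..26) gives 2n digits in total
digits_naive <- 2 * n

# Number of decimal digits of 26^n
digits_compact <- floor(n * log10(26)) + 1

data.frame(n, digits_naive, digits_compact)
```

The two counts agree at n = 1 (26 has two digits), and the compact encoding is shorter from n = 2 onwards.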

How can I do this in R?

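I guess you want to encode the letters as below

# Read the string as a base-26 number (A = 0, ..., Z = 25), then add 1 so that "AA...A" maps to 1
f <- function(letter) sum((match(unlist(strsplit(letter, "")), LETTERS) - 1) * 26^((nchar(letter) - 1):0)) + 1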

such that

> f("AAAA")
[1] 1

> f("AABC")
[1] 29

> f("ZZZZ")
[1] 456976
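Since the question mentions a vector of strings, here is a minimal vectorized sketch of the same base-26 idea, together with an inverse. The names encode_letters and decode_letters are hypothetical, and the code assumes the input contains only uppercase A to Z. Results are doubles, so they are exact only while 26^n stays below 2^53, that is, for n up to 11.

```r
# Hypothetical helper: map each string to a number in 1..26^n
encode_letters <- function(x) {
  vapply(strsplit(x, "", fixed = TRUE), function(chars) {
    digits <- match(chars, LETTERS) - 1              # A -> 0, ..., Z -> 25
    sum(digits * 26^rev(seq_along(digits) - 1)) + 1  # base-26 value, shifted to start at 1
  }, numeric(1))
}

# Hypothetical inverse: recover the n-letter string from its number
decode_letters <- function(code, n) {
  vapply(code - 1, function(v) {
    digits <- (v %/% 26^((n - 1):0)) %% 26           # extract base-26 digits, most significant first
    paste(LETTERS[digits + 1], collapse = "")
  }, character(1))
}

encode_letters(c("AAAA", "AABC", "ZZZZ"))
decode_letters(c(1, 29, 456976), n = 4)
```

The first call reproduces the values 1, 29 and 456976 shown above, and the second maps them back to the original strings.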

While this may be clever, using a factor may be much simpler and far easier to understand. You also get to keep the string format close to hand, while getting the space saving of it being encoded as an integer.

If you need integers in a database (which will do joins better on them) then you can cast the factor to an int with as.integer(factor_column) and you'll have the integer variants too.
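A minimal sketch of the factor approach, using made-up example strings:

```r
codes <- c("BCDE", "CDEF", "ABCD", "BCDE")

f <- factor(codes)   # levels are the sorted unique strings
ids <- as.integer(f) # integer code of each element, i.e. its position in levels(f)

levels(f)
ids

# The strings can always be recovered from the codes
levels(f)[ids]
```

Here the integers run from 1 to the number of distinct strings rather than up to 26^n, so they stay small regardless of n.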

What you'll lose is the determinism of the mapping, which may be important for you in the DB world if this is anything more than a one-off data load.
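To illustrate the point about determinism: by default a factor's integer codes depend on which values happen to be present, so the same string can get different codes in different loads. One possible workaround, sketched here with made-up data and a hypothetical all_levels vector, is to fix the levels explicitly.

```r
batch1 <- c("BCDE", "CDEF")
batch2 <- c("ABCD", "BCDE", "CDEF")

# Default levels: "BCDE" gets a different code in each batch
code_b1 <- as.integer(factor(batch1))[batch1 == "BCDE"]
code_b2 <- as.integer(factor(batch2))[batch2 == "BCDE"]

# Assumption: the full set of possible values is known in advance
all_levels <- c("ABCD", "BCDE", "CDEF")
fixed_b1 <- as.integer(factor(batch1, levels = all_levels))[batch1 == "BCDE"]
fixed_b2 <- as.integer(factor(batch2, levels = all_levels))[batch2 == "BCDE"]

c(code_b1, code_b2)   # codes differ between batches
c(fixed_b1, fixed_b2) # codes agree once the levels are fixed
```

If the full set of values is not known up front, the base-26 encoding remains the option that is deterministic by construction.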
