R gsub regex Pascal Case to Camel Case

I want to write a gsub call using R regexes that replaces every capital letter in my string with an underscore followed by its lowercase variant. In a separate gsub, I want to replace the first letter with its lowercase variant. The function should behave like this:

pascal_to_camel("PaymentDate") -> "payment_date"
pascal_to_camel("AccountsOnFile") -> "accounts_on_file"
pascal_to_camel("LastDateOfReturn") -> "last_date_of_return"

The problem is, I don't know how to apply tolower to the "\\1" returned by the regex.

I have something like this:

name_format = function(x) gsub("([A-Z])", paste0("_", tolower("\\1")), gsub("^([A-Z])", tolower("\\1"), x))

But it applies tolower to the literal string "\\1" instead of to the matched text.

You may use the following solution (converted from Python, see the post "Elegant Python function to convert CamelCase to snake_case?"):

> pascal_to_camel <- function(x) tolower(gsub("([a-z0-9])([A-Z])", "\\1_\\2", gsub("(.)([A-Z][a-z]+)", "\\1_\\2", x)))
> pascal_to_camel("PaymentDate")
[1] "payment_date"
> pascal_to_camel("AccountsOnFile")
[1] "accounts_on_file"
> pascal_to_camel("LastDateOfReturn")
[1] "last_date_of_return"

Explanation

  • gsub("(.)([A-Z][a-z]+)", "\\1_\\2", x) runs first and inserts a _ between any character and an uppercase ASCII letter that is followed by one or more lowercase ASCII letters (the result is called y in the next bullet)
  • gsub("([a-z0-9])([A-Z])", "\\1_\\2", y) inserts a _ between a lowercase ASCII letter or a digit and an uppercase ASCII letter (the result is called z below)
  • tolower(z) - turns the whole result to lower case.
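
The three steps above can be traced one at a time. The sketch below is only an illustration: the input "HTTPResponseCode" is a made-up example (it does not appear in the question), chosen because the acronym shows why two passes are used rather than one.

```r
# Hypothetical input containing an acronym; not taken from the question.
x <- "HTTPResponseCode"

# Pass 1: put "_" before an uppercase letter that starts a capitalized word.
y <- gsub("(.)([A-Z][a-z]+)", "\\1_\\2", x)

# Pass 2: put "_" between a lowercase letter or digit and an uppercase letter.
z <- gsub("([a-z0-9])([A-Z])", "\\1_\\2", y)

# Final step: lower-case the whole string.
result <- tolower(z)

print(c(y = y, z = z, result = result))
```

The first pass only separates "HTTP" from "Response", because the "e" before "Code" is consumed by the preceding match. The second pass catches that remaining boundary, so the acronym stays together as "http" instead of being split letter by letter.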

The same approach with Unicode support (with perl=TRUE, \\p{Lu} matches any uppercase Unicode letter and \\p{Ll} matches any lowercase Unicode letter):

pascal_to_camel_uni <- function(x) {
     tolower(gsub("([\\p{Ll}0-9])(\\p{Lu})", "\\1_\\2", 
         gsub("(.)(\\p{Lu}\\p{Ll}+)", "\\1_\\2", x, perl=TRUE), perl=TRUE))
}
pascal_to_camel_uni("ДеньОплаты")
## => [1] "день_оплаты"

Using two regexes, ([A-Z]) and (?!^[A-Z])([A-Z]), with perl = TRUE and the replacements \\L\\1 and _\\L\\1:

> name_format <- function(x) gsub("([A-Z])", "\\L\\1", gsub("(?!^[A-Z])([A-Z])", "_\\L\\1", x, perl = TRUE), perl = TRUE)
> name_format("PaymentDate")
[1] "payment_date"
> name_format("AccountsOnFile")
[1] "accounts_on_file"
> name_format("LastDateOfReturn")
[1] "last_date_of_return"
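
With perl = TRUE, a \\L in the replacement string lower-cases the rest of the replacement, which is what the question's tolower("\\1") could not do. The sketch below splits the nested call into its two passes. The helper names lower_inner and lower_first and the "HTTPResponse" input are made up for illustration; the patterns are the ones used in name_format.

```r
# Hypothetical helper names; the patterns are those used in name_format.
# Inner pass: every capital except one at the very start becomes "_" + lowercase.
lower_inner <- function(x) gsub("(?!^[A-Z])([A-Z])", "_\\L\\1", x, perl = TRUE)
# Outer pass: any capital still left (the leading one) is lowercased in place.
lower_first <- function(x) gsub("([A-Z])", "\\L\\1", x, perl = TRUE)

step1 <- lower_inner("PaymentDate")
step2 <- lower_first(step1)
print(c(step1 = step1, step2 = step2))

# A run of capitals is split letter by letter by this approach.
acronym <- lower_first(lower_inner("HTTPResponse"))
print(acronym)
```

Because every capital after the first gets its own underscore, this version turns "HTTPResponse" into "h_t_t_p_response". If acronyms should stay together, the two-pass tolower solution earlier in this thread may be the better fit.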
