R gsub regex Pascal Case to Camel Case

Question

I want to write a gsub function using R regexes that replaces every capital letter in my string with an underscore followed by the lowercase variant. In a separate gsub, I want to replace the first letter with its lowercase variant. The function should do something like this:

pascal_to_camel("PaymentDate") -> "payment_date"
pascal_to_camel("AccountsOnFile") -> "accounts_on_file"
pascal_to_camel("LastDateOfReturn") -> "last_date_of_return"

The problem is that I don't know how to apply tolower to the "\\1" backreference returned by the regex.

I have something like this:

name_format = function(x) gsub("([A-Z])", paste0("_", tolower("\\1")), gsub("^([A-Z])", tolower("\\1"), x))

But it is doing tolower on the string "\\1" instead of on the matched string.
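
The behaviour described above follows from R evaluating the arguments before gsub runs: tolower("\\1") is computed on the literal backreference text, which contains no uppercase letters, so it comes back unchanged. A minimal sketch that reproduces this (the variable name replacement is only for illustration):

```r
# tolower() sees the literal text \1, not the matched letter,
# so the replacement string is returned unchanged.
replacement <- tolower("\\1")
identical(replacement, "\\1")
# [1] TRUE

# The attempt from the question therefore behaves as if tolower() were absent.
name_format <- function(x) gsub("([A-Z])", paste0("_", tolower("\\1")), gsub("^([A-Z])", tolower("\\1"), x))
name_format("PaymentDate")
# [1] "_Payment_Date"
```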

Answer 1

You may use the following solution (converted from Python; see the post "Elegant Python function to convert CamelCase to snake_case?"):

> pascal_to_camel <- function(x) tolower(gsub("([a-z0-9])([A-Z])", "\\1_\\2", gsub("(.)([A-Z][a-z]+)", "\\1_\\2", x)))
> pascal_to_camel("PaymentDate")
[1] "payment_date"
> pascal_to_camel("AccountsOnFile")
[1] "accounts_on_file"
> pascal_to_camel("LastDateOfReturn")
[1] "last_date_of_return"

Explanation

gsub("(.)([A-Z][a-z]+)", "\\1_\\2", x) runs first and inserts _ between any character and an uppercase ASCII letter that is followed by one or more lowercase ASCII letters (the result is called y in the next step)
gsub("([a-z0-9])([A-Z])", "\\1_\\2", y) inserts _ between a lowercase ASCII letter or a digit and an uppercase ASCII letter (the result is called z below)
tolower(z) turns the whole result to lower case.
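
To make the three steps concrete, here is a small sketch that runs them one at a time on one of the inputs from the question, using the same names x, y and z as the explanation:

```r
x <- "LastDateOfReturn"

# Step 1: insert _ before each capitalised word that follows another character.
y <- gsub("(.)([A-Z][a-z]+)", "\\1_\\2", x)
y
# [1] "Last_DateOf_Return"

# Step 2: insert _ at any remaining lowercase-or-digit to uppercase boundary.
z <- gsub("([a-z0-9])([A-Z])", "\\1_\\2", y)
z
# [1] "Last_Date_Of_Return"

# Step 3: lowercase the whole string.
tolower(z)
# [1] "last_date_of_return"
```

Step 1 misses the boundary before "Of" because the preceding "e" was already consumed by the match "tDate"; the second pass picks it up.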

The same regex with Unicode support (\p{Lu} matches any uppercase Unicode letter and \p{Ll} matches any lowercase Unicode letter):

pascal_to_camel_uni <- function(x) {
     tolower(gsub("([\\p{Ll}0-9])(\\p{Lu})", "\\1_\\2", 
         gsub("(.)(\\p{Lu}\\p{Ll}+)", "\\1_\\2", x, perl=TRUE), perl=TRUE))
}
pascal_to_camel_uni("ДеньОплаты")
## => [1] "день_оплаты"


Answer 2

Use two regexes, ([A-Z]) and (?!^[A-Z])([A-Z]), with perl = TRUE and the replacements \\L\\1 and _\\L\\1:

name_format <- function(x) gsub("([A-Z])", perl = TRUE, "\\L\\1", gsub("(?!^[A-Z])([A-Z])", perl = TRUE, "_\\L\\1", x))
> name_format("PaymentDate")
[1] "payment_date"
> name_format("AccountsOnFile")
[1] "accounts_on_file"
> name_format("LastDateOfReturn")
[1] "last_date_of_return"
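
The two answers agree on the inputs from the question but can differ when the string contains a run of capitals. A small sketch comparing them, assuming an input such as "HTTPResponse" (my own example, not from the question), with both functions repeated so the block runs on its own:

```r
# Answer 1: two boundary-inserting passes, then lowercase everything.
pascal_to_camel <- function(x) tolower(gsub("([a-z0-9])([A-Z])", "\\1_\\2", gsub("(.)([A-Z][a-z]+)", "\\1_\\2", x)))

# Answer 2: prefix every non-initial capital with _ and lowercase it via \\L.
name_format <- function(x) gsub("([A-Z])", "\\L\\1", gsub("(?!^[A-Z])([A-Z])", "_\\L\\1", x, perl = TRUE), perl = TRUE)

pascal_to_camel("HTTPResponse")
# [1] "http_response"
name_format("HTTPResponse")
# [1] "h_t_t_p_response"
```

Which result is preferable depends on how acronyms should be treated in the target naming scheme.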

