I want to write a gsub function using R regexes to replace all capital letters in my string with underscore and the lower case variant. In a seperate gsub, I want to replace the first letter with the lowercase variant. The function should do something like this:
pascal_to_camel("PaymentDate") -> "payment_date"
pascal_to_camel("AccountsOnFile") -> "accounts_on_file"
pascal_to_camel("LastDateOfReturn") -> "last_date_of_return"
The problem is, I don't know how to tolower
a "\\\\1"
returned by the regex.
I have something like this:
name_format = function(x) gsub("([A-Z])", paste0("_", tolower("\\1")), gsub("^([A-Z])", tolower("\\1"), x))
But it is doing tolower
on the string "\\\\1"
instead of on the matched string.
You may use the following solution (converted from Python, see the Elegant Python function to convert CamelCase to snake_case? post):
> pascal_to_camel <- function(x) tolower(gsub("([a-z0-9])([A-Z])", "\\1_\\2", gsub("(.)([A-Z][a-z]+)", "\\1_\\2", x)))
> pascal_to_camel("PaymentDate")
[1] "payment_date"
> pascal_to_camel("AccountsOnFile")
[1] "accounts_on_file"
> pascal_to_camel("LastDateOfReturn")
[1] "last_date_of_return"
Explanation
gsub("(.)([AZ][az]+)", "\\\\1_\\\\2", x)
is executed first to insert a _
between any char followed with an uppercase ASCII letter followed with 1+ ASCII lowercase letters (the output is marked as y
in the bullet point below) gsub("([a-z0-9])([AZ])", "\\\\1_\\\\2", y)
- inserts _
between a lowercase ASCII letter or a digit and an uppercase ASCII letter (result is defined as z
below) tolower(z)
- turns the whole result to lower case. The same regex with Unicode support ( \\p{Lu}
matches any uppercase Unicode letter and \\p{Ll}
matches any Unicode lowercase letter):
pascal_to_camel_uni <- function(x) {
tolower(gsub("([\\p{Ll}0-9])(\\p{Lu})", "\\1_\\2",
gsub("(.)(\\p{Lu}\\p{Ll}+)", "\\1_\\2", x, perl=TRUE), perl=TRUE))
}
pascal_to_camel_uni("ДеньОплаты")
## => [1] "день_оплаты"
See this online R demo .
Using two regex ([AZ])
and (?!^[AZ])([AZ])
, perl = TRUE
, \\\\L\\\\1
and _\\\\L\\\\1
:
name_format <- function(x) gsub("([A-Z])", perl = TRUE, "\\L\\1", gsub("(?!^[A-Z])([A-Z])", perl = TRUE, "_\\L\\1", x))
> name_format("PaymentDate")
[1] "payment_date"
> name_format("AccountsOnFile")
[1] "accounts_on_file"
> name_format("LastDateOfReturn")
[1] "last_date_of_return"
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.