Elegant R function: mixed case separated by periods to underscore separated lower case and/or camel case

Question

I often get datasets from collaborators with inconsistently named variables/columns. One of my first tasks is to rename them, and I want a solution that works entirely within R.

as.Given <- c("ICUDays","SexCode","MAX_of_MLD","Age.Group")

underscore_lowercase <- c("icu_days", "sex_code", "max_of_mld","age_group")

camelCase <- c("icuDays", "sexCode", "maxOfMld", "ageGroup")

Given the different opinions about naming conventions, and in the spirit of what was proposed in Python, what ways are there to go from as.Given to underscore_lowercase and/or camelCase in a user-specified way in R?

Edit: Also found this related post in R / regex, especially the answer of @rengis.

Answer 1

Try this. These at least work on the examples given:

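toUnderscore <- function(x) {
  # Split before a capital that starts a new word (ICUDays -> ICU_Days)
  x2 <- gsub("([A-Za-z])([A-Z])([a-z])", "\\1_\\2\\3", x)
  # Replace literal periods with underscores (Age.Group -> Age_Group)
  x3 <- gsub(".", "_", x2, fixed = TRUE)
  # Split any remaining lowercase-to-uppercase boundary
  x4 <- gsub("([a-z])([A-Z])", "\\1_\\2", x3)
  x5 <- tolower(x4)
  x5
}

underscore2camel <- function(x) {
  # \\U uppercases the captured character that followed the underscore
  gsub("_(.)", "\\U\\1", x, perl = TRUE)
}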

#######################################################
# test
#######################################################

u <- toUnderscore(as.Given)
u
## [1] "icu_days"   "sex_code"   "max_of_mld" "age_group" 

underscore2camel(u)
## [1] "icuDays"  "sexCode"  "maxOfMld" "ageGroup"
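Since the question is about renaming columns in a user-specified way, the two helpers can be wrapped so that the target style is chosen by an argument. The following is only a sketch: `rename_columns()` and its `style` argument are hypothetical names that do not appear in the answer, and the helpers are repeated so the block runs on its own.

```r
# Helpers repeated from the answer so this sketch is self-contained.
toUnderscore <- function(x) {
  x <- gsub("([A-Za-z])([A-Z])([a-z])", "\\1_\\2\\3", x)
  x <- gsub(".", "_", x, fixed = TRUE)
  x <- gsub("([a-z])([A-Z])", "\\1_\\2", x)
  tolower(x)
}
underscore2camel <- function(x) gsub("_(.)", "\\U\\1", x, perl = TRUE)

# Hypothetical wrapper (not part of the original answer):
# renames the columns of a data frame to the requested style.
rename_columns <- function(df, style = c("underscore", "camel")) {
  style <- match.arg(style)
  new_names <- toUnderscore(names(df))
  if (style == "camel") new_names <- underscore2camel(new_names)
  names(df) <- new_names
  df
}

# Toy data frame using the column names from the question.
df <- data.frame(ICUDays = 3, SexCode = "F", MAX_of_MLD = 1.2, Age.Group = "40-49")
names(rename_columns(df))
## [1] "icu_days"   "sex_code"   "max_of_mld" "age_group"
names(rename_columns(df, "camel"))
## [1] "icuDays"  "sexCode"  "maxOfMld" "ageGroup"
```

Using `match.arg()` means a misspelled style fails early instead of silently returning the wrong convention.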

Answer 2

To get the underscore_lowercase (g) and camelCase (x) strings:

> as.Given <- c("ICUDays","SexCode","MAX_of_MLD","Age.Group")
> r <- gsub("[^\\w]", "", as.Given, perl=T)
> f <- gsub("^.*?_.*$(*SKIP)(*F)|(?:[^A-Z]+|[A-Z_]+?)\\K([A-Z])(?=[A-Z_]+$|[a-z_]+$)", "_\\1", r,perl=T)
> g <- tolower(f)
> g
[1] "icu_days"   "sex_code"   "max_of_mld" "age_group"
> x <- gsub("_([a-z])", "\\U\\1", g,perl=T)
> x
[1] "icuDays"  "sexCode"  "maxOfMld" "ageGroup"

UPDATE

> as.Given = c("CRMLevel1Code", "MAX_of_RhD", "MAX_Of_MCa", "MAX_of_NCCexclusion","ICUDays","SexCode","MAX_of_MLD","Age.Group","admitRom")
> r <- gsub("[^\\w]", "", as.Given, perl=T)
> f <- gsub("(?:[^A-Z]|^)[A-Z][A-Z][A-Z]\\K(?=[a-zA-Z])|(?=\\d)|^[A-Z][a-z]+\\K(?=[A-Z][a-z]+$)|(?<=\\d)(?=[A-Za-z])|^[a-z]+\\K(?=[A-Z][a-z]+$)", "_", r, perl=T)
> underscore_lowercase <- tolower(f)
> underscore_lowercase
[1] "crm_level_1_code"     "max_of_rhd"           "max_of_mca"          
[4] "max_of_ncc_exclusion" "icu_days"             "sex_code"            
[7] "max_of_mld"           "age_group"            "admit_rom"           
> camelCase <- gsub("_([a-z]|\d)", "\\U\\1", underscore_lowercase, perl=T)
Error: '\d' is an unrecognized escape in character string starting ""_([a-z]|\d"
> camelCase <- gsub("_([a-z]|\\d)", "\\U\\1", underscore_lowercase, perl=T)
> camelCase
[1] "crmLevel1Code"     "maxOfRhd"          "maxOfMca"         
[4] "maxOfNccExclusion" "icuDays"           "sexCode"          
[7] "maxOfMld"          "ageGroup"          "admitRom"
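The "unrecognized escape" error in the transcript comes from R's string parser, not from the regex engine: inside an ordinary string literal every backslash meant for the regex has to be doubled. As a sketch, assuming R 4.0.0 or later (where raw string literals of the form r"(...)" are available), the same substitution can be written without doubling. The shortened input vector is only for illustration.

```r
underscore_lowercase <- c("crm_level_1_code", "max_of_rhd", "admit_rom")

# Ordinary string literals: each regex backslash is written twice.
escaped <- gsub("_([a-z]|\\d)", "\\U\\1", underscore_lowercase, perl = TRUE)

# Raw string literals (R >= 4.0.0): backslashes are passed through unchanged.
raw <- gsub(r"(_([a-z]|\d))", r"(\U\1)", underscore_lowercase, perl = TRUE)

escaped
## [1] "crmLevel1Code" "maxOfRhd"      "admitRom"
identical(escaped, raw)
## [1] TRUE
```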

Answer 3

Based on your as.Given vector, with admitROM added to the list, this will do the trick.

as.Given <- c('ICUDays', 'SexCode', 'MAX_of_MLD', 'Age.Group', 'admitROM')
invertd <- gsub('(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])|\\.', '_', as.Given, perl=T)
toscore <- tolower(invertd)
## [1] "icu_days"   "sex_code"   "max_of_mld" "age_group"  "admit_rom" 
tocamel <- gsub("_([a-z])", "\\U\\1", toscore, perl=T)
## [1] "icuDays"  "sexCode"  "maxOfMld" "ageGroup" "admitRom"
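The single pattern above packs three alternatives into one string. One way to read it, sketched here with hypothetical variable names and a hypothetical `to_snake()` helper, is to build the pattern from its parts: a lowercase-to-uppercase boundary, the boundary between an acronym and the capitalized word that follows it, and a literal period.

```r
# Each piece is one alternative of the regex used in the answer.
lower_to_upper  <- "(?<=[a-z])(?=[A-Z])"       # sex|Code, admit|ROM
acronym_to_word <- "(?<=[A-Z])(?=[A-Z][a-z])"  # ICU|Days
literal_dot     <- "\\."                       # Age.Group

pattern <- paste(lower_to_upper, acronym_to_word, literal_dot, sep = "|")

# Hypothetical helper wrapping the two steps from the answer.
to_snake <- function(x) tolower(gsub(pattern, "_", x, perl = TRUE))

to_snake(c("ICUDays", "SexCode", "MAX_of_MLD", "Age.Group", "admitROM"))
## [1] "icu_days"   "sex_code"   "max_of_mld" "age_group"  "admit_rom"
```

The first two pieces are zero-width lookarounds, so they insert an underscore without consuming any letters; only the period is actually replaced.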

Answer 4

This should do the trick:

install.packages("snakecase")
library(snakecase)

to_snake_case(as.Given)
#> [1] "icu_days"   "sex_code"   "max_of_mld" "age_group" 

to_lower_camel_case(as.Given)
#> [1] "icuDays"  "sexCode"  "maxOfMld" "ageGroup"

GitHub link to the snakecase package: https://github.com/Tazinho/snakecase

Answer 1: score 10, accepted, 2014-08-26 11:29:32
Answer 2: score 4, 2014-08-26 11:14:58
Answer 3: score 3, 2014-08-26 15:17:32
Answer 4: score 2, 2017-03-25 21:44:59
