
Clean column names of multiple DataFrames

I want to clean the column names of several data frames at once, rather than handling them one at a time. See the code below.

library(stringr)

#Create data frame with basic data
`patient ID` <- c(1, 2, 3, 4)
`Adm Date` <- as.POSIXct(c('2010-10-11','2008-3-25','2016-4-23','2011-6-12'))
diabetes <- c("Type1", "Type2", "Type1", "Type2")
`p-status` <- c("Poor", "Improved", "Excellent", "Poor")
patientdata <- data.frame(`patient ID`, `Adm Date`, diabetes, `p-status`)
patientdata

#Find and replace spaces in column names 
names(patientdata) <- str_replace_all(names(patientdata)," *",'')

#Find and replace hyphen in column name
names(patientdata) <- str_replace_all(names(patientdata),"-",'')

names(patientdata)

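I need to perform these same operations (removing spaces/periods and hyphens from the column names) on at least two different data frames, but I can't hand str_replace_all a vector of data frame names. Done the conventional way, each data frame needs at least three separate str_replace_all statements. Also, the data frames I'm working with are named differently (e.g. order_table and sales_table). Any ideas on how to do this with fewer lines of code?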
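On where the periods come from: with its default check.names = TRUE, data.frame() runs column names through make.names(), which turns spaces and hyphens into periods. This minimal base-R check uses illustrative column names:

```r
# Default check.names = TRUE: make.names() rewrites invalid names
df_default <- data.frame(`patient ID` = 1:2, `p-status` = c("Poor", "Good"))
names(df_default)   # -> c("patient.ID", "p.status")

# check.names = FALSE keeps the names exactly as written
df_raw <- data.frame(`patient ID` = 1:2, `p-status` = c("Poor", "Good"),
                     check.names = FALSE)
names(df_raw)       # -> c("patient ID", "p-status")
```

So if the data frames were built with the default, the names will contain periods, and those periods are what need to be stripped.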

Here is an example step-by-step process:

#Create data frame with basic data
`patient ID` <- c(1, 2, 3, 4)
`Adm Date` <- as.POSIXct(c('2010-10-11','2008-3-25','2016-4-23','2011-6-12'))
diabetes <- c("Type1", "Type2", "Type1", "Type2")
`p-status` <- c("Poor", "Improved", "Excellent", "Poor")
patientdata <- data.frame(`patient ID`, `Adm Date`, diabetes, `p-status`, check.names=FALSE)

#Create copies
patientdata2 <- patientdata3 <- patientdata4 <- patientdata

#Make list with all data frames
lst <- mget(ls(pattern="^patientdata"))

#Create Single Function to house all operations

nameChange <- function(df) {
  names(df) <- str_replace_all(names(df)," *",'')
  names(df) <- str_replace_all(names(df),"-",'')
  return(df)
}

#Iterate over all data frames
library(stringr)
lapply(lst, nameChange)
# $patientdata
#   patientID    AdmDate diabetes   pstatus
# 1         1 2010-10-11    Type1      Poor
# 2         2 2008-03-25    Type2  Improved
# 3         3 2016-04-23    Type1 Excellent
# 4         4 2011-06-12    Type2      Poor
# 
# $patientdata2
#   patientID    AdmDate diabetes   pstatus
# 1         1 2010-10-11    Type1      Poor
# 2         2 2008-03-25    Type2  Improved
# 3         3 2016-04-23    Type1 Excellent
# 4         4 2011-06-12    Type2      Poor
# 
# $patientdata3
#   patientID    AdmDate diabetes   pstatus
# 1         1 2010-10-11    Type1      Poor
# 2         2 2008-03-25    Type2  Improved
# 3         3 2016-04-23    Type1 Excellent
# 4         4 2011-06-12    Type2      Poor
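Note that lapply() returns a new list. patientdata, patientdata2 and the others are left unchanged. To overwrite the originals, one option is list2env(). This sketch is self-contained, so it replaces the stringr-based nameChange() with a base-R gsub() version. That substitution is an assumption made only to avoid the package dependency.

```r
# Base-R stand-in for nameChange(): drop spaces and hyphens from column names
nameChange <- function(df) {
  names(df) <- gsub("[ -]", "", names(df))  # hyphen last = literal "-"
  df
}

patientdata  <- data.frame(`patient ID` = 1:2, `p-status` = c("Poor", "Good"),
                           check.names = FALSE)
patientdata2 <- patientdata

lst <- mget(ls(pattern = "^patientdata"))
cleaned <- lapply(lst, nameChange)

# lapply() did not touch the originals
names(patientdata2)   # -> c("patient ID", "p-status")

# Write each cleaned data frame back under its original name
# (environment() is the global environment when run at top level)
invisible(list2env(cleaned, envir = environment()))
names(patientdata2)   # -> c("patientID", "pstatus")
```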

If you prefer, you can also skip creating the list:

patientdata <- nameChange(patientdata)
patientdata2 <- nameChange(patientdata2)
patientdata3 <- nameChange(patientdata3)
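The question also mentions data frames that don't share a prefix (order_table and sales_table), and ls(pattern = ...) won't collect those together. This minimal sketch assumes you list the names explicitly and loop with get()/assign(). The table contents are made up for illustration, and gsub() stands in for the stringr calls:

```r
# Hypothetical example tables with messy column names
order_table <- data.frame(`order ID` = 1:2, `ship-date` = c("2020-01-01", "2020-01-02"),
                          check.names = FALSE)
sales_table <- data.frame(`sale ID` = 1:2, `net-amount` = c(9.5, 12),
                          check.names = FALSE)

# Names of the data frames to clean, given explicitly
tables <- c("order_table", "sales_table")

for (nm in tables) {
  df <- get(nm)
  names(df) <- gsub("[ -]", "", names(df))  # drop spaces and hyphens
  assign(nm, df)                            # overwrite the original object
}

names(order_table)   # -> c("orderID", "shipdate")
names(sales_table)   # -> c("saleID", "netamount")
```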

Using setnames from data.table is very convenient.

Also, your regular expressions are simple enough that you could combine them into a single one, such as ( *|-).

Example:

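library(data.table)

df1 <- data.frame(a1=c(1,2,3),b2 = c(4,5,6), c3 = c(7,8,9))
#copy() comes from data.table, so the package must be loaded first
df2 <- copy(df1)
df3 <- copy(df1)

#setnames() renames by reference, so no reassignment is needed
for (df_name in c("df1","df2","df3")){
    setnames(get(df_name), gsub("a|b|c","whatever",colnames(get(df_name))))
}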
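The single-pattern idea above can be sketched in base R as follows. The alternation keeps the ( *|-) form but uses + instead of *, so neither branch can match an empty string. That change is an assumption about the safer form, since regex engines can differ in how they treat empty matches.

```r
cols <- c("patient ID", "Adm Date", "diabetes", "p-status")

# One pattern: a run of spaces OR a hyphen, replaced with nothing
clean <- gsub("( +|-)", "", cols)
clean   # -> c("patientID", "AdmDate", "diabetes", "pstatus")
```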

Once the datasets are in a list, we can use mgsub from qdap:

library(qdap)
lst <- mget(ls(pattern="^patientdata"))
lst1 <- lapply(lst, function(x) setNames(x, mgsub(c(" ", "-"), c("", ""), names(x))))
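If installing qdap isn't an option, mgsub()'s pattern-by-pattern replacement can be approximated in base R with Reduce(). The helper name mgsub_base is hypothetical. It applies each pattern as a fixed string, in the order given:

```r
# Hypothetical helper: apply each (pattern, replacement) pair in turn
mgsub_base <- function(pattern, replacement, x) {
  Reduce(function(txt, i) gsub(pattern[i], replacement[i], txt, fixed = TRUE),
         seq_along(pattern), x)
}

lst <- list(
  a = data.frame(`patient ID` = 1, `p-status` = "Poor", check.names = FALSE),
  b = data.frame(`Adm Date` = 2, diabetes = "Type1", check.names = FALSE)
)

lst1 <- lapply(lst, function(x) setNames(x, mgsub_base(c(" ", "-"), c("", ""), names(x))))
names(lst1$a)   # -> c("patientID", "pstatus")
names(lst1$b)   # -> c("AdmDate", "diabetes")
```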

Or we can use base R's gsub:

lst1 <- lapply(lst, function(x) setNames(x, gsub("[- ]+", "", names(x))))
names(lst1[[1]])
#[1] "patientID" "AdmDate"   "diabetes"  "pstatus"  


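Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original URL. For any questions, contact: yoyou2525@163.com.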

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM