Removing special characters and numbers from a column in a data frame
I have a data frame:
dput(Data1)
structure(list(Emp.ID = c(182038L, 191854L), Project.Acquired.Skill = structure(c(2L,
1L), .Label = c("Architecting (10),Cognos TM1 (4),Support Function (3)",
"SAS (76),SAS Analytics (76),SAS BI (76),SAS data modeling tool (63),ClearCase (18),SQL (18),SQL Server (18),SQL SERVER 2000 (18),SQL SERVER 2005 (18),Excel (16),Oracle (16),AS400 (10)"
), class = "factor")), .Names = c("Emp.ID", "Project.Acquired.Skill"
), class = "data.frame", row.names = c(NA, -2L))
str(Data1)
'data.frame': 2 obs. of 2 variables:
$ Emp.ID : int 182038 191854
$ Project.Acquired.Skill: Factor w/ 2 levels "Architecting (10),Cognos TM1 (4),Support Function (3)",..: 2 1
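One column is a factor with values such as Architecting (10),Cognos TM1 (4),Support Function (3).
I need to strip the parenthesized numbers (digits 0-9), the whitespace before them, and the parentheses () to get Architecting,Cognos TM1,Support Function.
I am running into trouble because the column is encoded as a factor.
My output should look like this: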
Emp ID Project Acquired Skill
182038 SAS,SAS Analytics,SAS BI,SAS data modeling tool,ClearCase,SQL,SQL Server,SQL SERVER 2000,SQL SERVER 2005,Excel,Oracle,AS400
191854 Architecting,Cognos TM1,Support Function
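Use a character-class regexp in gsub. Note that this pattern also removes standalone numbers preceded by a space, so SQL SERVER 2000 becomes SQL SERVER in the output: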
transform(Data1, Project.Acquired.Skill=gsub("\\s[0-9()]+","",Project.Acquired.Skill))
Emp.ID
1 182038
2 191854
Project.Acquired.Skill
1 SAS,SAS Analytics,SAS BI,SAS data modeling tool,ClearCase,SQL,SQL Server,SQL SERVER,SQL SERVER,Excel,Oracle,AS400
2 Architecting,Cognos TM1,Support Function
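The difference between a character class and an explicit match on a parenthesized number can be sketched on a single string. This is a minimal illustration, assuming a shortened sample string built from the question's values; the names `x`, `loose`, and `strict` are hypothetical and not part of the original answers.

```r
# One shortened skill string built from values in the question's sample data.
x <- "SQL Server (18),SQL SERVER 2000 (18),Cognos TM1 (4)"

# Character class: removes a space followed by any run of digits and/or parentheses,
# so " 2000" is removed as well as " (18)".
loose <- gsub("\\s[0-9()]+", "", x)

# Explicit form: removes only a space followed by a number wrapped in parentheses.
strict <- gsub("\\s\\([0-9]+\\)", "", x)

print(loose)   # [1] "SQL Server,SQL SERVER,Cognos TM1"
print(strict)  # [1] "SQL Server,SQL SERVER 2000,Cognos TM1"
```

The explicit form is the safer choice when skill names can themselves end in a number, as with SQL SERVER 2000.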
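Alternatively, match the parenthesized number explicitly, which keeps version numbers such as 2000 and 2005:
(Data1[,2] <- gsub("\\s\\(\\d+\\)", "", Data1[,2]))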
# [1] "SAS,SAS Analytics,SAS BI,SAS data modeling tool,ClearCase,SQL,SQL Server,SQL SERVER 2000,SQL SERVER 2005,Excel,Oracle,AS400"
# [2] "Architecting,Cognos TM1,Support Function"
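On the factor concern raised in the question: gsub() accepts a factor and returns a character vector, so no manual conversion is needed. If the column should remain a factor, one option is to rewrite its levels instead. The sketch below assumes a shortened stand-in for the question's Data1; the names `pattern`, `as_chr`, and `as_fac` are hypothetical.

```r
# Shortened stand-in for the question's data frame, with a factor column
# as in the dput output.
Data1 <- data.frame(
  Emp.ID = c(182038L, 191854L),
  Project.Acquired.Skill = factor(c(
    "SAS (76),SQL SERVER 2000 (18),AS400 (10)",
    "Architecting (10),Cognos TM1 (4),Support Function (3)"
  ))
)

pattern <- "\\s\\([0-9]+\\)"

# Option 1: gsub() works on the factor's labels and returns a character vector.
as_chr <- gsub(pattern, "", Data1$Project.Acquired.Skill)

# Option 2: rewrite the levels in place, which keeps the column a factor.
as_fac <- Data1$Project.Acquired.Skill
levels(as_fac) <- gsub(pattern, "", levels(as_fac))

str(as_chr)
str(as_fac)
```

Either result can be assigned back to Data1$Project.Acquired.Skill, depending on whether a character or factor column is wanted downstream.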
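With the qdap package, strip() removes the non-alphabetic characters. Note that it also drops digits inside names, so TM1 becomes TM and AS400 becomes AS:
library(qdap)
gsub(" ,", ",", strip(Data1[, 2], char.keep = ",", lower.case = FALSE))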
## [1] "SAS,SAS Analytics,SAS BI,SAS data modeling tool,ClearCase,SQL,SQL Server,SQL SERVER ,SQL SERVER ,Excel,Oracle,AS"
## [2] "Architecting,Cognos TM,Support Function"
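If digits inside skill names must survive, a base R alternative is to split each value on commas and remove only a trailing count from each item. This is a sketch under that assumption, not one of the original answers; `strip_counts` is a hypothetical helper name.

```r
# Hypothetical helper: drop a trailing " (n)" from each comma-separated item.
strip_counts <- function(x) {
  # as.character() handles factor input; fixed = TRUE treats "," literally.
  parts <- strsplit(as.character(x), ",", fixed = TRUE)
  vapply(parts, function(p) {
    # The $ anchor limits the match to the end of each item,
    # so digits inside a name (TM1, AS400, 2000) are left untouched.
    paste(sub("\\s*\\([0-9]+\\)$", "", p), collapse = ",")
  }, character(1))
}

skills <- factor("Cognos TM1 (4),SQL SERVER 2000 (18),AS400 (10)")
print(strip_counts(skills))  # [1] "Cognos TM1,SQL SERVER 2000,AS400"
```

Anchoring at the end of each item is more conservative than a global substitution, at the cost of an extra split and paste step.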