Cleaning Values in R
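How can I remove the Roman numerals I/II/III, parentheses (), anything inside parentheses such as (xyz), dashes -, semicolons ;, commas, and grades such as Grade 21 from the strings in this dataframe?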
#Original dataframe
Jobs <- c("Social Worker I (Child Welfare Services), Grade 21", "Engineer I/II/III, Grade 19/22/25", "Legislative Attorney; Grade 32")
df <- data.frame(Jobs)
df
The desired dataframe looks like this:
#dataframe
Jobs <- c("Social Worker", "Engineer", "Legislative Attorney")
df1 <- data.frame(Jobs)
df1
You can use regular expressions to remove the matching substrings:
library(tidyverse)
Jobs <- c("Social Worker I (Child Welfare Services), Grade 21", "Engineer I/II/III, Grade 19/22/25", "Legislative Attorney; Grade 32")
df <- data.frame(Jobs)
df %>%
  mutate(Jobs = Jobs %>%
    str_remove_all("I|II|III|Grade [0-9/]+|[-;]") %>%
    str_remove_all("[/,]") %>%
    str_remove_all("[(][^(]+[)]") %>%
    str_trim())
#>                   Jobs
#> 1        Social Worker
#> 2             Engineer
#> 3 Legislative Attorney
Created on 2022-05-05 by the reprex package (v2.0.0)
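One thing to watch in the pattern above: the bare alternative `I` removes every capital "I", including one inside a word. That is harmless for the three sample titles, but a title like "IT Analyst" would lose its first letter. A minimal sketch of a stricter variant follows, assuming Roman numerals only go up to III and always stand as separate tokens. The helper name `clean_jobs` and the fourth job title are hypothetical, added only for illustration.

```r
library(stringr)

Jobs <- c("Social Worker I (Child Welfare Services), Grade 21",
          "Engineer I/II/III, Grade 19/22/25",
          "Legislative Attorney; Grade 32",
          "IT Analyst II - Grade 20")  # hypothetical extra row, not in the question

# Hypothetical helper: one removal per step, so each rule is easy to adjust
clean_jobs <- function(x) {
  x <- str_remove_all(x, "\\([^)]*\\)")       # parentheses and whatever is inside them
  x <- str_remove_all(x, "Grade\\s+[0-9/]+")  # "Grade 21", "Grade 19/22/25"
  x <- str_remove_all(x, "\\bI{1,3}\\b")      # I, II, III only as standalone tokens
  x <- str_remove_all(x, "[-;,/]")            # leftover separators
  str_squish(x)                               # trim ends and collapse repeated spaces
}

cleaned <- clean_jobs(Jobs)
cleaned
```

The word boundaries `\\b` keep the "I" in "IT" intact, and `str_squish()` also collapses any double spaces left in the middle of a string, which `str_trim()` would not.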
I'm not a regex expert, but I hope the following gsub option helps:
> trimws(gsub("(\\b((I+)/?)+\\b)|\\(.*?\\)|[-;,]|(Grade\\s\\S+)", "", Jobs))
[1] "Social Worker" "Engineer" "Legislative Attorney"
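To make the single pattern easier to read, it can be assembled from named pieces. This is only a sketch of the same idea in base R: the variable names are made up for illustration, the redundant capture groups are dropped, and `perl = TRUE` is added here (it is not in the answer above) so that `\\b` and the lazy `.*?` follow PCRE semantics.

```r
Jobs <- c("Social Worker I (Child Welfare Services), Grade 21",
          "Engineer I/II/III, Grade 19/22/25",
          "Legislative Attorney; Grade 32")

# Illustrative names for each alternative in the combined pattern
roman  <- "\\b(I+/?)+\\b"   # runs of I, optionally joined by "/", e.g. I/II/III
parens <- "\\(.*?\\)"       # shortest "(...)" span, including the parentheses
punct  <- "[-;,]"           # dashes, semicolons, commas
grade  <- "Grade\\s\\S+"    # "Grade" plus the token after it, e.g. "Grade 19/22/25"

pattern <- paste(roman, parens, punct, grade, sep = "|")

cleaned <- trimws(gsub(pattern, "", Jobs, perl = TRUE))
cleaned
```

Note that `trimws()` only strips whitespace at the ends of each string. If a removed piece sits in the middle of a title, a doubled space can remain, and a follow-up such as `gsub("\\s+", " ", cleaned)` would collapse it.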