
Cleaning Values in R

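How do I remove the Roman numerals I/II/III, parentheses (), anything inside parentheses (xyz), dashes -, semicolons ;, commas , and grades such as Grade 21 from the strings in this dataframe?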

# Original dataframe
Jobs <- c("Social Worker I (Child Welfare Services), Grade 21", "Engineer I/II/III, Grade 19/22/25", "Legislative Attorney; Grade 32")
df <- data.frame(Jobs)
df

The desired dataframe looks like this:

# Desired dataframe
Jobs <- c("Social Worker", "Engineer", "Legislative Attorney")
df1 <- data.frame(Jobs)
df1

You can use regular expressions to remove the matching substrings:

library(tidyverse)

Jobs <- c("Social Worker I (Child Welfare Services), Grade 21", "Engineer I/II/III, Grade 19/22/25", "Legislative Attorney; Grade 32")
df <- data.frame(Jobs)

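df %>%
  mutate(Jobs = Jobs %>%
    # Drop every capital I (covers I/II/III), "Grade <numbers>", dashes, semicolons
    str_remove_all("I|II|III|Grade [0-9/]+|[-;]") %>%
    # Drop leftover slashes and commas
    str_remove_all("[/,]") %>%
    # Drop parenthesised text together with its parentheses
    str_remove_all("[(][^(]+[)]") %>%
    # Trim leading and trailing whitespace
    str_trim())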
#>                   Jobs
#> 1        Social Worker
#> 2             Engineer
#> 3 Legislative Attorney

Created on 2022-05-05 by the reprex package (v2.0.0)
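One caveat with the pattern above: the alternative `I|II|III` is not anchored, so it removes every capital I, not only standalone Roman numerals. That is harmless for the three sample titles, but it would damage a title that contains a capital I. The sketch below compares the unanchored form with a word-boundary variant. It is written in base R so it runs without extra packages, and the "IT Manager" entry is an invented example that does not come from the question.

```r
# Hypothetical input: the second entry is made up to show the difference.
jobs <- c("Engineer I/II/III, Grade 19/22/25", "IT Manager II, Grade 30")

# Unanchored alternation: every capital "I" is removed, including the one in "IT".
loose <- trimws(gsub("I|II|III|Grade [0-9/]+|[-;/,]", "", jobs, perl = TRUE))

# Word boundaries (\\b) limit the match to standalone runs of one to three "I"s.
strict <- trimws(gsub("\\bI{1,3}\\b|Grade [0-9/]+|[-;/,]", "", jobs, perl = TRUE))

print(loose)   # "Engineer" "T Manager"
print(strict)  # "Engineer" "IT Manager"
```

The same `\\bI{1,3}\\b` alternative should drop into `str_remove_all()` unchanged, since stringr also understands `\\b`. It only covers numerals up to III; higher numerals such as IV would need a wider character class.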

I'm not a regex expert, but I hope the following gsub option helps:

> trimws(gsub("(\\b((I+)/?)+\\b)|\\(.*?\\)|[-;,]|(Grade\\s\\S+)", "", Jobs))
[1] "Social Worker"        "Engineer"             "Legislative Attorney"
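To make that single pattern easier to read, it can be assembled from its four alternatives. This is only a restatement of the same regular expression: the variable names (`roman`, `parens`, `punct`, `grade`) are illustrative and not part of the original answer.

```r
Jobs <- c("Social Worker I (Child Welfare Services), Grade 21",
          "Engineer I/II/III, Grade 19/22/25",
          "Legislative Attorney; Grade 32")

# The four alternatives of the pattern, named for readability.
roman  <- "(\\b((I+)/?)+\\b)"  # runs of "I" optionally separated by "/", e.g. I/II/III
parens <- "\\(.*?\\)"          # a parenthesised group, matched lazily
punct  <- "[-;,]"              # dashes, semicolons and commas
grade  <- "(Grade\\s\\S+)"     # "Grade", one whitespace, then the next non-space run

# Joining with "|" rebuilds exactly the pattern used in the gsub call.
pattern <- paste(roman, parens, punct, grade, sep = "|")

cleaned <- trimws(gsub(pattern, "", Jobs))
print(cleaned)
```

Because `\\b` surrounds the Roman-numeral part, a capital I inside a longer word is left alone, unlike a bare `I|II|III` alternation.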

