
How to remove the first few characters in a column in R?

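My data (a CSV file) has a column whose values begin with meaningless characters (e.g. special characters, random lowercase letters), and I want to remove them.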

df <- data.frame(Affiliation = c(". Biotechnology Centre, Malaysia Agricultural Research and Development Institute (MARDI), Serdang, Malaysia","**Institute for Research in Molecular Medicine (INFORMM), Universiti Sains Malaysia, Pulau Pinang, Malaysia","aas Massachusetts General Hospital and Harvard Medical School, Center for Human Genetic Research and Department of Neurology , Boston , MA , USA","ac Albert Einstein College of Medicine , Department of Pathology , Bronx , NY , USA"))

The number of characters I want to remove (e.g. ".", "**", "aas", "ac") varies from row to row, as shown above.

Expected output:

df <- data.frame(Affiliation = c("Biotechnology Centre, Malaysia Agricultural Research and Development Institute (MARDI), Serdang, Malaysia","Institute for Research in Molecular Medicine (INFORMM), Universiti Sains Malaysia, Pulau Pinang, Malaysia","Massachusetts General Hospital and Harvard Medical School, Center for Human Genetic Research and Department of Neurology , Boston , MA , USA","Albert Einstein College of Medicine , Department of Pathology , Bronx , NY , USA"))

I am thinking of using dplyr's mutate function, but I'm not sure how to go about it.

If we assume that the valid text starts at the first uppercase letter, the following works:

library(tidyverse)
df %>% 
  mutate(Affiliation = str_extract(Affiliation, "[:upper:].+"))
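
The same assumption (valid text starts at the first uppercase letter) can also be sketched in base R, without loading the tidyverse. The helper name `strip_to_first_upper` and the shortened sample strings are illustrative assumptions, not part of the original answer. One difference worth noting: for a string with no uppercase letter at all, this version returns an empty string, whereas `str_extract` returns `NA`.

```r
# Shortened versions of the question's rows (illustrative only)
affiliation <- c(
  ". Biotechnology Centre, Serdang, Malaysia",
  "**Institute for Research in Molecular Medicine (INFORMM)",
  "aas Massachusetts General Hospital",
  "ac Albert Einstein College of Medicine"
)

# Hypothetical helper: drop every leading character that is not an
# uppercase letter, so the string starts at its first uppercase letter.
strip_to_first_upper <- function(x) {
  sub("^[^[:upper:]]*", "", x)
}

cleaned <- strip_to_first_upper(affiliation)
print(cleaned)
```

On a data frame this would be applied as `df$Affiliation <- strip_to_first_upper(df$Affiliation)`.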

Base R regex solution:

df$cleaned_str <- gsub("^\\w+ |^\\*+|^\\. ", "", df$Affiliation)

Tidyverse regex solution:

library(tidyverse)
df %>% 
  mutate(Affiliation = str_replace(Affiliation, "^\\w+ |^\\*+|^\\. ", ""))
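
One caveat with the `^\\w+ ` alternative in the pattern above is that it removes the first word of any row, including rows that are already clean (e.g. "Massachusetts General Hospital" would lose "Massachusetts "). A narrower variant, offered here only as a sketch, strips a leading token only when it consists entirely of lowercase letters or entirely of punctuation. The helper name `strip_junk_prefix` and the sample strings are assumptions for illustration. It would still wrongly trim an affiliation that legitimately begins with a lowercase word.

```r
# Shortened versions of the question's rows, plus one already-clean row
affiliation <- c(
  ". Biotechnology Centre, Serdang, Malaysia",
  "**Institute for Research in Molecular Medicine (INFORMM)",
  "aas Massachusetts General Hospital",
  "ac Albert Einstein College of Medicine",
  "Massachusetts General Hospital"
)

# Hypothetical helper: remove a leading run of lowercase letters OR a
# leading run of punctuation, plus any whitespace that follows it.
# sub() is enough because the pattern is anchored at the start.
strip_junk_prefix <- function(x) {
  sub("^([[:lower:]]+|[[:punct:]]+)[[:space:]]*", "", x)
}

cleaned <- strip_junk_prefix(affiliation)
print(cleaned)
```

The last row is left unchanged because it starts with an uppercase letter, which neither alternative matches.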
