
How to remove the first few characters in a column in R?

My data (a CSV file) has a column that contains uninformative leading characters (e.g. special characters, random lowercase letters), and I want to remove them.

df <- data.frame(Affiliation = c(
  ". Biotechnology Centre, Malaysia Agricultural Research and Development Institute (MARDI), Serdang, Malaysia",
  "**Institute for Research in Molecular Medicine (INFORMM), Universiti Sains Malaysia, Pulau Pinang, Malaysia",
  "aas Massachusetts General Hospital and Harvard Medical School, Center for Human Genetic Research and Department of Neurology , Boston , MA , USA",
  "ac Albert Einstein College of Medicine , Department of Pathology , Bronx , NY , USA"
))

The number of characters I want to remove (e.g. ".", "**", "aas", "ac") varies from row to row, as shown above.

Expected output:

df <- data.frame(Affiliation = c(
  "Biotechnology Centre, Malaysia Agricultural Research and Development Institute (MARDI), Serdang, Malaysia",
  "Institute for Research in Molecular Medicine (INFORMM), Universiti Sains Malaysia, Pulau Pinang, Malaysia",
  "Massachusetts General Hospital and Harvard Medical School, Center for Human Genetic Research and Department of Neurology , Boston , MA , USA",
  "Albert Einstein College of Medicine , Department of Pathology , Bronx , NY , USA"
))

I was thinking of using dplyr's mutate function, but I'm not sure how to go about it.

If we assume that the valid text starts at the first uppercase letter, the following works:

library(tidyverse)
df %>% 
  mutate(Affiliation = str_extract(Affiliation, "[:upper:].+"))
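For readers who would rather not load the tidyverse, the same assumption (valid text begins at the first uppercase letter) can be sketched in base R. The helper name `strip_to_first_upper` is hypothetical and not part of any package, and the sample strings are shortened versions of the question's data.

```r
# Hypothetical helper: drop everything before the first uppercase letter.
strip_to_first_upper <- function(x) {
  # "^[^[:upper:]]*" matches the (possibly empty) run of non-uppercase
  # characters at the start of each string; sub() replaces only that run.
  sub("^[^[:upper:]]*", "", x)
}

# Prefixes taken from the question's examples; the text after them is shortened.
affiliation <- c(". Biotechnology Centre, Malaysia",
                 "**Institute for Research in Molecular Medicine",
                 "aas Massachusetts General Hospital",
                 "ac Albert Einstein College of Medicine")

cleaned <- strip_to_first_upper(affiliation)
print(cleaned)
```

One difference in behaviour: for a string with no uppercase letter at all, this sketch returns an empty string, whereas `str_extract` returns `NA`.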

Base R regex solution:

df$cleaned_str <- gsub("^\\w+ |^\\*+|^\\. ", "", df$Affiliation)

Tidyverse regex solution:

library(tidyverse)
df %>% 
  mutate(Affiliation = str_replace(Affiliation, "^\\w+ |^\\*+|^\\. ", ""))
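One caveat worth checking before applying this pattern to a full dataset: the `^\\w+ ` branch matches any leading word followed by a space, so a row that is already clean would lose its first word. The base R sketch below illustrates this; the variable names and the third (already clean) string are made up for the example, and the `[[:lower:]]` variant is only one possible adjustment, assuming the unwanted prefixes are always lowercase.

```r
pattern <- "^\\w+ |^\\*+|^\\. "

x <- c("ac Albert Einstein College of Medicine",  # prefixed row from the question
       "**Institute for Research",                # prefixed row (shortened)
       "Albert Einstein College of Medicine")     # hypothetical clean row

# Original pattern: the clean row loses its first word ("Albert ").
result <- sub(pattern, "", x)
print(result)

# Possible adjustment (an assumption, not from the answers): only strip a
# leading word when it consists entirely of lowercase letters.
safer_pattern <- "^[[:lower:]]+ |^\\*+|^\\. "
safer <- sub(safer_pattern, "", x)
print(safer)
```

With the adjusted pattern the clean row is left untouched, while the prefixed rows are still stripped.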


 