How to remove a pattern from rows in a dataframe in R?

My data has rows containing institute names, usually with an email address at the end. I want to remove only the email addresses and keep the institutes (e.g. remove hello@canada).

df <- data.frame(institute = c(
"Air Quality Processes Research Section, Environment and Climate Change Canada, Toronto, Ontario, M3H 5T4, Canada",
"Air Quality Processes Research Section, Environment and Climate Change Canada, Toronto, Ontario, M3H 5T4, Canada. Electronic address: hello@canada",
"Aix-Marseille Universit.., Inserm, TAGC UMR S1090, 13288 Marseille, France. name@inserm",
"Applied Biological Sciences Program, Chulabhorn Graduate Institute, Bangkok, Thailand Laboratory of Biochemistry, Chulabhorn Research Institute, Bangkok, Thailand",
"Applied Biological Sciences Program, Chulabhorn Graduate Institute, Bangkok, Thailand Laboratory of Biochemistry, Chulabhorn Research Institute, Bangkok, Thailand emailX@yahoo.com"))

My goal is to count identical institutes as one. In the format above, the email addresses make otherwise identical rows distinct.

I tried the code below for the first institute, but it didn't remove the complete email address.

a <- "Air Quality Processes Research Section, Environment and Climate Change Canada, Toronto, Ontario, M3H 5T4, Canada. Electronic address: hello@canada"
gsub("[^.*?]@.*", "\\1", a)
# [1] "Air Quality Processes Research Section, Environment and Climate Change Canada, Toronto, Ontario, M3H 5T4, Canada. Electronic address: hell"

You could use something like this:

df$clean_institute <- trimws(gsub('\\w+@.*$|Electronic address:|email address:', 
                                  '', df$institute))

This removes the word immediately before '@', the '@' itself, and everything after it. It also removes the literal labels 'Electronic address:' and 'email address:'.
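To make the difference from the attempt in the question concrete, here is a small sketch on shortened strings. The strings and the variable names (attempt, fixed, dotted, dotted_alt) are made up for illustration, and the \\S+ variant is only one possible alternative, not part of the answer above. In the question's pattern, [^.*?] is a character class that matches a single character other than '.', '*' or '?', so only the one character before '@' is consumed.

```r
a <- "Canada. Electronic address: hello@canada"

# Question's pattern: the class [^.*?] consumes just one character before '@'.
attempt <- sub("[^.*?]@.*", "", a)
# "Canada. Electronic address: hell"

# Answer's pattern: \\w+ consumes the whole run of word characters before '@'.
fixed <- sub("\\w+@.*$", "", a)
# "Canada. Electronic address: "

# Caveat: '.' is not a word character, so a dotted local part is only partly removed.
dotted <- sub("\\w+@.*$", "", "Lab B, Bangkok first.last@example.org")
# "Lab B, Bangkok first."

# Hypothetical alternative: \\S+ takes every non-space character before '@'.
dotted_alt <- sub("\\S+@.*$", "", "Lab B, Bangkok first.last@example.org")
# "Lab B, Bangkok "
```

Whether the \\S+ form is preferable depends on the data. It assumes the address is always separated from the institute name by whitespace.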

Then use table() to count the cleaned institutes:

table(df$clean_institute)
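One thing to watch for: with the sample data, the first two rows may still be counted separately, because the second keeps the period that preceded "Electronic address:" ("..., Canada." versus "..., Canada"). A minimal sketch of one way to handle this, assuming trailing '.', ',' or ';' is leftover punctuation and not part of the institute name (the shortened strings and the names institute, clean and counts are made up for illustration):

```r
institute <- c(
  "Section A, Toronto, Canada",
  "Section A, Toronto, Canada. Electronic address: hello@canada",
  "Lab B, Bangkok, Thailand",
  "Lab B, Bangkok, Thailand emailX@yahoo.com"
)

# Same substitution as in the answer above.
clean <- trimws(gsub("\\w+@.*$|Electronic address:|email address:", "", institute))

# Assumption: trailing punctuation is a leftover separator, so strip it.
clean <- trimws(sub("[.,;]+$", "", clean))

# Each institute now appears once in the table, with a count of 2.
counts <- table(clean)
counts
```

A name that legitimately ends in a period (e.g. "Inc.") would lose it too, but consistently across rows, so the counts would still group correctly.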

