Changing all values in one column of a filtered data.frame in R

I have a very messy data frame with one column whose values are understandable to humans but not to computers, a bit like the one below.

df<-data.frame("id"=c(1:10), 
           "colour"=c("re d", ", red", "re-d","green", "gre, en", ", gre-en",  "blu e", "green", ", blue", "bl ue"))

I can filter the data frame with str_detect:

library(dplyr); library(stringr)
df %>% filter(str_detect(tolower(colour), pattern = "gr"))

But I want to replace all of the matched values with one and the same value so that I can keep wrangling the data.
Any suggestions?
I tried to split the values with separate and a pattern, but was unsuccessful.

EDIT: Not all periods and spaces are unnecessary in the real data frame I am working with. Let's say that the correct way of writing green in this made-up data frame is "gr. een".

EDIT2:
Desired result, using made-up spellings of the colours just to give an idea:

id     colour
1      r. ed
2      r. ed
3      r. ed
4      gr. een
5      gr. een
6      gr. een
7      blu. e
8      gr. een
9      blu. e
10     blu. e

You can use mgsub from the textclean package to replace several patterns with several replacements in one call:

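df <- data.frame("id" = c(1:10),
                 "colour" = c("re d", ", red", "re-d", "green", "gre, en",
                              ", gre-en", "blu e", "green", ", blue", "bl ue"))

library(textclean)

# The patterns are applied one after another, so ".*gr.*" has to run before
# ".*re.*" (otherwise "green", which contains "re", would become "r. ed")
df$colour <- mgsub(df$colour,
                   pattern = c(".*gr.*", ".*re.*", ".*bl.*"),
                   replacement = c("gr. een", "r. ed", "blu. e"),
                   fixed = FALSE)

df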

#     id  colour
# 1   1   r. ed
# 2   2   r. ed
# 3   3   r. ed
# 4   4 gr. een
# 5   5 gr. een
# 6   6 gr. een
# 7   7  blu. e
# 8   8 gr. een
# 9   9  blu. e
# 10 10  blu. e
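If you would rather stay within dplyr, the same recoding can be sketched with case_when. This is an alternative to the mgsub answer, not part of it; it assumes dplyr and stringr are installed and reuses the substrings "gr", "re" and "bl" from the patterns above. case_when checks its conditions from top to bottom and uses the first one that is TRUE, so "gr" has to come before "re" here as well.

```r
library(dplyr)
library(stringr)

df <- data.frame(id = 1:10,
                 colour = c("re d", ", red", "re-d", "green", "gre, en",
                            ", gre-en", "blu e", "green", ", blue", "bl ue"),
                 stringsAsFactors = FALSE)

df2 <- df %>%
  mutate(colour = case_when(
    # conditions are checked top to bottom; the first TRUE one wins
    str_detect(colour, "gr") ~ "gr. een",
    str_detect(colour, "re") ~ "r. ed",
    str_detect(colour, "bl") ~ "blu. e",
    TRUE ~ colour  # leave anything unmatched unchanged
  ))

df2
```

Rows that match none of the patterns keep their original value because of the final TRUE ~ colour line.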

Here are two solutions for pre-processing your data; one of them was already given in the comments:

library(dplyr)
library(stringr)

df %>% 
  mutate(colour2 = gsub("[^A-Za-z]", "", colour)) %>%   # drop everything except letters
  filter(str_detect(tolower(colour2), pattern = "green"))
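For comparison, the same strip-then-filter idea can be written in base R without stringr. This is a sketch that swaps in the POSIX class [:alpha:] for the letter range (in ASCII, a range written as A-z would also cover [ \ ] ^ _ and `); the names colour2 and greens are just illustrative.

```r
df <- data.frame(id = 1:10,
                 colour = c("re d", ", red", "re-d", "green", "gre, en",
                            ", gre-en", "blu e", "green", ", blue", "bl ue"),
                 stringsAsFactors = FALSE)

# Lower-case the values, then delete every character that is not a letter
colour2 <- gsub("[^[:alpha:]]", "", tolower(df$colour))

# Keep the rows whose cleaned value contains "green"
greens <- df[grepl("green", colour2), ]
greens
```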

Conversely, instead of deleting the non-letters, you can use stringr to extract the letters:

library(stringr)

df %>% 
  mutate(colour2 = sapply(str_extract_all(colour, "[A-Za-z]"),   # keep only the letters...
                          paste0, collapse = "")) %>%            # ...and glue them back together
  filter(str_detect(tolower(colour2), pattern = "green"))

This makes the selection more robust, and the cleaned-up value is already stored in the new column colour2:

  id   colour colour2
1  4    green   green
2  5  gre, en   green
3  6 , gre-en   green
4  8    green   green
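To get from a letters-only key to the spellings the question actually wants (EDIT2), one option is a named lookup vector indexed by the cleaned key. A minimal sketch: the vector canonical and the helper column key are hypothetical names, and the target spellings are the made-up ones from the question.

```r
library(dplyr)
library(stringr)

df <- data.frame(id = 1:10,
                 colour = c("re d", ", red", "re-d", "green", "gre, en",
                            ", gre-en", "blu e", "green", ", blue", "bl ue"),
                 stringsAsFactors = FALSE)

# Hypothetical lookup table: letters-only key -> spelling to keep
canonical <- c(red = "r. ed", green = "gr. een", blue = "blu. e")

df2 <- df %>%
  mutate(
    # lower-case and keep only the letters to build a matching key
    key = str_remove_all(tolower(colour), "[^a-z]"),
    # index the named vector by key; keys missing from the table give NA
    colour = unname(canonical[key])
  ) %>%
  select(-key)

df2
```

Any key that is not in canonical comes back as NA, which makes unexpected spellings easy to spot with is.na().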

If you just want to overwrite all of the matching values with a single value, you can assign to that subset directly:

df <- data.frame("id" = c(1:10),
                 "colour" = c("re d", ", red", "re-d", "green", "gre, en",
                              ", gre-en", "blu e", "green", ", blue", "bl ue"))

library(stringr)
# Overwrite "colour" in every row whose lower-cased value contains "gr"
df[str_detect(tolower(df[, "colour"]), pattern = "gr"), "colour"] <- "green"
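The same overwrite can also be written inside a dplyr pipeline, which fits the filter() code from the question. A sketch assuming dplyr and stringr are installed: if_else writes "green" where str_detect returns TRUE and keeps the existing value everywhere else, so no rows are dropped.

```r
library(dplyr)
library(stringr)

df <- data.frame(id = 1:10,
                 colour = c("re d", ", red", "re-d", "green", "gre, en",
                            ", gre-en", "blu e", "green", ", blue", "bl ue"),
                 stringsAsFactors = FALSE)

df <- df %>%
  # replace matching values, leave all other rows untouched
  mutate(colour = if_else(str_detect(tolower(colour), "gr"), "green", colour))

df
```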
