
Changing all values in one column of a filtered data.frame in R

I have a very messy data frame in which one column holds values that are understandable to humans but not to computers, a bit like the one below.

df <- data.frame("id" = 1:10,
                 "colour" = c("re d", ", red", "re-d", "green", "gre, en",
                              ", gre-en", "blu e", "green", ", blue", "bl ue"))

I can filter the df with str_detect:

df %>% filter(str_detect(tolower(colour), pattern = "gr")) 

But I want to set all the filtered rows to the same value so that I can wrangle the data.
Any suggestions?
I tried to separate the column with a pattern but was unsuccessful.

EDIT: Not all periods and spaces are unnecessary in the df that I am working with. Let's say that the correct way of writing green in the made-up df is "gr. een".

Desired result (with made-up spellings of the colours, just to give an idea):

id     colour
1      r. ed
2      r. ed
3      r. ed
4      gr. een
5      gr. een
6      gr. een
7      blu. e
8      gr. een
9      blu. e
10     blu. e

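You can use mgsub to replace multiple patterns with multiple replacements in one call:

# Assumption: mgsub() is taken from the textclean package here
# (other packages, e.g. qdap, also export a function of that name).
library(textclean)

df <- data.frame("id" = 1:10,
                 "colour" = c("re d", ", red", "re-d", "green", "gre, en",
                              ", gre-en", "blu e", "green", ", blue", "bl ue"))

# Each pattern matches the whole string, so the entire value is replaced.
# "gr" is listed before "re" because "green" also contains "re".
df$colour = mgsub(df$colour,
                  pattern =  c(".*gr.*", ".*re.*", ".*bl.*"),
                  replacement =  c("gr. een", "r. ed", "blu. e"), fixed = FALSE)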


#     id  colour
# 1   1   r. ed
# 2   2   r. ed
# 3   3   r. ed
# 4   4 gr. een
# 5   5 gr. een
# 6   6 gr. een
# 7   7  blu. e
# 8   8 gr. een
# 9   9  blu. e
# 10 10  blu. e
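
If you would rather not depend on mgsub, the same mapping can be sketched with dplyr's case_when and stringr's str_detect. This is only a minimal sketch on the example data; df_clean is a name made up for illustration, and the three patterns are the same assumptions as above.

```r
library(dplyr)
library(stringr)

df <- data.frame(
  id = 1:10,
  colour = c("re d", ", red", "re-d", "green", "gre, en",
             ", gre-en", "blu e", "green", ", blue", "bl ue"),
  stringsAsFactors = FALSE
)

# case_when() checks the conditions from top to bottom and uses the first
# one that is TRUE, so "gr" has to come before "re" ("green" contains "re").
df_clean <- df %>%
  mutate(colour = case_when(
    str_detect(tolower(colour), "gr") ~ "gr. een",
    str_detect(tolower(colour), "re") ~ "r. ed",
    str_detect(tolower(colour), "bl") ~ "blu. e",
    TRUE ~ colour  # anything unmatched is left as it is
  ))
```

Because only the first matching condition is used, a value can never be rewritten twice, which is the main difference from applying several replacements one after another.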

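Here are two solutions for pre-processing your data; one was already given in the comments:

# Remove everything that is not a letter, then filter on the cleaned value.
# Note: "[^A-Za-z]" is used instead of "[^A-z]", because the range A-z
# also covers the characters [ \ ] ^ _ ` that sit between Z and a.
df %>%
  mutate(colour2 = gsub("[^A-Za-z]", "", colour)) %>%
  filter(str_detect(tolower(colour2), pattern = "green"))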

Taking the inverse approach, you can use stringr to extract only the letters and paste them back together:

# Keep the letters of each value and collapse them into one string.
df %>%
  mutate(colour2 = sapply(str_extract_all(colour, "[A-Za-z]"),
                          function(vec) paste0(vec, collapse = ""))) %>%
  filter(str_detect(tolower(colour2), pattern = "green"))

The selection is more robust this way, and the cleaned values are already available in the new column colour2:

  id   colour colour2
1  4    green   green
2  5  gre, en   green
3  6 , gre-en   green
4  8    green   green
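
Once the letters are cleaned, the cleaned value can also serve as a key into a lookup table that holds the spelling you actually want. The following is a rough sketch in base R, not part of the answer above; the names key and canonical are made up for illustration, and the target spellings are taken from the question.

```r
df <- data.frame(
  id = 1:10,
  colour = c("re d", ", red", "re-d", "green", "gre, en",
             ", gre-en", "blu e", "green", ", blue", "bl ue"),
  stringsAsFactors = FALSE
)

# Strip everything that is not a letter and lower-case the result.
key <- tolower(gsub("[^A-Za-z]", "", df$colour))

# Hypothetical lookup table: cleaned key -> spelling wanted in the output.
canonical <- c(red = "r. ed", green = "gr. een", blue = "blu. e")

# Index the named vector by key; unname() drops the names again.
# Keys that are missing from the lookup table would become NA.
df$colour <- unname(canonical[key])
```

A value that is not in the lookup table ends up as NA, which makes leftover misspellings easy to spot with is.na(df$colour).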

If you just want to rename all of the filtered results, how about:

df <- data.frame("id" = 1:10,
                 "colour" = c("re d", ", red", "re-d", "green", "gre, en",
                              ", gre-en", "blu e", "green", ", blue", "bl ue"))

df[str_detect(tolower(df[,"colour"]), pattern = "gr"), "colour"] <- "green"
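
The same indexed assignment can be repeated for several colours in a loop. This is a minimal base R sketch under the assumption that the three substrings below are enough to tell the colours apart; patterns, replacements, original and done are names made up for illustration.

```r
df <- data.frame(
  id = 1:10,
  colour = c("re d", ", red", "re-d", "green", "gre, en",
             ", gre-en", "blu e", "green", ", blue", "bl ue"),
  stringsAsFactors = FALSE
)

# Hypothetical pattern -> replacement pairs. Order matters:
# "green" also contains "re", so "gr" is handled first.
patterns     <- c("gr", "re", "bl")
replacements <- c("gr. een", "r. ed", "blu. e")

# Match against the original values so that an earlier replacement
# cannot influence a later match, and never touch a row twice.
original <- tolower(df$colour)
done <- rep(FALSE, nrow(df))
for (i in seq_along(patterns)) {
  hit <- !done & grepl(patterns[i], original, fixed = TRUE)
  df$colour[hit] <- replacements[i]
  done <- done | hit
}
```

Rows that match none of the patterns keep their original value, and sum(!done) tells you how many of those are left.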

