My data looks like this:
Weather
<chr>
1 Snow Low clouds
2 Snow Cloudy
3 Drizzle Fog
4 Thundershowers Partly cloudy
5 Thunderstorms More clouds than sun
6 Sprinkles Partly cloudy
7 Heavy rain Broken clouds
8 Light rain Partly cloudy
I am trying to use mutate() to remove some text. For example, I would like the above to look like this:
Weather
<chr>
1 Snow
2 Snow
3 Drizzle
4 Thundershowers
5 Thunderstorms More clouds than sun
6 Sprinkles Partly cloudy
7 Heavy rain
8 Light rain
So I would like to remove the text that follows certain words. Given the following vector:
c("Snow", "Drizzle", "Heavy rain", "Light rain")
I want to remove the text after these words. However, I do not want to grep for words such as Cloudy or Fog, since they occur as rows of their own in the data; but something like Snow Light fog can be cut down to Snow.
Data:
d <- structure(list(Weather = c("Snow Low clouds", "Snow Cloudy",
"Drizzle Fog", "Thundershowers Partly cloudy", "Thunderstorms More clouds than sun",
"Sprinkles Partly cloudy", "Heavy rain Broken clouds", "Light rain Partly cloudy",
"Rain showers Passing clouds", "Thundershowers Scattered clouds",
"Thundershowers Passing clouds", "Light snow Overcast", "Snow Light fog",
"Drizzle Broken clouds", "Light rain Fog", "Cloudy", "Thunderstorms Partly cloudy",
"Heavy rain More clouds than sun", "Partly cloudy", NA)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -20L))
A general approach you can take here is to build a regex alternation of all target terms. Then match those terms, followed by whitespace and anything up to the end of the input, and replace the whole match with just the term.
terms <- c("Snow", "Drizzle", "Heavy rain", "Light rain")
regex <- paste0("\\b(", paste(terms, collapse="|"), ")\\b")
sub(paste0(regex, "\\s.*"), "\\1", d$Weather)
[1] "Snow" "Snow"
[3] "Drizzle" "Thundershowers Partly cloudy"
[5] "Thunderstorms More clouds than sun" "Sprinkles Partly cloudy"
[7] "Heavy rain" "Light rain"
[9] "Rain showers Passing clouds" "Thundershowers Scattered clouds"
[11] "Thundershowers Passing clouds" "Light snow Overcast"
[13] "Snow" "Drizzle"
[15] "Light rain" "Cloudy"
[17] "Thunderstorms Partly cloudy" "Heavy rain"
[19] "Partly cloudy" NA
My output does not line up exactly with your expected output (for example, "Thundershowers Partly cloudy" is left unchanged), but that is because not all target words, such as "Thundershowers", are in the vector you suggested.
The regex I used was:
\b(Snow|Drizzle|Heavy rain|Light rain)\b
The trick here is that the alternation is also a capture group, so the whole match can be replaced with just the captured term ("\\1" in the sub() call). You can add more terms to the vector to get the desired output.
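As a minimal sketch of that suggestion, assuming the terms contain no regex metacharacters, the hypothetical helper trim_after_terms() below wraps the same sub() call, and adding "Thundershowers" to the vector reproduces the question's expected output for the first eight rows. Base R's transform() stands in for mutate() here only to avoid a package dependency; the same call could go inside mutate() unchanged.

```r
# Hypothetical helper: replace the leftmost target term plus the whitespace and
# text after it with just the term (any text before the term is kept).
# Assumes `terms` contain no regex metacharacters.
trim_after_terms <- function(x, terms) {
  regex <- paste0("\\b(", paste(terms, collapse = "|"), ")\\b")
  sub(paste0(regex, "\\s.*"), "\\1", x)
}

# First eight rows of the question's data
weather <- c("Snow Low clouds", "Snow Cloudy", "Drizzle Fog",
             "Thundershowers Partly cloudy", "Thunderstorms More clouds than sun",
             "Sprinkles Partly cloudy", "Heavy rain Broken clouds",
             "Light rain Partly cloudy")

# The question's vector, with "Thundershowers" added
target_terms <- c("Snow", "Drizzle", "Heavy rain", "Light rain", "Thundershowers")

df <- data.frame(Weather = weather)
df <- transform(df, Weather = trim_after_terms(Weather, target_terms))
print(df$Weather)

# A row that is only a sky condition, such as "Cloudy", is left alone
print(trim_after_terms("Cloudy", target_terms))
```

Because the pattern requires whitespace after the term, a row that consists of the term alone is also returned unchanged.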
Another option is to extract the matching terms with regmatches() and gregexpr():
v <- c("Snow", "Drizzle", "Heavy rain", "Light rain")
pat <- paste0(v, collapse = "|")
unlist(regmatches(d$Weather, gregexpr(pat, d$Weather)))
which gives:
> unlist(regmatches(d$Weather,gregexpr(pat,d$Weather)))
[1] "Snow" "Snow" "Drizzle" "Heavy rain" "Light rain" "Snow"
[7] "Drizzle" "Light rain" "Heavy rain"
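The result above has 9 elements rather than 20: regmatches() with gregexpr() returns one list element per input string, a string with no match gives character(0), and unlist() silently drops those empty elements. A small sketch on a few rows of the question's data makes this visible; lengths() counts the matches per row.

```r
v <- c("Snow", "Drizzle", "Heavy rain", "Light rain")
pat <- paste0(v, collapse = "|")

# A few rows from the question's data; "Cloudy" has no match
x <- c("Snow Low clouds", "Cloudy", "Heavy rain Broken clouds")

# One list element per input string; a non-matching string gives character(0)
m <- regmatches(x, gregexpr(pat, x))
print(lengths(m))   # 1 0 1

# unlist() drops the empty element, so the result is shorter than x
print(unlist(m))    # "Snow" "Heavy rain"
```

This is why the unlisted vector cannot be lined up with the rows of d by position.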
If you want to keep the result in d as a new column, note that the unlist() result cannot be passed to ifelse() as is: it has 9 elements for 20 rows, so ifelse() recycles it and puts terms in the wrong rows. Taking the first match from each list element keeps the values aligned with the rows:
# first matched term in each row, NA when a row has no match
d <- within(d, X <- vapply(regmatches(Weather, gregexpr(pat, Weather)),
                           function(m) if (length(m) > 0) m[1] else NA_character_,
                           character(1)))
which gives:
> d
# A tibble: 20 x 2
   Weather                            X
   <chr>                              <chr>
 1 Snow Low clouds                    Snow
 2 Snow Cloudy                        Snow
 3 Drizzle Fog                        Drizzle
 4 Thundershowers Partly cloudy       NA
 5 Thunderstorms More clouds than sun NA
 6 Sprinkles Partly cloudy            NA
 7 Heavy rain Broken clouds           Heavy rain
 8 Light rain Partly cloudy           Light rain
 9 Rain showers Passing clouds        NA
10 Thundershowers Scattered clouds    NA
11 Thundershowers Passing clouds      NA
12 Light snow Overcast                NA
13 Snow Light fog                     Snow
14 Drizzle Broken clouds              Drizzle
15 Light rain Fog                     Light rain
16 Cloudy                             NA
17 Thunderstorms Partly cloudy        NA
18 Heavy rain More clouds than sun    Heavy rain
19 Partly cloudy                      NA
20 NA                                 NA
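If only the first matching term in each row is needed, regexpr() (rather than gregexpr()) returns exactly one match position per string (-1 for no match, NA for a missing value), so the extracted terms stay aligned with the rows. A sketch using substring() on those positions; extract_first_term() is a hypothetical helper name, and pat is built as in the answer above.

```r
# Hypothetical helper: first match of `pat` in each string, NA if none
extract_first_term <- function(x, pat) {
  m <- regexpr(pat, x)                       # start of first match; -1 if none, NA for NA input
  len <- attr(m, "match.length")             # length of each match
  out <- substring(x, m, m + len - 1L)       # positional extraction, same length as x
  out[is.na(m) | m == -1L] <- NA_character_  # no match or missing input -> NA
  out
}

v <- c("Snow", "Drizzle", "Heavy rain", "Light rain")
pat <- paste0(v, collapse = "|")

# The Weather column from the question's data
weather <- c("Snow Low clouds", "Snow Cloudy", "Drizzle Fog",
             "Thundershowers Partly cloudy", "Thunderstorms More clouds than sun",
             "Sprinkles Partly cloudy", "Heavy rain Broken clouds",
             "Light rain Partly cloudy", "Rain showers Passing clouds",
             "Thundershowers Scattered clouds", "Thundershowers Passing clouds",
             "Light snow Overcast", "Snow Light fog", "Drizzle Broken clouds",
             "Light rain Fog", "Cloudy", "Thunderstorms Partly cloudy",
             "Heavy rain More clouds than sun", "Partly cloudy", NA)

X <- extract_first_term(weather, pat)
print(data.frame(Weather = weather, X = X))
```

Since the pattern has no \b anchors and is case-sensitive, "Light snow Overcast" and "Rain showers Passing clouds" yield NA here, just as in the table above.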