Removing strings from rows for specific words
My data looks like:
Weather
<chr>
1 Snow Low clouds
2 Snow Cloudy
3 Drizzle Fog
4 Thundershowers Partly cloudy
5 Thunderstorms More clouds than sun
6 Sprinkles Partly cloudy
7 Heavy rain Broken clouds
8 Light rain Partly cloudy
I am trying to use mutate to remove some of the text. For example, I would like the above to look like:
Weather
<chr>
1 Snow
2 Snow
3 Drizzle
4 Thundershowers
5 Thunderstorms More clouds than sun
6 Sprinkles Partly cloudy
7 Heavy rain
8 Light rain
So I want to remove the text that comes after certain specific words. Suppose I have the following vector:
c("Snow", "Drizzle", "Heavy rain", "Light rain")
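I want to remove whatever text follows these terms. However, I do not want to match on words such as Cloudy or Fog, since they also appear in the data as rows of their own, whereas something like Snow Light fog can be reduced to Snow.
Data: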
d <- structure(list(Weather = c("Snow Low clouds", "Snow Cloudy",
"Drizzle Fog", "Thundershowers Partly cloudy", "Thunderstorms More clouds than sun",
"Sprinkles Partly cloudy", "Heavy rain Broken clouds", "Light rain Partly cloudy",
"Rain showers Passing clouds", "Thundershowers Scattered clouds",
"Thundershowers Passing clouds", "Light snow Overcast", "Snow Light fog",
"Drizzle Broken clouds", "Light rain Fog", "Cloudy", "Thunderstorms Partly cloudy",
"Heavy rain More clouds than sun", "Partly cloudy", NA)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -20L))
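The general approach you can take here is to build a regex alternation of all the target terms. Then match any of those terms followed by anything up to the end of the input, and replace the whole match with just the term.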
terms <- c("Snow", "Drizzle", "Heavy rain", "Light rain")
regex <- paste0("\\b(", paste(terms, collapse="|"), ")\\b")
sub(paste0(regex, "\\s.*"), "\\1", d$Weather)
[1] "Snow" "Snow"
[3] "Drizzle" "Thundershowers Partly cloudy"
[5] "Thunderstorms More clouds than sun" "Sprinkles Partly cloudy"
[7] "Heavy rain" "Light rain"
[9] "Rain showers Passing clouds" "Thundershowers Scattered clouds"
[11] "Thundershowers Passing clouds" "Light snow Overcast"
[13] "Snow" "Drizzle"
[15] "Light rain" "Cloudy"
[17] "Thunderstorms Partly cloudy" "Heavy rain"
[19] "Partly cloudy" NA
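Note that my output does not exactly match your expected output, but then again, the vector you suggested does not include all of the target words (for example, Thundershowers).
The regex I used is: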
\b(Snow|Drizzle|Heavy rain|Light rain)\b
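The trick here is that the alternation is also a capture group, which lets us easily replace the match with just the term you want. You can add more terms to the alternation to get your desired output.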
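As a minimal sketch of how the approach above could be wrapped up for reuse: the helper name truncate_after_terms and the extra term "Thundershowers" are assumptions added for illustration, not part of the original answer. The sketch also assumes the terms contain no regex metacharacters.

```r
# Hypothetical helper (name is an assumption): if a value contains one of the
# target terms followed by more text, keep only the term.
truncate_after_terms <- function(x, terms) {
  # Assumes the terms are plain words with no regex metacharacters
  alternation <- paste(terms, collapse = "|")
  # Capture the term, then consume a whitespace character and the rest
  pattern <- paste0("\\b(", alternation, ")\\b\\s.*")
  sub(pattern, "\\1", x)
}

# A few values taken from the question's data
weather <- c("Snow Low clouds", "Thundershowers Partly cloudy",
             "Thunderstorms More clouds than sun", "Cloudy", NA)

# The question's vector, extended with "Thundershowers" (an assumption)
terms <- c("Snow", "Drizzle", "Heavy rain", "Light rain", "Thundershowers")

result <- truncate_after_terms(weather, terms)
print(result)
```

Since the question mentions mutate, the same helper could presumably be called as mutate(d, Weather = truncate_after_terms(Weather, terms)), assuming dplyr is loaded.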
# Alternative: extract the matched target terms directly with base R's regmatches()
v <- c("Snow", "Drizzle", "Heavy rain", "Light rain")
pat <- paste0(v, collapse = "|")
unlist(regmatches(d$Weather, gregexpr(pat, d$Weather)))
which gives
> unlist(regmatches(d$Weather,gregexpr(pat,d$Weather)))
[1] "Snow" "Snow" "Drizzle" "Heavy rain" "Light rain" "Snow"
[7] "Drizzle" "Light rain" "Heavy rain"
To add the matched term as a new column of d, you can use the following code. Note that the result of unlist(regmatches(...)) only contains the rows that matched, so it has to be assigned to those rows explicitly rather than passed to ifelse, where it would be recycled and end up misaligned:
d <- within(d, {
  X <- rep(NA_character_, length(Weather))
  X[grepl(pat, Weather)] <- regmatches(Weather, regexpr(pat, Weather))
})
which gives
> d
# A tibble: 20 x 2
Weather X
<chr> <chr>
1 Snow Low clouds Snow
2 Snow Cloudy Snow
3 Drizzle Fog Drizzle
4 Thundershowers Partly cloudy NA
5 Thunderstorms More clouds than sun NA
6 Sprinkles Partly cloudy NA
7 Heavy rain Broken clouds Heavy rain
8 Light rain Partly cloudy Light rain
9 Rain showers Passing clouds NA
10 Thundershowers Scattered clouds NA
11 Thundershowers Passing clouds NA
12 Light snow Overcast NA
13 Snow Light fog Snow
14 Drizzle Broken clouds Drizzle
15 Light rain Fog Light rain
16 Cloudy NA
17 Thunderstorms Partly cloudy NA
18 Heavy rain More clouds than sun Heavy rain
19 Partly cloudy NA
20 NA NA
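One difference between the two answers is that the pattern in this second approach has no word boundaries. The following is a rough sketch of what that can mean in practice. The helper extract_first and the value "Snowfall Overcast" are made up for illustration and do not come from the question's data.

```r
terms <- c("Snow", "Drizzle", "Heavy rain", "Light rain")
pat_plain   <- paste0(terms, collapse = "|")
pat_bounded <- paste0("\\b(", pat_plain, ")\\b")

# Made-up example values; "Snowfall Overcast" is not in the question's data
x <- c("Snow Light fog", "Snowfall Overcast", "Cloudy")

# Hypothetical helper: first match per element, NA where nothing matches
extract_first <- function(x, pattern) {
  out <- rep(NA_character_, length(x))
  m <- regexpr(pattern, x, perl = TRUE)
  hit <- !is.na(m) & m > 0
  # regmatches() returns only the matched elements, in order
  out[hit] <- regmatches(x, m)
  out
}

plain   <- extract_first(x, pat_plain)    # "Snow" "Snow" NA
bounded <- extract_first(x, pat_bounded)  # "Snow" NA     NA
print(plain)
print(bounded)
```

Whether the boundary-anchored pattern is preferable depends on whether partial-word matches such as the one above can actually occur in the real data.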