Removing strings from rows for specific words
My data looks like:
Weather
<chr>
1 Snow Low clouds
2 Snow Cloudy
3 Drizzle Fog
4 Thundershowers Partly cloudy
5 Thunderstorms More clouds than sun
6 Sprinkles Partly cloudy
7 Heavy rain Broken clouds
8 Light rain Partly cloudy
I am trying to use mutate to remove some of the text. For example, I would like the above to look like:
Weather
<chr>
1 Snow
2 Snow
3 Drizzle
4 Thundershowers
5 Thunderstorms More clouds than sun
6 Sprinkles Partly cloudy
7 Heavy rain
8 Light rain
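So I want to remove the text that follows certain words. If I have the following vector:
c("Snow", "Drizzle", "Heavy rain", "Light rain")
I want to remove the text after these words. However, I don't want to grep for words such as Cloudy or Fog, because they appear in the data as rows of their own; but something like Snow Light fog can be reduced to Snow.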
Data:
d <- structure(list(Weather = c("Snow Low clouds", "Snow Cloudy",
"Drizzle Fog", "Thundershowers Partly cloudy", "Thunderstorms More clouds than sun",
"Sprinkles Partly cloudy", "Heavy rain Broken clouds", "Light rain Partly cloudy",
"Rain showers Passing clouds", "Thundershowers Scattered clouds",
"Thundershowers Passing clouds", "Light snow Overcast", "Snow Light fog",
"Drizzle Broken clouds", "Light rain Fog", "Cloudy", "Thunderstorms Partly cloudy",
"Heavy rain More clouds than sun", "Partly cloudy", NA)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -20L))
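The general approach you can take here is to build a regex alternation of all the target terms. Then match those terms followed by anything up to the end of the input, and replace the whole match with just the term.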
terms <- c("Snow", "Drizzle", "Heavy rain", "Light rain")
regex <- paste0("\\b(", paste(terms, collapse="|"), ")\\b")
sub(paste0(regex, "\\s.*"), "\\1", d$Weather)
[1] "Snow" "Snow"
[3] "Drizzle" "Thundershowers Partly cloudy"
[5] "Thunderstorms More clouds than sun" "Sprinkles Partly cloudy"
[7] "Heavy rain" "Light rain"
[9] "Rain showers Passing clouds" "Thundershowers Scattered clouds"
[11] "Thundershowers Passing clouds" "Light snow Overcast"
[13] "Snow" "Drizzle"
[15] "Light rain" "Cloudy"
[17] "Thunderstorms Partly cloudy" "Heavy rain"
[19] "Partly cloudy" NA
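Note that my output does not exactly match your expected output, but then again, you did not include all of the target words in your suggested vector (for example, your expected output trims Thundershowers Partly cloudy to Thundershowers, yet Thundershowers is not in the vector).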
The core regex I used (before appending \s.* to consume the rest of the string) is:
\b(Snow|Drizzle|Heavy rain|Light rain)\b
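The trick here is that the alternation above is also a capture group, which lets us replace the whole match with just the matched term (the \\1 backreference). You can add more terms to the alternation to get the output you want.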
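As a sketch of adding terms: the expected output in the question also trims "Thundershowers Partly cloudy" to "Thundershowers", so that term has to join the alternation. The example below assumes base R only, uses transform() in place of dplyr's mutate(), and wraps the answer's pattern in a hypothetical helper, trim_after_terms(); the data frame is the first eight rows of the question's data.

```r
# Hypothetical helper (not part of the answer): builds the same
# "\\b(term1|term2|...)\\b\\s.*" pattern and keeps only the captured term.
trim_after_terms <- function(x, terms) {
  regex <- paste0("\\b(", paste(terms, collapse = "|"), ")\\b")
  sub(paste0(regex, "\\s.*"), "\\1", x)
}

# First eight rows of the question's data
weather <- data.frame(
  Weather = c("Snow Low clouds", "Snow Cloudy", "Drizzle Fog",
              "Thundershowers Partly cloudy", "Thunderstorms More clouds than sun",
              "Sprinkles Partly cloudy", "Heavy rain Broken clouds",
              "Light rain Partly cloudy"),
  stringsAsFactors = FALSE
)

# "Thundershowers" is the extra term needed to reproduce the expected output
terms <- c("Snow", "Drizzle", "Heavy rain", "Light rain", "Thundershowers")

# transform() is base R's counterpart to dplyr::mutate()
weather <- transform(weather, Weather = trim_after_terms(Weather, terms))
print(weather$Weather)
```

With dplyr loaded, the equivalent call would be mutate(d, Weather = trim_after_terms(Weather, terms)); rows without a target term, including the NA row, pass through sub() unchanged.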
v <- c("Snow", "Drizzle", "Heavy rain", "Light rain")
pat <- paste0(v,collapse = "|")
unlist(regmatches(d$Weather,gregexpr(pat,d$Weather)))
which gives:
> unlist(regmatches(d$Weather,gregexpr(pat,d$Weather)))
[1] "Snow" "Snow" "Drizzle" "Heavy rain" "Light rain" "Snow"
[7] "Drizzle" "Light rain" "Heavy rain"
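The flattened result above has one element per match rather than one per row (9 values for the 20 rows of d), so it does not line up with Weather. A small sketch on three illustrative strings taken from the question's data shows the list that regmatches() returns before unlist() flattens it:

```r
pat <- paste0(c("Snow", "Drizzle", "Heavy rain", "Light rain"), collapse = "|")
weather <- c("Snow Low clouds", "Thundershowers Partly cloudy", "Heavy rain Broken clouds")

# regmatches() + gregexpr() give a list with one element per input string;
# a string with no match yields character(0)
matches <- regmatches(weather, gregexpr(pat, weather))
print(lengths(matches))   # 1 0 1

# unlist() drops the empty element, so 3 rows produce only 2 values
flat <- unlist(matches)
print(flat)               # "Snow" "Heavy rain"
```

Any per-row column therefore has to be built from the list itself (for example, taking the first element of each entry) rather than from the flattened vector.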
If you want to add the result to d as a new column, keep exactly one value per row (the first match, or NA where nothing matches); the flattened vector above has only 9 values for 20 rows, so it cannot be aligned with Weather directly:
d <- within(d, X <- vapply(regmatches(Weather, gregexpr(pat, Weather)), function(m) if (length(m)) m[1] else NA_character_, character(1)))
which gives:
> d
# A tibble: 20 x 2
Weather X
<chr> <chr>
1 Snow Low clouds Snow
2 Snow Cloudy Snow
3 Drizzle Fog Drizzle
4 Thundershowers Partly cloudy NA
5 Thunderstorms More clouds than sun NA
6 Sprinkles Partly cloudy NA
7 Heavy rain Broken clouds Heavy rain
8 Light rain Partly cloudy Light rain
9 Rain showers Passing clouds NA
10 Thundershowers Scattered clouds NA
11 Thundershowers Passing clouds NA
12 Light snow Overcast NA
13 Snow Light fog Snow
14 Drizzle Broken clouds Drizzle
15 Light rain Fog Light rain
16 Cloudy NA
17 Thunderstorms Partly cloudy NA
18 Heavy rain More clouds than sun Heavy rain
19 Partly cloudy NA
20 NA NA