
Removing strings from rows for specific words

My data looks like:

Weather                           
   <chr>                             
 1 Snow Low clouds                   
 2 Snow Cloudy                       
 3 Drizzle Fog                       
 4 Thundershowers Partly cloudy      
 5 Thunderstorms More clouds than sun
 6 Sprinkles Partly cloudy           
 7 Heavy rain Broken clouds          
 8 Light rain Partly cloudy     

I am trying to use mutate to remove some of the text. For example, I would like the above to look like:

Weather                           
   <chr>                             
 1 Snow                   
 2 Snow                       
 3 Drizzle                      
 4 Thundershowers      
 5 Thunderstorms More clouds than sun
 6 Sprinkles Partly cloudy           
 7 Heavy rain           
 8 Light rain 

So I want to remove the text that follows certain specific words. If I have the following vector:

c("Snow", "Drizzle", "Heavy rain", "Light rain") 

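then the text after each of these words should be removed. However, I don't want to grep for words such as Cloudy or Fog, because they also appear in the data as rows of their own, whereas something like Snow Light fog can be reduced to Snow.

Data: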

d <- structure(list(Weather = c("Snow Low clouds", "Snow Cloudy", 
"Drizzle Fog", "Thundershowers Partly cloudy", "Thunderstorms More clouds than sun", 
"Sprinkles Partly cloudy", "Heavy rain Broken clouds", "Light rain Partly cloudy", 
"Rain showers Passing clouds", "Thundershowers Scattered clouds", 
"Thundershowers Passing clouds", "Light snow Overcast", "Snow Light fog", 
"Drizzle Broken clouds", "Light rain Fog", "Cloudy", "Thunderstorms Partly cloudy", 
"Heavy rain More clouds than sun", "Partly cloudy", NA)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -20L))

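The general approach you can take here is to build a regex alternation of all the target terms. Then match any of those terms followed by anything up to the end of the input, and replace the whole match with just the term.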

terms <- c("Snow", "Drizzle", "Heavy rain", "Light rain")
regex <- paste0("\\b(", paste(terms, collapse="|"), ")\\b")
sub(paste0(regex, "\\s.*"), "\\1", d$Weather)

 [1] "Snow"                               "Snow"                              
 [3] "Drizzle"                            "Thundershowers Partly cloudy"      
 [5] "Thunderstorms More clouds than sun" "Sprinkles Partly cloudy"           
 [7] "Heavy rain"                         "Light rain"                        
 [9] "Rain showers Passing clouds"        "Thundershowers Scattered clouds"   
[11] "Thundershowers Passing clouds"      "Light snow Overcast"               
[13] "Snow"                               "Drizzle"                           
[15] "Light rain"                         "Cloudy"                            
[17] "Thunderstorms Partly cloudy"        "Heavy rain"                        
[19] "Partly cloudy"                      NA

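Note that my output does not exactly match your expected output, but then again you did not include all the target words in your suggested vector.

The regex I used is: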

\b(Snow|Drizzle|Heavy rain|Light rain)\b

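The trick here is that the alternation above is also a capture group, which lets us easily replace the match with just the term you want. You can add more terms to it to get the output you are after.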
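As a rough sketch of that last point: judging from the expected output in the question, the missing term appears to be "Thundershowers" (this is an assumption, not something the question states). Adding it to the term list reproduces the expected result for the first eight rows. The vector name `weather` is only for illustration.

```r
# First eight rows of the question's data (illustrative vector name).
weather <- c("Snow Low clouds", "Snow Cloudy", "Drizzle Fog",
             "Thundershowers Partly cloudy",
             "Thunderstorms More clouds than sun",
             "Sprinkles Partly cloudy", "Heavy rain Broken clouds",
             "Light rain Partly cloudy")

# Assumption: "Thundershowers" is the extra term the expected output needs.
terms <- c("Snow", "Drizzle", "Heavy rain", "Light rain", "Thundershowers")
regex <- paste0("\\b(", paste(terms, collapse = "|"), ")\\b")

# Replace "term + whitespace + rest of string" with the captured term;
# rows that contain no target term pass through unchanged.
result <- sub(paste0(regex, "\\s.*"), "\\1", weather)
print(result)
```

Because `sub()` is vectorised, the same call should also work inside `dplyr::mutate()`, for example `mutate(d, Weather = sub(paste0(regex, "\\s.*"), "\\1", Weather))`, if you prefer to stay in a pipeline.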

  • Maybe you can use the code below
v <- c("Snow", "Drizzle", "Heavy rain", "Light rain") 
pat <- paste0(v,collapse = "|")
unlist(regmatches(d$Weather,gregexpr(pat,d$Weather)))

which gives

> unlist(regmatches(d$Weather,gregexpr(pat,d$Weather)))
[1] "Snow"       "Snow"       "Drizzle"    "Heavy rain" "Light rain" "Snow"      
[7] "Drizzle"    "Light rain" "Heavy rain"
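Note that this result has 9 elements although `d` has 20 rows. A small sketch of why, on a hypothetical four-element vector named `weather`: `regmatches()` returns one list element per input, and rows without a match (including `NA`) contribute a zero-length element that `unlist()` silently drops.

```r
# Hypothetical mini example; `weather` and `hits` are illustrative names.
weather <- c("Snow Low clouds", "Thundershowers Partly cloudy",
             "Heavy rain Broken clouds", NA)
pat <- "Snow|Drizzle|Heavy rain|Light rain"

# One list element per input string; non-matching rows give character(0).
hits <- regmatches(weather, gregexpr(pat, weather))

print(lengths(hits))         # number of matches found in each row
print(length(unlist(hits)))  # flattening loses the rows without a match
```

So the flattened vector is handy for listing the matches, but it is no longer aligned with the rows of the data.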
  • If you want to append the extracted values to d as a new column, you can use the following code. It takes the first match in each row and fills in NA where there is none, so the result stays aligned with the rows:
d <- within(d, X <- vapply(regmatches(Weather, gregexpr(pat, Weather)),
                           function(m) if (length(m)) m[1] else NA_character_,
                           character(1)))

which gives

> d
# A tibble: 20 x 2
   Weather                            X         
   <chr>                              <chr>     
 1 Snow Low clouds                    Snow      
 2 Snow Cloudy                        Snow      
 3 Drizzle Fog                        Drizzle   
 4 Thundershowers Partly cloudy       NA        
 5 Thunderstorms More clouds than sun NA        
 6 Sprinkles Partly cloudy            NA        
 7 Heavy rain Broken clouds           Heavy rain
 8 Light rain Partly cloudy           Light rain
 9 Rain showers Passing clouds        NA        
10 Thundershowers Scattered clouds    NA        
11 Thundershowers Passing clouds      NA        
12 Light snow Overcast                NA        
13 Snow Light fog                     Snow      
14 Drizzle Broken clouds              Drizzle   
15 Light rain Fog                     Light rain
16 Cloudy                             NA        
17 Thunderstorms Partly cloudy        NA        
18 Heavy rain More clouds than sun    Heavy rain
19 Partly cloudy                      NA        
20 NA                                 NA        
