Removing everything from a string column conditional on another string column in R
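I am trying to clean a column that contains long speeches from a debate. Each row currently starts with a new speaker, but things such as subheaders remain at the end of each speech, which is undesirable.
Here is some sample data: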
library(tibble)
speeches <- tibble(subheader = c("3.Discussion", "8.Voting"),
                   full_speech = c("I close this part. 3.Discussion Let's start with",
                                   "I think we can vote now")
)
Desired result:
subheader      full_speech
3. Discussion  I close this part.
8. Voting      I think we can vote now
What I have tried so far:
speeches %>%
  mutate(full_speech = str_remove(full_speech, subheader))
But of course this only removes the subheaders themselves, not what comes after them.
We can paste .* after subheader so that the pattern also matches any characters that follow the subheader.
library(dplyr)
library(stringr)
speeches %>%
mutate(full_speech = str_remove(full_speech, str_c("\\s+",
subheader, ".*")))
-output
# A tibble: 2 × 2
subheader full_speech
<chr> <chr>
1 3.Discussion I close this part.
2 8.Voting I think we can vote now
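One caveat with the pasted pattern is that subheader is interpreted as a regular expression, so the "." in "3.Discussion" matches any character, and a subheader containing characters such as "(" or "?" could misbehave. A minimal sketch of a literal-match alternative, assuming the same speeches data (the helper column start and the result name cleaned are illustrative, not from the original post): locate the subheader with fixed() and keep only the text before it.

```r
library(dplyr)
library(stringr)
library(tibble)

speeches <- tibble(
  subheader = c("3.Discussion", "8.Voting"),
  full_speech = c("I close this part. 3.Discussion Let's start with",
                  "I think we can vote now")
)

cleaned <- speeches %>%
  mutate(
    # Position where the literal subheader starts; NA if it does not occur
    start = str_locate(full_speech, fixed(subheader))[, "start"],
    # Keep the speech unchanged when there is no match,
    # otherwise keep only the text before the subheader
    full_speech = if_else(
      is.na(start),
      full_speech,
      str_trim(str_sub(full_speech, 1, start - 1))
    )
  ) %>%
  select(-start)

cleaned
```

This trades the one-line regex for an explicit position lookup, which avoids having to escape the subheader by hand.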
An approach using gsub and paste0 to build the pattern from subheader.
library(dplyr)
speeches %>%
  rowwise() %>%  # gsub() takes a single pattern, so build it one row at a time
  mutate(full_speech = gsub(
    paste0(" ", subheader, ".*"), "", full_speech)) %>%
  ungroup()
# A tibble: 2 × 2
subheader full_speech
<chr> <chr>
1 3.Discussion I close this part.
2 8.Voting I think we can vote now
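The row-by-row step is needed because gsub() and sub() are not vectorised over the pattern argument. A rough base R sketch of the same idea without dplyr, assuming the same sample data stored in a plain data frame (the patterns vector is an illustrative name), pairs each pattern with its speech via mapply():

```r
speeches <- data.frame(
  subheader = c("3.Discussion", "8.Voting"),
  full_speech = c("I close this part. 3.Discussion Let's start with",
                  "I think we can vote now"),
  stringsAsFactors = FALSE
)

# One pattern per row: a space, the subheader, then everything after it
patterns <- paste0(" ", speeches$subheader, ".*")

# Apply each pattern to the speech in the same row
speeches$full_speech <- mapply(
  function(pattern, text) sub(pattern, "", text),
  patterns, speeches$full_speech,
  USE.NAMES = FALSE
)

speeches
```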