Removing everything from a string column conditional on another string column in R
I am trying to clean up a column containing long speeches from a debate. Right now, every row starts with a new speaker. However, things like subheaders remain at the end of each speech, which is not desirable.
Here is some example data:
library(tibble)

speeches <- tibble(
  subheader   = c("3.Discussion", "8.Voting"),
  full_speech = c("I close this part. 3.Discussion Let's start with",
                  "I think we can vote now")
)
Desired outcome:
subheader    full_speech
3.Discussion I close this part.
8.Voting     I think we can vote now
What I tried so far:
speeches %>%
  mutate(full_speech = str_remove(full_speech, subheader))
But of course this only deletes the subheaders themselves, not the text that follows them.
We can paste the subheader together with .* so that the pattern also matches any characters that follow the subheader.
library(dplyr)
library(stringr)

speeches %>%
  # Pattern per row: leading whitespace, the subheader, then everything after it
  mutate(full_speech = str_remove(full_speech,
                                  str_c("\\s+", subheader, ".*")))
Output:
# A tibble: 2 × 2
  subheader    full_speech
  <chr>        <chr>
1 3.Discussion I close this part.
2 8.Voting     I think we can vote now
An approach using gsub and paste0 to construct the pattern from subheader, applied row by row.
library(dplyr)

speeches %>%
  rowwise() %>%
  # Build one pattern per row: a space, the subheader, then everything after it
  mutate(full_speech = gsub(
    paste0(" ", subheader, ".*", collapse = ""), "", full_speech)) %>%
  ungroup()
# A tibble: 2 × 2
  subheader    full_speech
  <chr>        <chr>
1 3.Discussion I close this part.
2 8.Voting     I think we can vote now
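
One caveat for both approaches above: the subheader is used as a regular expression, so the "." in "3.Discussion" matches any character, and a subheader containing characters such as "(" or "?" could match incorrectly or raise an error. Below is a minimal base R sketch that cuts at the first literal occurrence of the subheader instead. The helper name strip_after_subheader is hypothetical and not part of the original answers, and the example uses a plain data.frame instead of a tibble so that it runs without extra packages.

```r
# Hypothetical helper (an assumption, not from the original answers):
# cut each speech at the first literal occurrence of its subheader.
strip_after_subheader <- function(speech, subheader) {
  # regexpr() is not vectorised over the pattern, so pair the elements
  # with mapply(); fixed = TRUE treats the subheader as plain text.
  pos <- mapply(
    function(s, h) regexpr(h, s, fixed = TRUE)[1],
    speech, subheader, USE.NAMES = FALSE
  )
  # Keep the text before the match; leave the speech unchanged if the
  # subheader does not occur (regexpr() returns -1 in that case).
  kept <- ifelse(pos > 0, substr(speech, 1, pos - 1), speech)
  # Drop the whitespace left in front of the removed subheader.
  trimws(kept, which = "right")
}

speeches <- data.frame(
  subheader   = c("3.Discussion", "8.Voting"),
  full_speech = c("I close this part. 3.Discussion Let's start with",
                  "I think we can vote now"),
  stringsAsFactors = FALSE
)

speeches$full_speech <- strip_after_subheader(speeches$full_speech,
                                              speeches$subheader)
print(speeches)
```

Inside a dplyr pipeline the same helper could be called as mutate(full_speech = strip_after_subheader(full_speech, subheader)), with no need for rowwise().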