
Separating the last sentence from a string in R

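I have a vector of strings, and I want to separate the last sentence from each string in R.

Sentences may end with full stops (.) or even exclamation marks (!). Hence I am confused as to how to separate the last sentence from a string in R.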

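You can use strsplit to get the last sentence from each string, as shown below. Note that the punctuation used as the split delimiter is removed from the result: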

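## paragraph <- "Your vector here"
# Split each string at sentence-ending punctuation (., ! or ?).
result <- strsplit(paragraph, "\\.|\\!|\\?")

# Keep the last piece of each string, with surrounding whitespace trimmed.
last.sentences <- sapply(result, function(x) {
    trimws(x[length(x)])
})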
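If the closing punctuation should be kept, one possible variant is to split on the whitespace that follows a punctuation mark instead of on the mark itself. The sketch below is only an illustration: the helper name `last_sentence_split` is hypothetical, and it assumes sentences are separated by whitespace and that the input contains no `NA` values.

```r
# Hypothetical helper: return the last sentence of each string, keeping its
# closing punctuation. Assumes sentences are separated by whitespace.
last_sentence_split <- function(x) {
  # Split on whitespace that directly follows ., ! or ? (PCRE lookbehind),
  # so the punctuation stays attached to the sentence before it.
  parts <- strsplit(trimws(x), "(?<=[.!?])\\s+", perl = TRUE)
  vapply(parts, function(p) {
    # An empty input string yields no pieces; report it as NA.
    if (length(p) == 0) NA_character_ else p[length(p)]
  }, character(1))
}

last_sentence_split(c("This is a sentence. And another sentence!",
                      "Single sentences are not a problem."))
```

Because the split pattern requires whitespace, strings such as "First?Second." are returned whole.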

As long as your input is clean enough (in particular, sentences are separated by a space), you can use:

sub(".*(\\.|\\?|\\!) ", "", trimws(yourvector))

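It finds the longest substring that ends with a punctuation mark followed by a space, and removes it.

I added trimws in case some strings have trailing whitespace.

Example: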

u <- c("This is a sentence. And another sentence!",
       "By default R regexes are greedy. So only the last sentence is kept. You see ? ",
       "Single sentences are not a problem.",
       "What if there are no spaces between sentences?It won't work.",
       "You know what? Multiple marks don't break my solution!!",
       "But if they are separated by spaces, they do ! ! !")

sub(".*(\\.|\\?|\\!) ", "", trimws(u))
# [1] "And another sentence!"                                       
# [2] "You see ?"                                                   
# [3] "Single sentences are not a problem."                         
# [4] "What if there are no spaces between sentences?It won't work."
# [5] "Multiple marks don't break my solution!!"                    
# [6] "!"  
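Cases 4 and 6 above fail because the pattern relies on a single space after the punctuation. A looser alternative, sketched here under stated assumptions, is to extract the final run of non-punctuation characters together with whatever punctuation and whitespace follows it. The helper name `last_sentence_loose` is hypothetical, and the approach assumes sentences contain no internal `.`, `!` or `?` (so decimals such as "3.50" or abbreviations would be cut).

```r
# Hypothetical helper: extract the trailing sentence without relying on
# spaces between sentences. Assumes no ., ! or ? inside a sentence.
last_sentence_loose <- function(x) {
  # One or more non-punctuation characters, then any mix of closing
  # punctuation and whitespace, anchored to the end of the string.
  m <- regexpr("[^.!?]+[.!?\\s]*$", x, perl = TRUE)
  out <- rep(NA_character_, length(x))
  hit <- !is.na(m) & m != -1
  out[hit] <- regmatches(x, m)  # regmatches() returns matched elements only
  trimws(out)
}

u <- c("This is a sentence. And another sentence!",
       "By default R regexes are greedy. So only the last sentence is kept. You see ? ",
       "Single sentences are not a problem.",
       "What if there are no spaces between sentences?It won't work.",
       "You know what? Multiple marks don't break my solution!!",
       "But if they are separated by spaces, they do ! ! !")

last_sentence_loose(u)
```

With this pattern, the fourth string gives "It won't work." and the sixth is returned whole, since its marks are treated as part of the closing punctuation.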

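This regex is anchored to the end of the string with $. At the front, the lookbehind (?<=...) requires the match to start right after the ". " or "! " that ends the previous sentence, without including that punctuation in the match; the ^ alternative covers strings that contain only a single sentence. The ((?![.!]\\s).)+ part then consumes characters up to the end of the string but refuses to cross another sentence boundary, so only the last sentence is returned.

s <- "Sentences may end with full stops(.) or even exclamatory marks(!). Hence i am confused as to how to separate the last sentence from a string in R."
library(stringr)
# Start after ". " or "! " (or at the start of the string), then match to the
# end without crossing another sentence boundary.
str_extract(s, "(?<=(\\.\\s|\\!\\s|^))((?![.!]\\s).)+$")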

which yields:

# [1] "Hence i am confused as to how to separate the last sentence from a string in R."

