Separating the last sentence from a string in R

Question

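I have a vector of strings, and I want to separate the last sentence from each string in R.

A sentence may end with a full stop (.) or even an exclamation mark (!), so I am unsure how to separate the last sentence from a string in R.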

Answer 1

You can use strsplit to get the last sentence from each string, like this:

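## paragraph <- "Your vector here"
# Split each string on sentence terminators (., ! or ?).
# Note that strsplit drops the terminators themselves.
result <- strsplit(paragraph, "[.!?]")

# Keep the last piece of each string and trim surrounding whitespace.
last.sentences <- sapply(result, function(x) {
    trimws(x[length(x)])
})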
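Because strsplit removes whatever it splits on, the result above loses the closing punctuation. If you want to keep it, one possible variation is to split on the whitespace that follows a terminator instead. This is only a sketch: the sample vector is made up for illustration, and it assumes sentences are separated by at least one space.

```r
# Hypothetical input; replace with your own character vector.
paragraph <- c("First sentence. Second one! Is this the last?",
               "Only one sentence here.")

# Split on whitespace that comes right after ., ! or ?.
# The lookbehind (perl = TRUE) checks for the terminator without consuming it.
parts <- strsplit(trimws(paragraph), "(?<=[.!?])\\s+", perl = TRUE)

# Last piece of each string, terminator included.
last.sentences <- vapply(parts, function(x) x[length(x)], character(1))
last.sentences
```

Note that vapply will raise an error for empty strings, since there is no last piece to return in that case.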

Answer 2

As long as your input is clean enough (in particular, sentences are separated by spaces), you can use:

sub(".*(\\.|\\?|\\!) ", "", trimws(yourvector))

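It finds the longest substring that ends with a punctuation mark followed by a space, and removes it.

I added trimws in case some of the strings have trailing spaces.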

Example:

u <- c("This is a sentence. And another sentence!",
       "By default R regexes are greedy. So only the last sentence is kept. You see ? ",
       "Single sentences are not a problem.",
       "What if there are no spaces between sentences?It won't work.",
       "You know what? Multiple marks don't break my solution!!",
       "But if they are separated by spaces, they do ! ! !")

sub(".*(\\.|\\?|\\!) ", "", trimws(u))
# [1] "And another sentence!"                                       
# [2] "You see ?"                                                   
# [3] "Single sentences are not a problem."                         
# [4] "What if there are no spaces between sentences?It won't work."
# [5] "Multiple marks don't break my solution!!"                    
# [6] "!"
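If the last case matters to you, one possible tweak (a sketch, not part of the answer above) is to treat "punctuation plus space" as a sentence boundary only when the next character is neither whitespace nor another mark. This needs perl = TRUE for the lookahead, and it still assumes a space between sentences.

```r
u <- c("This is a sentence. And another sentence!",
       "By default R regexes are greedy. So only the last sentence is kept. You see ? ",
       "Single sentences are not a problem.",
       "What if there are no spaces between sentences?It won't work.",
       "You know what? Multiple marks don't break my solution!!",
       "But if they are separated by spaces, they do ! ! !")

# Remove everything up to the last terminator + whitespace that is
# followed by a character that is not a terminator or whitespace.
res <- sub(".*[.?!]\\s+(?=[^.?!\\s])", "", trimws(u), perl = TRUE)
res
# [1] "And another sentence!"
# [2] "You see ?"
# [3] "Single sentences are not a problem."
# [4] "What if there are no spaces between sentences?It won't work."
# [5] "Multiple marks don't break my solution!!"
# [6] "But if they are separated by spaces, they do ! ! !"
```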

Answer 3

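This regex anchors to the end of the string with $ and allows an optional '.' or '!' at the end. At the front, it finds the closest ". " or "! " as the end of the previous sentence. The lookbehind (?<= ... ) ensures that this '.' or '!' is not included in the match. It also handles a single sentence by allowing ^ as the starting point.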

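s <- "Sentences may end with full stops(.) or even exclamatory marks(!). Hence i am confused as to how to separate the last sentence from a string in R."
library(stringr)
# The lookbehind requires the match to start at the beginning of the string
# or right after ". " / "! ".
# [^.!]+ stops the match from running across an earlier terminator; with a
# plain .+ the leftmost match would start at ^ and span the whole string.
str_extract(s, "(?<=(\\.\\s|\\!\\s|^))[^.!]+(\\.|\\!)?$")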

Yields

# [1] "Hence i am confused as to how to separate the last sentence from a string in R."
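If you would rather avoid the stringr dependency, roughly the same idea can be sketched in base R with regexpr and regmatches. The pattern below is an assumption written in the spirit of this answer, not the answer's exact regex: it starts the match at the beginning of the string or right after ". " / "! ", and does not let the match contain an earlier terminator. The shortened sample strings are made up for illustration.

```r
# Hypothetical inputs for illustration.
s <- c("Sentences may end with full stops(.) or even exclamatory marks(!). Hence i am confused.",
       "Just one sentence!")

# Start at ^ or right after ". " / "! "; then take characters that are not
# terminators, an optional closing . or !, and the end of the string.
pattern <- "(?<=\\.\\s|!\\s|^)[^.!]+[.!]?$"

m <- regexpr(pattern, s, perl = TRUE)

# regmatches drops non-matching elements, so fill a vector of NAs instead.
last <- rep(NA_character_, length(s))
last[m != -1] <- regmatches(s, m)
last
# [1] "Hence i am confused." "Just one sentence!"
```

Like the stringr version, this does not treat '?' as a terminator and will not cope with a last sentence that itself contains a '.' or '!' (for example an abbreviation).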

3 answers

Answer 1
2 2017-04-10 23:25:04

Answer 2
1 2017-04-11 08:19:26

Answer 3
0 2017-04-11 01:07:25
