Regex split on a newline followed by a capital letter

Question

I have been struggling to split a string with a regular expression in Python.

I have a text file that I load, in this format:

"Peter went to the gym; \nhe worked out for two hours \nKyle ate lunch 
 at Kate's house. Kyle went home at 9. \nSome other sentence 
 here\n\u2022Here's a bulleted line"

I want to get the following output:

['Peter went to the gym; he worked out for two hours', "Kyle ate lunch 
at Kate's house. Kyle went home at 9.", 'Some other sentence here', 
"\u2022Here's a bulleted line"]

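I want to split my string in Python at a newline that is followed by either a capital letter or a bullet point.

I have attempted the first half of the problem, splitting only at a newline followed by a capital letter.

Here is what I have so far: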

print re.findall(r'\n[A-Z][a-z]+',str,re.M)

This only gives me:

[u'\nKyle', u'\nSome']

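That is only the first word of each line. I have tried variations of this regex, but I can't work out how to get the rest of the line.

To split on the bullet point as well, I assume I would just add an OR alternative to the regex, in the same form as the one for the capital letter. Is that the best approach?

I hope this makes sense, and I apologize if my question is unclear. :)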

Answer 1

You can split at a \n that is followed by a capital letter or the bullet character:

import re
s = """
Peter went to the gym; \nhe worked out for two hours \nKyle ate lunch 
at Kate's house. Kyle went home at 9. \nSome other sentence 
here\n\u2022Here's a bulleted line
"""
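# filter(None, ...) drops the empty string produced by the leading newline in s;
# list() is needed on Python 3, where filter returns an iterator.
new_list = list(filter(None, re.split('\n(?=•)|\n(?=[A-Z])', s)))
print(new_list)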

Output:

['Peter went to the gym; \nhe worked out for two hours ', "Kyle ate lunch \nat Kate's house. Kyle went home at 9. ", 'Some other sentence \nhere', "•Here's a bulleted line\n"]

Alternatively, using the escape sequence instead of typing the bullet character literally:

new_list = list(filter(None, re.split('\n(?=\u2022)|\n(?=[A-Z])', s)))
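
The two alternatives can likely be merged into one lookahead with a character class, \n(?=[A-Z\u2022]). Below is a rough Python 3 sketch of that idea. The helper name split_sentences is made up for illustration, and collapsing the leftover line breaks into single spaces is an assumption about what the asker wants, based on the desired output in the question.

```python
import re

def split_sentences(text):
    """Split on a newline followed by A-Z or a bullet, then tidy each piece."""
    pieces = re.split(r'\n(?=[A-Z\u2022])', text)
    cleaned = []
    for piece in pieces:
        # Collapse runs of whitespace (including leftover newlines) to one space.
        piece = re.sub(r'\s+', ' ', piece).strip()
        if piece:  # skip empty pieces, e.g. from a leading newline
            cleaned.append(piece)
    return cleaned

s = ("Peter went to the gym; \nhe worked out for two hours \nKyle ate lunch \n"
     "at Kate's house. Kyle went home at 9. \nSome other sentence \n"
     "here\n\u2022Here's a bulleted line")

for line in split_sentences(s):
    print(line)
```

With this sample string, the newlines before "he", "at" and "here" do not trigger a split because a lowercase letter follows them. They are only removed by the whitespace cleanup step.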

Answer 2

You can use this split call (shown in a Python 2 session):

>>> str = u"Peter went to the gym; \nhe worked out for two hours \nKyle ate lunch at Kate's house. Kyle went home at 9. \nSome other sentence here\n\u2022Here's a bulleted line"
>>> print re.split(u'\n(?=\u2022|[A-Z])', str)

[u'Peter went to the gym; \nhe worked out for two hours ',
 u"Kyle ate lunch at Kate's house. Kyle went home at 9. ",
 u'Some other sentence here',
 u"\u2022Here's a bulleted line"]
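
To illustrate why the lookahead matters, here is a small Python 3 sketch. The sample text is invented for the comparison and is not from the question. A pattern that consumes the capital letter removes it from the result, while (?=...) only checks what follows the newline and leaves it in place.

```python
import re

# Hypothetical sample text, chosen only to show the difference.
text = "first line \nsecond part\nNext sentence\n\u2022Bullet item"

# Consuming the capital letter in the match removes it from the result.
consumed = re.split(r'\n[A-Z]', text)

# A lookahead is zero-width: only the newline is consumed, the letter is kept.
kept = re.split(r'\n(?=\u2022|[A-Z])', text)

print(consumed)
print(kept)
```

The same reasoning seems to explain the findall attempt in the question: \n[A-Z][a-z]+ matches the newline, one capital letter and the lowercase letters after it, so each match stops at the first space.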


Solution 1
1 2018-02-18 15:23:12

Solution 2
1 Accepted 2018-02-18 16:16:06
