Regex matching multiple delimiters
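I am trying to split on the following delimiters: period, semicolon, *, +, ? and -. However, I only want to split on "-" when it appears at the start of a sentence (so that words like "non-functional" are not split).
I tried the following but made no progress; any help would be appreciated: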
sentences = re.split("[.-;]*[\+]*[\?]*[\*]*", txt)
Here is the sample text I have been testing with:
- Text Editor: Now you can edit plain text files with airport tools
* Updated Dropbox support
* Improved
stability
- New icon
* See this case mis-alignment
The expected output after splitting is a list of items:
TextEditor: Now you can edit plain text files with airport tools, Updated Dropbox support, Improved stability, New icon, See this case mis-alignment
Try enumerating the delimiters like this:
re.split("[.;*+?]", txt)
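One likely reason the pattern in the question misbehaves is that a hyphen placed between two characters inside a character class defines a range. The sketch below only illustrates that point; the names `range_class` and `literal_class` are made up for the example.

```python
import re

# In "[.-;]" the hyphen sits between "." and ";", so it defines the range
# from "." (0x2E) to ";" (0x3B), which covers . / 0-9 : ;
# The hyphen itself (0x2D) is NOT part of that range.
range_class = re.compile(r"[.-;]")

# Putting the hyphen last (or first) in the class makes it a literal hyphen.
literal_class = re.compile(r"[.;-]")

print(bool(range_class.fullmatch(":")))    # True: ":" falls inside the range
print(bool(range_class.fullmatch("-")))    # False: "-" is outside the range
print(bool(literal_class.fullmatch("-")))  # True
print(bool(literal_class.fullmatch(":")))  # False
```

This is why a class such as `[.;*+?-]`, with the hyphen at the end, behaves as a plain list of delimiters.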
If you want to split the string on a defined set of delimiters, do this instead:
>>> txt = '- Text Editor: Now you can edit plain text files with airport tools'
>>> r = re.split(r'([.;*+?-]+)',txt)
>>> r
['', '-', ' Text Editor: Now you can edit plain text files with airport tools']
If you do not want the delimiters in the result list, drop the capturing group:
>>> r = re.split(r'[.;*+?-]+',txt)
>>> r
['', ' Text Editor: Now you can edit plain text files with airport tools']
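Note that a plain character class treats every hyphen as a delimiter, including one inside a word, which is what the question wants to avoid. A minimal check, using the last line of the question's sample text:

```python
import re

line = "* See this case mis-alignment"

# Every run of delimiter characters splits the string, wherever it occurs.
parts = re.split(r"[.;*+?-]+", line)
print(parts)  # ['', ' See this case mis', 'alignment']
```

The hyphen in "mis-alignment" is split as well, so the pattern needs some notion of position (for example, surrounding whitespace or the start of a line).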
Edit: in response to your comment below, use \s to match whitespace:
>>> txt = '''- Text Editor: Now you can edit plain text files with airport tools
* Updated Dropbox support
* Improved
stability
- New icon'''
>>> r = re.split(r'(^|\s)+[.;*+?-]+($|\s)+',txt)
>>> [i for i in r if len(i) > 1]
['Text Editor: Now you can edit plain text files with airport tools', 'Updated Dropbox support', 'Improved\n stability', 'New icon']
You can use this re.split call.
>>> import re
>>> s = '''- Text Editor: Now you can edit plain text files with airport tools
* Updated Dropbox support
* Improved
stability
- New icon'''
>>> [i for i in re.split(r'(?m)\s*^[-*+?]+\s*', s) if i]
['Text Editor: Now you can edit plain text files with airport tools', 'Updated Dropbox support', 'Improved\nstability', 'New icon']
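To get the exact list asked for in the question, with the wrapped "Improved / stability" item joined and "mis-alignment" left intact, the same idea can be wrapped in a small helper. This is only a sketch: `split_items` is a hypothetical name, and it assumes every item starts with a bullet character at the beginning of a line, so it does not split on periods or semicolons in the middle of a line.

```python
import re

def split_items(text):
    """Split changelog-style text into items at line-leading bullets."""
    # (?m) makes ^ match at the start of every line, so a "-" inside a
    # word such as "mis-alignment" is never treated as a delimiter.
    chunks = re.split(r"(?m)^\s*[-*+?]+\s*", text)
    # Collapse wrapped lines ("Improved\nstability") into single spaces
    # and drop the empty chunk that precedes the first bullet.
    return [" ".join(chunk.split()) for chunk in chunks if chunk.strip()]

sample = """- Text Editor: Now you can edit plain text files with airport tools
* Updated Dropbox support
* Improved
stability
- New icon
* See this case mis-alignment"""

items = split_items(sample)
print(items)
```

With the sample text this yields five items, the third being 'Improved stability' and the last 'See this case mis-alignment'.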