
Regex matching multiple delimiters

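I am trying to split text on the following delimiters: full stop, semicolon, *, +, ? and -. However, I only want to split on "-" when it occurs at the start of a sentence, so that words like "non-functional" are not split.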

I tried the following but made no progress. Any help would be appreciated:

sentences = re.split("[.-;]*[\+]*[\?]*[\*]*", txt)

Here is the sample text I have been testing with:

- Text Editor: Now you can edit plain text files with airport tools
* Updated Dropbox support 
* Improved
stability
- New icon                                                                          
* See this case mis-alignment

The expected output after splitting is a list of items:

TextEditor: Now you can edit plain text files with airport tools, Updated Dropbox support, Improved stability, New icon, See this case mis-alignment

Try enumerating the delimiters in a character class, like this:

re.split("[.;*+?]", txt)

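If you want to split the string on a defined set of delimiters and keep the delimiters in the result, wrap the character class in a capturing group: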

>>> import re
>>> txt = '- Text Editor: Now you can edit plain text files with airport tools'
>>> r = re.split(r'([.;*+?-]+)',txt)
>>> r
['', '-', ' Text Editor: Now you can edit plain text files with airport tools']

If you do not want the delimiters in the result list, drop the capturing group:

>>> r = re.split(r'[.;*+?-]+',txt)
>>> r
['', ' Text Editor: Now you can edit plain text files with airport tools']
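
Note that a character class containing `-` splits on every hyphen, including those inside words, which is what the question wants to avoid. A minimal sketch of the problem, using the last line of the sample text from the question:

```python
import re

# Last line of the question's sample text; it contains a hyphenated word.
line = "* See this case mis-alignment"

# The character class treats every '-' as a delimiter, wherever it appears.
parts = re.split(r'[.;*+?-]+', line)
print(parts)  # ['', ' See this case mis', 'alignment']
```

The word "mis-alignment" is broken in two, so the hyphen needs to be anchored to the start of a line or to surrounding whitespace.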

Edit: in response to your comment below, use \s to match whitespace around the delimiters:

>>> txt = '''- Text Editor: Now you can edit plain text files with airport tools
    * Updated Dropbox support 
    * Improved
    stability
    - New icon'''
>>> r = re.split(r'(^|\s)+[.;*+?-]+($|\s)+', txt)
>>> [i for i in r if len(i) > 1]
['Text Editor: Now you can edit plain text files with airport tools', 'Updated Dropbox support', 'Improved\n    stability', 'New icon']
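
The `len(i) > 1` filter above is needed because the capturing groups put the matched whitespace into the result, and it would also discard any genuine one-character item. One possible variation, offered as a sketch rather than as part of the original answer, uses non-capturing groups `(?:...)` so that only empty strings need filtering. The names `pattern` and `items` are illustrative:

```python
import re

txt = """- Text Editor: Now you can edit plain text files with airport tools
* Updated Dropbox support
* Improved
stability
- New icon"""

# (?:...) groups without capturing, so re.split returns only the text
# between delimiters, not the whitespace matched around them.
pattern = r'(?:^|\s)+[.;*+?-]+(?:$|\s)+'

# Only the empty string before the first delimiter has to be dropped.
items = [part for part in re.split(pattern, txt) if part]
print(items)
# ['Text Editor: Now you can edit plain text files with airport tools',
#  'Updated Dropbox support', 'Improved\nstability', 'New icon']
```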

You can use re.split with the multiline flag, so that ^ matches at the start of every line:

>>> import re
>>> s = '''- Text Editor: Now you can edit plain text files with airport tools
* Updated Dropbox support 
* Improved
stability
- New icon'''
>>> [i for i in re.split(r'(?m)\s*^[-*+?]+\s*', s) if i]
['Text Editor: Now you can edit plain text files with airport tools', 'Updated Dropbox support', 'Improved\nstability', 'New icon']
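
The pattern above only splits on markers at the start of a line and leaves the line break inside "Improved\nstability". To get closer to the expected output in the question, one could split on `.`, `;`, `*`, `+` and `?` anywhere but on `-` only at the start of a line, then collapse whitespace. This is a sketch under those assumptions, not a tested general solution, and `pattern` and `items` are illustrative names:

```python
import re

txt = """- Text Editor: Now you can edit plain text files with airport tools
* Updated Dropbox support
* Improved
stability
- New icon
* See this case mis-alignment"""

# Split on '.', ';', '*', '+' or '?' anywhere, but on '-' only when it is
# the first non-blank character of a line ((?m) makes ^ match per line).
pattern = r'(?m)\s*(?:[.;*+?]+|^[ \t]*-+)\s*'

# Drop empty pieces and collapse internal line breaks into single spaces.
items = [' '.join(piece.split()) for piece in re.split(pattern, txt) if piece.strip()]
print(items)
# ['Text Editor: Now you can edit plain text files with airport tools',
#  'Updated Dropbox support', 'Improved stability', 'New icon',
#  'See this case mis-alignment']
```

Because `.` is split on wherever it occurs, text such as version numbers or decimals ("1.5") would also be broken up. Whether that is acceptable depends on the input.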

