
Match a block of lines starting with a specific character in Python

I would like to search through a block of text and match the lines starting with a specific character in Python, but stop as soon as that rule is broken.

For example, in the following text (bulleted lines start with asterisks):


* point one

* point two

** point two.one

* last point three

But here is a text in between

* four


I would like to stop the search as soon as it encounters the non-bulleted text, i.e. the search/find should return only the text up to "* last point three".

I have been trying various regexes, but with no luck. The closest I have got so far is

r'(^[*(**)].*)'

Any help will be appreciated.

Thanks

tjr

You can use the following regex to get those blocks:

^(?:\*+[^*\n]*?\n*(?=\*))*\*+[^*\n]*?(?:\n|$)(?!\*)

See demo
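As a rough sketch of how the pattern above might be applied from Python: the `re.MULTILINE` flag, the `BLOCK_RE` and `sample` names, and the blank lines between the bullets are assumptions made for illustration, not part of the answer itself.

```python
import re

# The pattern from the answer above, unchanged.
# Assumption: re.MULTILINE is used so that ^ can anchor at the start of any
# line, which lets findall() return every bulleted block in the text.
BLOCK_RE = re.compile(
    r'^(?:\*+[^*\n]*?\n*(?=\*))*\*+[^*\n]*?(?:\n|$)(?!\*)',
    re.MULTILINE,
)

# Hypothetical sample text modelled on the question.
sample = (
    "* point one\n"
    "\n"
    "* point two\n"
    "\n"
    "** point two.one\n"
    "\n"
    "* last point three\n"
    "\n"
    "But here is a text in between\n"
    "\n"
    "* four\n"
)

blocks = BLOCK_RE.findall(sample)

# The first block is the run of bullets before the plain-text line.
first_block = blocks[0]
print(repr(first_block))
```

With this sample, `findall` yields two blocks (the leading run of bullets and the trailing `* four` line), so taking `blocks[0]`, or calling `BLOCK_RE.search(sample)` instead, gives only the text up to "* last point three".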

Do you mean this?

re.findall(r'(?s)^\*[^\n]*(?:\n\*[^\n]*)*', s)

DEMO
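A small sketch of how this pattern behaves, assuming the backslashes are written as plain `\*` escapes and the `(?s)` flag is left out (the pattern contains no `.`, so the flag has no effect here). The names `CONSECUTIVE_RE`, `tight` and `spaced` are made up for the illustration.

```python
import re

# Assumption: each bullet is separated from the next by exactly one newline,
# because the pattern allows only a single \n between bulleted lines.
CONSECUTIVE_RE = re.compile(r'^\*[^\n]*(?:\n\*[^\n]*)*')

# Hypothetical input with no blank lines between the bullets.
tight = (
    "* point one\n"
    "* point two\n"
    "** point two.one\n"
    "* last point three\n"
    "But here is a text in between\n"
    "* four\n"
)

# Hypothetical input with a blank line after the first bullet.
spaced = "* point one\n\n* point two\n"

tight_result = CONSECUTIVE_RE.findall(tight)
spaced_result = CONSECUTIVE_RE.findall(spaced)

print(tight_result)
print(spaced_result)
```

Without `re.MULTILINE`, `^` matches only at the very start of the string, so `findall` returns at most one block. On the `spaced` input the match stops after the first bullet, because a blank line means two newlines in a row; replacing `\n` with `\n+` inside the group is one way to tolerate that.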

If the goal is simply to match up to the first point in the stream where the condition no longer holds, the most concise way to express this seems to be:

>>> import re
>>> pattern = r'(?s)^\*[^\n]*(?:\n+\*[^\n]*)*'
>>> target = """* point one
...
... * point two
...
... ** point two.one
...
... * last point three
...
... But here is a text in between
...
... * four
... """
>>> m = re.search(pattern, target)
>>> m.group(0)
'* point one\n\n* point two\n\n** point two.one\n\n* last point three'
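
The same "stop at the first line that breaks the rule" idea can also be sketched without a regex. This is a different technique from the answers above, using `itertools.takewhile` from the standard library; the helper name `leading_bullet_lines`, its `marker` parameter, and the choice to skip blank lines between bullets are assumptions for illustration.

```python
from itertools import takewhile


def leading_bullet_lines(text, marker="*"):
    """Return the leading lines of text that start with marker.

    Blank lines between bullets are tolerated and dropped from the result.
    Scanning stops at the first non-blank line that lacks the marker.
    """
    lines = text.splitlines()
    # takewhile stops consuming as soon as the predicate is False.
    kept = takewhile(
        lambda line: not line.strip() or line.startswith(marker),
        lines,
    )
    return [line for line in kept if line.strip()]


# Hypothetical sample text modelled on the question.
sample_text = """* point one

* point two

** point two.one

* last point three

But here is a text in between

* four
"""

bullets = leading_bullet_lines(sample_text)
print(bullets)
```

Because the result is a list of lines rather than a single matched string, `"\n".join(bullets)` can be used if one block of text is wanted.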
