Python: substitute words between two points in a text
For the last few days I have been working with regular expressions. So, let's say that I have a text
text = '''
1. sometext sometext sometext given as follows:
«book one
title here
part one
1. mpla mpla mpla
2. some text some text «here spesific text»
book two
1. some text some text.
2. «also» try this in case of emergency.»
book three
part three
directions to home'''
and I am trying to find every book between the outer '«' and '»', replace it with the word 'chapter', and get the text back. With a regular expression I can't get the result I want because, as far as I understand, regex isn't well suited to counting how many '»' have been passed so far.
For example, if I use
print(re.findall(r'«([book\s\S+]*?)»', text, re.DOTALL))
I only get the text up to the first '»'. Is there a way to get book one and book two?
I also tried this:
print(re.findall(r'(?<=«)(?=(book\s\S+))|(?=[^«]*»)(?=(book\s\S+))', text, re.DOTALL))
but neither works. Is there a way to get the result, or should I use something other than regular expressions?
One solution is to do this in two parts, as follows:
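The counting the question alludes to can be done with a plain loop instead of a regex. The following is only a sketch, assuming the '«' and '»' marks are balanced; the function name `outermost_spans` and the short `sample` string are made up for illustration and are not part of the original post.

```python
def outermost_spans(text, open_ch="«", close_ch="»"):
    """Return the contents of every top-level «...» span in text."""
    spans = []
    depth = 0      # how many « are currently open
    start = None   # index just after the outermost «
    for i, ch in enumerate(text):
        if ch == open_ch:
            if depth == 0:
                start = i + 1
            depth += 1
        elif ch == close_ch and depth > 0:
            depth -= 1
            if depth == 0:
                # The outermost « has just been closed.
                spans.append(text[start:i])
    return spans


# Hypothetical sample with nested quotes, shaped like the text above.
sample = "a «book one «inner» book two «also» end» book three"
print(outermost_spans(sample))
# ['book one «inner» book two «also» end']
```

Unlike a lazy `.*?`, the depth counter does not stop at the first '»', so nested quotes stay inside the outer span.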
import re
print(re.findall(r"(book\s\S+)", re.search("«(.*)»", text, re.S).group(1), re.S))
This first finds the outer « » and then searches inside it for the books.
This gives the following output:
['book one', 'book two']
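The question also asks for the word to be replaced by 'chapter' and the full text returned. The same two-step idea could be extended to do that. This is a sketch rather than part of the original answer: the helper name `replace_books` is hypothetical, and it assumes that only the literal word "book" inside the outer « » should change.

```python
import re

text = """1. sometext sometext sometext given as follows:
«book one
title here
part one
1. mpla mpla mpla
2. some text some text «here spesific text»
book two
1. some text some text.
2. «also» try this in case of emergency.»
book three
part three
directions to home"""


def replace_books(text, replacement="chapter"):
    # Greedy match with re.S: from the first « to the last », across newlines.
    outer = re.search(r"«(.*)»", text, re.S)
    if outer is None:
        return text  # no « » pair, nothing to change
    # Substitute only inside the captured span.
    inner = re.sub(r"\bbook\b", replacement, outer.group(1))
    # Stitch the untouched head and tail back around the edited span.
    return text[:outer.start(1)] + inner + text[outer.end(1):]


result = replace_books(text)
print(result)
```

With the sample text, "book one" and "book two" become "chapter one" and "chapter two", while "book three" is left alone because it sits after the last '»'.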