
Python: Find a string between two strings, repeatedly

I'm new to Python and still learning about regular expressions, so this question may sound trivial to a regex expert, but here you go. I suppose my question is a generalization of this question about finding a string between two strings. I wonder: what if this pattern (initial_substring + substring_to_find + end_substring) is repeated many times in a long string? For example:

import re

test = 'someth1 var="this" someth2 var="that" '
result = re.search('var=(.*) ', test)
print(result.group(1))
# prints: "this" someth2 var="that"

Instead, I'd like to get a list like ["this","that"]. How can I do it?

Use re.findall():

result = re.findall(r'var="(.*?)"', test)
print(result)  # ['this', 'that']

If the test string contains multiple lines and a quoted value can itself span a line break, use the re.DOTALL flag so that . also matches newline characters.

re.findall(r'var="(.*?)"', test, re.DOTALL)
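To see what the flag changes, here is a minimal sketch. The input string `text` is made up for illustration, with its second value deliberately split across two lines.

```python
import re

# Hypothetical input: the second value contains a newline inside the quotes.
text = 'a var="one" b\nc var="two\nlines" d'

# Without re.DOTALL, "." does not match "\n", so the second value is skipped.
without_flag = re.findall(r'var="(.*?)"', text)

# With re.DOTALL, "." matches any character, including "\n".
with_flag = re.findall(r'var="(.*?)"', text, re.DOTALL)

print(without_flag)  # ['one']
print(with_flag)     # ['one', 'two\nlines']
```

If the values never contain a line break, the flag makes no difference to the result.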

The problem with your current regex is that the capture group (.*) is greedy: it matches as many characters as it can. After the first var= in your string, that capture group takes everything that follows, up to the last space the pattern can still end on.
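The difference between the greedy and the non-greedy quantifier can be checked on the string from the question. This is only a small illustration, and the names `greedy` and `lazy` are chosen here for the example.

```python
import re

test = 'someth1 var="this" someth2 var="that" '

# Greedy: ".*" runs to the LAST double quote it can still match.
greedy = re.findall(r'var="(.*)"', test)

# Non-greedy: ".*?" stops at the FIRST double quote after each var=".
lazy = re.findall(r'var="(.*?)"', test)

print(greedy)  # ['this" someth2 var="that']
print(lazy)    # ['this', 'that']
```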

If you instead narrow the expression to var="([\w\s]+)", which only allows word characters and whitespace between the quotes, the match cannot run past the closing quote and you will not have the same issue. Combined with re.findall(), that line of Python becomes:

result = re.findall(r'var="([\w\s]+)"', test)
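One caveat with the narrower character class is that it skips values containing punctuation, and also empty values. The sketch below compares it with a negated class, `[^"]*`, which is a common alternative and not something proposed in the answers above. The `sample` string is invented for the example.

```python
import re

# Hypothetical input with a hyphenated value and an empty value.
sample = 'x var="this" y var="a-b" z var="" w var="two words"'

# Only word characters and whitespace are accepted between the quotes.
word_space = re.findall(r'var="([\w\s]+)"', sample)

# Anything except a double quote is accepted, including nothing at all.
not_quote = re.findall(r'var="([^"]*)"', sample)

print(word_space)  # ['this', 'two words']
print(not_quote)   # ['this', 'a-b', '', 'two words']
```

Which pattern fits depends on what the values may contain. The negated class also needs neither a non-greedy quantifier nor re.DOTALL, since `[^"]` matches newlines as well.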
