Python - extracting a list of substrings
How can I extract a list of substrings based on some pattern in Python?
For example,
str = 'this {{is}} a sample {{text}}'.
Expected result: a Python list which contains 'is' and 'text'.
>>> import re
>>> re.findall("{{(.*?)}}", "this {{is}} a sample {{text}}")
['is', 'text']
You can use the following:
import re
a = 'this {{is}} a sample {{text}}'
res = re.findall("{{([^{}]*)}}", a)
print("a python list which contains %s and %s" % (res[0], res[1]))
Cheers
Assuming "some patterns" means "single words between double {}'s":
import re
re.findall('{{(\\w*)}}', string)
Edit: Andrew Clark's answer implements "any sequence of characters at all between double {}'s".
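To make the difference between the patterns suggested so far concrete, here is a small comparison. The sample strings and variable names are made up for illustration; only the three regexes come from the answers above.

```python
import re

samples = "a {{one}} b {{two words}} c {{x-y}}"

# \w* only accepts letters, digits and underscores between the braces.
words_only = re.findall(r"{{(\w*)}}", samples)

# .*? accepts any characters except a newline, as few as possible.
lazy_any = re.findall(r"{{(.*?)}}", samples)

# [^{}]* accepts any characters except braces, newlines included.
no_braces = re.findall(r"{{([^{}]*)}}", samples)

print(words_only)  # ['one']
print(lazy_any)    # ['one', 'two words', 'x-y']
print(no_braces)   # ['one', 'two words', 'x-y']

# The last two differ on unbalanced input:
unbalanced = "{{a {{b}}"
print(re.findall(r"{{(.*?)}}", unbalanced))     # ['a {{b']
print(re.findall(r"{{([^{}]*)}}", unbalanced))  # ['b']
```

Which one is right depends on what may legitimately appear between the braces in your data.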
A regex-based solution is fine for your example, although I would recommend something more robust for more complicated input.
import re
def match_substrings(s):
    return re.findall(r"{{([^}]*)}}", s)
The regex from inside-out:
[^}] matches anything that's not a '}'
([^}]*) matches any number of non-} characters and groups them
{{([^}]*)}} puts the above inside double-braces
Without the parentheses above, re.findall would return the entire match (i.e. ['{{is}}', '{{text}}']). However, when the regex contains a group, findall will use that instead.
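The effect of the group on re.findall can be checked directly. This is a minimal sketch using the question's sample string; the non-capturing variant is an extra illustration, not part of the answer above.

```python
import re

s = "this {{is}} a sample {{text}}"

# No group: findall returns each whole match.
print(re.findall(r"{{[^}]*}}", s))        # ['{{is}}', '{{text}}']

# One capturing group: findall returns only the group's text.
print(re.findall(r"{{([^}]*)}}", s))      # ['is', 'text']

# A non-capturing group (?:...) does not count, so whole matches come back.
print(re.findall(r"{{(?:[^}]*)}}", s))    # ['{{is}}', '{{text}}']
```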
You could use a regular expression to match anything that occurs between {{ and }}. Will that work for you?
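If you also need to know where each match sits in the text, re.finditer may be a better fit than re.findall. The sketch below is one possible way to do it; the names TAG and find_tags are hypothetical, and re.DOTALL is an assumption that "anything" should include newlines.

```python
import re

# Hypothetical helper. Braces are escaped for readability, and
# DOTALL lets "." match newlines too.
TAG = re.compile(r"\{\{(.*?)\}\}", re.DOTALL)

def find_tags(text):
    """Return (start, end, inner_text) for each {{...}} span in text."""
    return [(m.start(), m.end(), m.group(1)) for m in TAG.finditer(text)]

print(find_tags("this {{is}} a sample {{text}}"))
# [(5, 11, 'is'), (21, 29, 'text')]
```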
Generally speaking, for tagging certain strings in a large body of text, a suffix tree will be useful.