
Python - extracting a list of substrings

How can I extract a list of substrings based on some pattern in Python?

For example:

str = 'this {{is}} a sample {{text}}'

Expected result: a Python list which contains 'is' and 'text'.

>>> import re
>>> re.findall("{{(.*?)}}", "this {{is}} a sample {{text}}")
['is', 'text']

You can use the following:

import re
a = 'this {{is}} a sample {{text}}'
res = re.findall("{{([^{}]*)}}", a)
print("a python list which contains %s and %s" % (res[0], res[1]))

Cheers

Assuming "some patterns" means "single words between double {}'s":

import re

re.findall('{{(\\w*)}}', string)

Edit: Andrew Clark's answer implements "any sequence of characters at all between double {}'s".
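To make the difference between the patterns in these answers concrete, here is a small comparison. The input string is made up for illustration (it is not from the question), and it deliberately includes a tag with a space and a tag with a stray brace:

```python
import re

# Hypothetical input: one plain tag, one tag with a space, one tag with a stray brace
text = "this {{is}} a {{sample text}} with {{a{b}}"

# \w* accepts only word characters, so the last two tags are skipped
words_only = re.findall(r"{{(\w*)}}", text)

# [^{}]* accepts anything except braces, so the stray-brace tag is skipped
no_braces = re.findall(r"{{([^{}]*)}}", text)

# .*? accepts any characters (except a newline) up to the first "}}"
anything = re.findall(r"{{(.*?)}}", text)

print(words_only)  # ['is']
print(no_braces)   # ['is', 'sample text']
print(anything)    # ['is', 'sample text', 'a{b']
```

Which one is appropriate depends on what is allowed to appear between the braces in your data.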

A regex-based solution is fine for your example, although I would recommend something more robust for more complicated input.

import re

def match_substrings(s):
    return re.findall(r"{{([^}]*)}}", s)

The regex from inside-out:

[^}] matches anything that's not a '}'
([^}]*) matches any number of non-} characters and groups them
{{([^}]*)}} puts the above inside double-braces

Without the parentheses above, re.findall would return the entire match (i.e. ['{{is}}', '{{text}}']). However, when the regex contains a group, findall returns the contents of that group instead.
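As a quick check of that behaviour, the sketch below runs the same pattern with a capturing group, with no group, and with a non-capturing group. The variable names are only illustrative:

```python
import re

s = "this {{is}} a sample {{text}}"

# One capturing group: findall returns the group's contents
with_group = re.findall(r"{{([^}]*)}}", s)

# No group: findall returns each entire match
without_group = re.findall(r"{{[^}]*}}", s)

# A non-capturing group (?:...) does not count, so the entire match is returned again
non_capturing = re.findall(r"{{(?:[^}]*)}}", s)

print(with_group)     # ['is', 'text']
print(without_group)  # ['{{is}}', '{{text}}']
print(non_capturing)  # ['{{is}}', '{{text}}']
```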

You could use a regular expression to match anything that occurs between {{ and }}. Will that work for you?
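A minimal sketch of that idea, assuming "anything" may also include line breaks: by default `.` does not match a newline, so the pattern is compiled with `re.DOTALL`. The names `TAG` and `extract_tags` and the sample string are hypothetical, chosen only for this example:

```python
import re

# Non-greedy match between "{{" and the nearest "}}"; DOTALL lets "." span newlines
TAG = re.compile(r"{{(.*?)}}", re.DOTALL)

def extract_tags(text):
    """Return the text found between each {{ and the nearest following }}."""
    return TAG.findall(text)

sample = "this {{is}} a {{multi\nline}} {{text}}"
print(extract_tags(sample))  # ['is', 'multi\nline', 'text']
```

If tags never span lines in your input, the `re.DOTALL` flag can simply be dropped.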

Generally speaking, for tagging certain strings in a large body of text, a suffix tree will be useful.
