How to get all the substrings of a string using regex in Python

I have a string such as "12345".

Using a regex, how can I get all of its substrings that consist of one to three consecutive characters, producing output such as:

'1', '2', '3', '4', '5', '12', '23', '34', '45', '123', '234', '345'

You can use re.findall with a positive lookahead that wraps a capturing group matching exactly n characters, and repeat the search for each n from 1 to 3. Because the lookahead itself consumes no characters, the captured matches are allowed to overlap:

[match for size in range(1, 4) for match in re.findall('(?=(.{%d}))' % size, s)]
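To see why the lookahead matters, the following minimal sketch compares a plain pattern with the lookahead version for a fixed size of 2. The variable names are illustrative only and do not come from the original answer.

```python
import re

s = '12345'

# A plain pattern consumes the characters it matches, so the next search
# resumes after them and overlapping substrings such as '23' are skipped.
non_overlapping = re.findall(r'.{2}', s)

# A lookahead matches at a position without consuming anything; the inner
# capturing group records the two characters that follow that position.
overlapping = re.findall(r'(?=(.{2}))', s)

print(non_overlapping)  # ['12', '34']
print(overlapping)      # ['12', '23', '34', '45']
```

Note that `.` does not match a newline by default, so a string containing line breaks would likely need the `re.DOTALL` flag for this approach to cover every substring.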

However, it would be more efficient to skip the regex and use a list comprehension with nested for clauses that iterate over every size and every starting index:

[s[start:start + size] for size in range(1, 4) for start in range(len(s) - size + 1)]

Given s = '12345', both of the above return:

['1', '2', '3', '4', '5', '12', '23', '34', '45', '123', '234', '345']
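As a rough, self-contained sketch, the two approaches can be wrapped in functions and checked against each other. The function names and the `max_size` parameter are assumptions made for illustration and are not part of the original answer.

```python
import re


def substrings_regex(s, max_size=3):
    """Collect substrings of length 1..max_size using a lookahead regex."""
    return [
        match
        for size in range(1, max_size + 1)
        # Assumes s has no newlines; '.' would not match them without re.DOTALL.
        for match in re.findall(r'(?=(.{%d}))' % size, s)
    ]


def substrings_slicing(s, max_size=3):
    """Collect the same substrings with plain slicing, no regex involved."""
    return [
        s[start:start + size]
        for size in range(1, max_size + 1)
        for start in range(len(s) - size + 1)
    ]


expected = ['1', '2', '3', '4', '5', '12', '23', '34', '45', '123', '234', '345']
assert substrings_regex('12345') == expected
assert substrings_slicing('12345') == expected
```

Both functions return an empty list for sizes larger than the string, since neither the lookahead nor the index range yields anything in that case.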
