Capture repeated sub-patterns with permutations in a Python regex

Question

I am trying to tokenize a string made of sub-patterns that can appear in any order. The sub-patterns are underscores, letters, or numbers. For example:

   'ABC_123_DEF_456' would provide ('ABC', '_', '123', '_', 'DEF', '_', '456')

Here is the regex I implemented, which gives an unexpected result:

>>> m = regex.match(r'^((_)|(\d+)|([[:alpha:]]+))+$', 'ABC_123_DEF_456')
>>> m.groups()
('456', '_', '456', 'DEF')

Update (permutations): the three sub-patterns can appear in any order, for example:

'ABC123__' would provide ('ABC', '123', '_', '_')

Answer 1

You can use the pattern ([a-z]+|\d+|_) with the re.I (ignore case) flag to chunk the string into runs of digits, runs of alphabetic characters, or single underscores:

>>> re.findall(r"([a-z]+|\d+|_)", "ABC_123_DEF_456", re.I)
['ABC', '_', '123', '_', 'DEF', '_', '456']
>>> re.findall(r"([a-z]+|\d+|_)", "ABC123__", re.I)
['ABC', '123', '_', '_']
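
The pattern in the question returns only four values because a capture group that sits inside a repetition keeps just its last match, which is why collecting matches one at a time works better here. One caveat with re.findall is that it silently skips any character that none of the alternatives match, whereas the anchored pattern in the question would reject such a string. If that validation matters, a minimal sketch along these lines might help. The names TOKEN_RE and tokenize, and the group names alpha, digits and underscore, are made up for illustration and are not part of the original answer:

```python
import re

# Hypothetical names for illustration: TOKEN_RE, tokenize, and the group names.
# Same three alternatives as above, but each branch gets a name so the
# kind of every token can be reported alongside its text.
TOKEN_RE = re.compile(r"(?P<alpha>[a-z]+)|(?P<digits>\d+)|(?P<underscore>_)", re.I)

def tokenize(text):
    """Return a list of (kind, value) pairs covering the whole string.

    Raises ValueError if some character is matched by none of the branches,
    instead of skipping it the way re.findall would.
    """
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)  # match must start exactly at pos
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at index {pos}")
        # Only one named group takes part in each match, so lastgroup is its name.
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize("ABC_123_DEF_456"))
print(tokenize("ABC123__"))
```

Dropping the kinds with [value for _, value in tokenize(s)] gives the same flat list as the findall call, and a string such as "AB-1" raises ValueError rather than losing the hyphen.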

Answer 1 (score 3, accepted, 2020-01-10 17:54:12)
