Python regular expressions: how to split a string into separate groups of letters, digits, and punctuation

Question

I am learning regular expressions using Python 2.7.

Given a sentence (assume lowercase and ASCII) such as:

input = 'i like: a, b, 007 and c!!'

How would I tokenize the input string into:

['i', 'like', ':', 'a', ',', 'b', ',', '007', 'and', 'c', '!!']

I can write the automaton and code its transition matrix in C++, but I would like to do this in Python.
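For comparison, the automaton idea translates to Python without a regex. The sketch below is not from the original thread. Its "state" is the class of the current character, and a token ends whenever that class changes. `itertools.groupby` finds those class changes. The helper names `char_class` and `tokenize`, and the four-way classification, are hypothetical.

```python
from itertools import groupby


def char_class(ch):
    """Hypothetical classifier: map a character to one of four coarse classes."""
    if ch.isspace():
        return "space"
    if ch.isalpha():
        return "alpha"
    if ch.isdigit():
        return "digit"
    return "punct"  # anything else (assumes ASCII input, as in the question)


def tokenize(text):
    """Emit each maximal run of same-class characters, dropping whitespace runs."""
    tokens = []
    for cls, run in groupby(text, key=char_class):
        # groupby starts a new group exactly where the class changes,
        # i.e. at every "transition" of the automaton.
        if cls != "space":
            tokens.append("".join(run))
    return tokens


print(tokenize('i like: a, b, 007 and c!!'))
# ['i', 'like', ':', 'a', ',', 'b', ',', '007', 'and', 'c', '!!']
```

Unlike `\w+`, this keeps letters and digits apart, so `'a1'` becomes `['a', '1']`.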

I am unable to come up with a regex that matches these distinct classes (letters, digits, and punctuation) in one go.

I have seen a couple of Stack Overflow posts here and here, but I do not quite follow their approaches.

I have been trying this for some time now and would appreciate your help.

PS: This is not a homework question.

Answer 1

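>>> import re
>>> from string import punctuation
>>> text = 'i like: a, b, 007 and c!!'
>>> # re.escape keeps characters such as ] and \ from string.punctuation literal inside [...]
>>> re.findall(r'\w+|[{0}]+'.format(re.escape(punctuation)), text)
['i', 'like', ':', 'a', ',', 'b', ',', '007', 'and', 'c', '!!']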
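`\w+` treats letters and digits as one class, and it also matches `_`. So `'abc123'` comes back as a single token. If letters, digits, and punctuation should be three separate groups, as the title asks, one option is a regex with one alternative per class. This is a minimal sketch under the question's own assumption that input is lowercase ASCII. The names `TOKEN_RE` and `tokenize` are hypothetical.

```python
import re

# Assumption (from the question): input is lowercase ASCII.
# One alternative per class; each "+" grabs a maximal run of that class.
# The last class is "anything that is not whitespace, a letter or a digit",
# which under the ASCII assumption means punctuation.
TOKEN_RE = re.compile(r"[a-z]+|[0-9]+|[^\sa-z0-9]+")


def tokenize(text):
    return TOKEN_RE.findall(text)


print(tokenize('i like: a, b, 007 and c!!'))
# ['i', 'like', ':', 'a', ',', 'b', ',', '007', 'and', 'c', '!!']
print(tokenize('abc123!?'))
# ['abc', '123', '!?']
```

Uppercase letters would fall into the last class here. Add `A-Z` to the first and last classes if the lowercase assumption does not hold.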

This also works, but when no run of alphanumeric characters matches, it falls back to any run of non-whitespace characters. As a result, punctuation immediately followed by a letter (e.g. ':a') is returned as a single token.

>>> re.findall(r'\w+|\S+', text)
['i', 'like', ':', 'a', ',', 'b', ',', '007', 'and', 'c', '!!']
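The two patterns give the same result for the sample sentence, but they diverge once punctuation touches a following word. A small comparison, where the input `"x:a"` is a hypothetical example:

```python
import re
from string import punctuation

# First pattern: runs of word characters, or runs of (escaped) punctuation.
punct_pattern = r"\w+|[{0}]+".format(re.escape(punctuation))
# Second pattern: runs of word characters, else any run of non-whitespace.
loose_pattern = r"\w+|\S+"

sample = "x:a"  # hypothetical input: punctuation with no surrounding spaces
print(re.findall(punct_pattern, sample))  # ['x', ':', 'a']
print(re.findall(loose_pattern, sample))  # ['x', ':a']
```

At the `:`, `\w+` fails. `\S+` then greedily consumes `:a` because `a` is also non-whitespace. The punctuation-class version stops at the first non-punctuation character.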

Answer 1: score 3, accepted, 2012-04-21 15:33:32