Splitting groups of capital letters in Python

Question

I'm trying to tokenize a number of strings using capital letters as delimiters. I have landed on the following code:

import re

token = [a for a in re.split(r'([A-Z][a-z]*)', "ABCowDog") if a]

print(token)

And I get this in return, as expected:

['A', 'B', 'Cow', 'Dog']

Now, this is just an example string to keep things simple. In my real case I want to go through this list, find the individual characters (easy enough by checking len()), and put those individual letters together, provided they meet a prior definition. In the example above, 'AB', 'Cow', and 'Dog' are the strings I actually want to form (consecutive capitals are part of an acronym). For whatever reason, once I have my token list, I can't figure out how to walk it. Sorry if this has a simple answer, but I'm fairly new to Python and am sick of banging my head against the wall.
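For reference, walking the list could look roughly like the sketch below: single capitals are collected in a buffer, and the buffer is flushed as one token as soon as a longer token shows up. The helper name merge_acronyms is hypothetical, and the "prior definition" check mentioned above is left out because the question does not say what it is.

```python
import re

def merge_acronyms(tokens):
    """Join runs of single uppercase letters into one token (hypothetical helper)."""
    merged = []
    run = []  # consecutive single capitals seen so far
    for tok in tokens:
        if len(tok) == 1 and tok.isupper():
            run.append(tok)
            continue
        if run:
            # A longer token ends the current acronym: flush it first.
            merged.append(''.join(run))
            run = []
        merged.append(tok)
    if run:
        # Flush an acronym that sits at the end of the list.
        merged.append(''.join(run))
    return merged

tokens = [a for a in re.split(r'([A-Z][a-z]*)', "ABCowDog") if a]
print(merge_acronyms(tokens))
```

If the acronym has to pass some extra check, the two places where the buffer is flushed are where that check would go.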

Answer 1

re.split isn't always easy to use and can be limiting in many situations. You can try a different approach with re.findall:

>>> import re
>>> s = 'ABCowDog'
>>> re.findall(r'[A-Z](?:[A-Z]*(?![a-z])|[a-z]*)', s)
['AB', 'Cow', 'Dog']
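To make the pattern easier to read, here is the same expression written out with re.VERBOSE, as a sketch. The comments are my reading of the pattern, and the extra test strings "CowDog" and "HTTPServer" are illustrative additions, not part of the question.

```python
import re

# Same pattern as in the answer, spread over several lines with comments.
pattern = re.compile(r"""
    [A-Z]                 # every token starts with a capital letter
    (?:
        [A-Z]*(?![a-z])   # more capitals, giving back the one that starts a word
      |
        [a-z]*            # or the lowercase rest of an ordinary word
    )
""", re.VERBOSE)

for s in ("ABCowDog", "CowDog", "HTTPServer"):
    print(s, pattern.findall(s))
```

The negative lookahead is what makes the acronym branch stop one capital early, so that the "C" in "ABCowDog" is left over to start "Cow".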

Answer 2

You can use the following pattern to split with the regex module:

(?=[A-Z][a-z])

Code:

import regex

regex.split(r'(?=[A-Z][a-z])', "ABCowDog", flags=regex.VERSION1)
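As a side note, since Python 3.7 the standard re module also splits on zero-width matches, so the same lookahead can be used without the third-party regex module. The sketch below assumes Python 3.7 or later. The helper name split_before_words is hypothetical.

```python
import re

def split_before_words(s):
    """Split before each capital that is followed by a lowercase letter.

    Assumes Python 3.7+, where re.split accepts zero-width matches.
    """
    # A string that starts with a word yields a leading '' item, so drop empties.
    return [part for part in re.split(r'(?=[A-Z][a-z])', s) if part]

print(split_before_words("ABCowDog"))
```

On older Python versions this pattern never matches a non-empty span, so re.split would not split at all.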

Answer 3

([A-Z][a-z]+)

You should split on this pattern.
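A minimal sketch of how this pattern might be used, assuming it is meant to be passed to re.split: because the pattern is a capturing group, the matched words are kept in the result, and whatever sits between them (here the run of capitals) comes through as its own item. The empty strings that re.split leaves between adjacent matches are filtered out.

```python
import re

s = "ABCowDog"

# The capturing group makes re.split keep the matched words in the output.
raw = re.split(r'([A-Z][a-z]+)', s)

# Drop the empty strings left between and after adjacent matches.
parts = [p for p in raw if p]
print(parts)
```

The only change from the code in the question is + instead of *, so a lone capital no longer matches as a word of its own.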

Answer 1
3 2015-06-02 14:37:53

Answer 2
1 2015-06-02 14:17:43

Answer 3
0 2015-06-02 14:15:53
