Regex to split a hashtag into words

Question

I am trying to come up with a regex that correctly splits a hashtag into its words. For example:

XP => XP
ACar => A Car
GoodCar => Good Car
OnceUponATime => Once Upon A Time
LoveXP => Love XP
AppleVsXP => Apple Vs XP
JamesBond007 => James Bond 007

Edit: I have tried:

expanded = ' '.join(re.findall(r"[A-Z][^A-Z]*", self.text))

What is a more robust way that handles all the use cases above?

Answer 1

You can do this with a single expression, which is sufficient for the cases listed:

# i is the hashtag string; requires "import re"
expanded = " ".join([a for a in re.split(r'([A-Z][a-z]+)', i) if a])

It gives the following results:

XP
A Car
Good Car
Once Upon A Time
Love XP
Apple Vs XP
James Bond 007
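To see why the list comprehension filters out empty strings, here is a minimal sketch of the same approach wrapped in a function. The name `expand_hashtag` is hypothetical and not part of the original answer. Because the pattern is a capturing group, `re.split` keeps each matched word in the result, and adjacent matches leave empty strings between them.

```python
import re

def expand_hashtag(tag):
    # The capturing group makes re.split keep each "capital + lowercase run" match.
    parts = re.split(r"([A-Z][a-z]+)", tag)
    # Adjacent matches (and a match at the very start or end) leave empty strings; drop them.
    return " ".join(p for p in parts if p)

print(re.split(r"([A-Z][a-z]+)", "LoveXP"))  # ['', 'Love', 'XP']
print(expand_hashtag("OnceUponATime"))       # Once Upon A Time
```

One thing to keep in mind with this approach: text between matches is never split further, so a tag such as "XP007" comes back unchanged.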

Hope this was helpful.

Answer 2

You can define several patterns for what counts as a separate word (an uppercase character followed by a series of lowercase characters, a series of digits, a series of uppercase characters not followed by a lowercase character, and so on), then run the combined pattern over your string:

import re

pattern = re.compile(r"[A-Z][a-z]+|\d+|[A-Z]+(?![a-z])")

def split_hashtag(tag):
    return pattern.findall(tag)

If you test it with your tags:

test_tags = ["XP", "ACar", "GoodCar", "OnceUponATime", "LoveXP", "AppleVsXP", "JamesBond007"]
for tag in test_tags:
    print("{} => {}".format(tag, " ".join(split_hashtag(tag))))

You get:

XP => XP
ACar => A Car
GoodCar => Good Car
OnceUponATime => Once Upon A Time
LoveXP => Love XP
AppleVsXP => Apple Vs XP
JamesBond007 => James Bond 007
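The negative lookahead `(?![a-z])` is what separates an acronym from the capitalized word that follows it. As a rough illustration, the sketch below runs the same pattern on two extra inputs, "HTTPServer2" and "iPhone", which are made-up examples and not from the question.

```python
import re

pattern = re.compile(r"[A-Z][a-z]+|\d+|[A-Z]+(?![a-z])")

# "AC" is tried first, but it is followed by "a", so the regex engine
# backtracks to "A" and leaves "Car" for the first alternative.
print(pattern.findall("ACar"))         # ['A', 'Car']

# The same backtracking stops the acronym just before the "S" of "Server".
print(pattern.findall("HTTPServer2"))  # ['HTTP', 'Server', '2']

# findall silently skips characters that no alternative matches,
# so a leading lowercase letter is dropped.
print(pattern.findall("iPhone"))       # ['Phone']
```

If tags may start with a lowercase letter, the pattern would need another alternative to cover that case.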

Solution 1
4 Accepted 2017-07-08 15:33:27

Solution 2
1 2017-07-08 15:20:42
