Regex to split a hashtag into words

I am trying to come up with a regex which will correctly split a hashtag into its words. For example:

XP => XP
ACar => A Car
GoodCar => Good Car
OnceUponATime => Once Upon A Time
LoveXP => Love XP
AppleVsXP => Apple Vs XP
JamesBond007 => James Bond 007

Edit: I have tried:

expanded = ' '.join(re.findall(r"[A-Z][^A-Z]*", self.text))
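For reference, here is a small sketch of how that attempt behaves on the examples above. The function name `naive_split` is only for illustration; the regex is the one from my attempt. It breaks every run of capitals into single letters and leaves digits glued to the preceding word:

```python
import re

def naive_split(tag):
    # Each match starts at a capital and runs until the next capital.
    return " ".join(re.findall(r"[A-Z][^A-Z]*", tag))

for tag in ["XP", "LoveXP", "JamesBond007"]:
    print("{} => {}".format(tag, naive_split(tag)))
# XP => X P
# LoveXP => Love X P
# JamesBond007 => James Bond007
```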

What is a more robust way which will address all the use cases above?

You can do this with a single expression, which is sufficient for the examples above (here `i` is the hashtag string):

expanded = " ".join([a for a in re.split('([A-Z][a-z]+)', i) if a])

It gives the following results:

XP
A Car
Good Car
Once Upon A Time
Love XP
Apple Vs XP
James Bond 007
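One caveat, shown as a small sketch (the function name `split_on_capitalized` is only for illustration): the split happens only around capitalized words, so whatever sits between them is kept as one chunk. That is fine for the examples in the question, but a run of digits directly next to a run of capitals is not separated:

```python
import re

def split_on_capitalized(tag):
    # Same expression as above, wrapped in a function for testing.
    # The capturing group makes re.split keep the matched words.
    parts = re.split(r"([A-Z][a-z]+)", tag)
    return " ".join(p for p in parts if p)

print(split_on_capitalized("JamesBond007"))  # James Bond 007
print(split_on_capitalized("Bond007XP"))     # Bond 007XP
```

If your tags can contain such combinations, you would need to match digits and capital runs explicitly.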

Hope this was helpful.

You can define several patterns to match what counts as a separate word: an uppercase character followed by a series of lowercase characters, a series of digits, a series of uppercase characters not followed by a lowercase character, and so on. Then just run the combined pattern over your string:

import re

pattern = re.compile(r"[A-Z][a-z]+|\d+|[A-Z]+(?![a-z])")

def split_hashtag(tag):
    return pattern.findall(tag)

If you test it with your tags:

test_tags = ["XP", "ACar", "GoodCar", "OnceUponATime", "LoveXP", "AppleVsXP", "JamesBond007"]
for tag in test_tags:
    print("{} => {}".format(tag, " ".join(split_hashtag(tag))))

You get:

XP => XP
ACar => A Car
GoodCar => Good Car
OnceUponATime => Once Upon A Time
LoveXP => Love XP
AppleVsXP => Apple Vs XP
JamesBond007 => James Bond 007
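If your hashtags can also start with a lowercase word (something the examples in the question don't cover), the pattern above silently drops those characters, because `findall` skips anything that matches none of the alternatives. A possible extension, as a sketch only, adds a `[a-z]+` alternative. The names `extended_pattern` and `split_hashtag_extended` are made up for this example:

```python
import re

# Assumption: a run of lowercase letters with no leading capital is a word too.
extended_pattern = re.compile(r"[A-Z][a-z]+|[a-z]+|\d+|[A-Z]+(?![a-z])")

def split_hashtag_extended(tag):
    # Characters that match no alternative (such as a leading '#') are skipped.
    return extended_pattern.findall(tag)

print(split_hashtag_extended("iPhone7Plus"))  # ['i', 'Phone', '7', 'Plus']
print(split_hashtag_extended("#LoveXP"))      # ['Love', 'XP']
```

The original test tags still split the same way with this pattern, since the new alternative only matches where the others would not.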
