
Python split a string using regex

I would like to split a string on ':' and ' ' characters. However, I would like to ignore two spaces '  ' and two colons '::'. For example:

text = "s:11011 i:11010 ::110011  :110010 d:11000"

should split into

[s,11011,i,11010,:,110011, ,110010,d,11000]

After following the Regular Expression HOWTO on the Python website, I managed to come up with the following:

regx= re.compile('([\s:]|[^\s\s]|[^::])')
regx.split(text)

However, this does not work as intended: it splits on the colons and spaces, but it still includes the ':' and ' ' in the result.

[s,:,11011, ,i,:,11010, ,:,:,110011, , :,110010, ,d,:,11000]

How can I fix this?

EDIT: In the case of a double space, I only want one space to appear in the output.

Note that this assumes your data has a format like X:101010:

>>> import itertools
>>> import re
>>> re.findall(r'(.+?):(.+?)\b ?',text)
[('s', '11011'), ('i', '11010'), (':', '110011'), (' ', '110010'), ('d', '11000')]

Then chain them up:

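>>> list(itertools.chain(*_))
['s', '11011', 'i', '11010', ':', '110011', ' ', '110010', 'd', '11000']

Alternatively, split on each delimiter, optionally capturing a doubled one, and drop the empty entries:

>>> text = "s:11011 i:11010 ::110011  :110010 d:11000"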
>>> [x for x in re.split(r":(:)?|\s(\s)?", text) if x]
['s', '11011', 'i', '11010', ':', '110011', ' ', '110010', 'd', '11000']
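To see why the filter is needed, here is a minimal sketch of the same split as a standalone script. The names `pattern`, `raw` and `tokens` are only illustrative. When a pattern contains capture groups, `re.split` also returns one entry per group for every match, so the unfiltered list contains `None` and empty strings alongside the real tokens.

```python
import re

text = "s:11011 i:11010 ::110011  :110010 d:11000"

# Each alternative consumes one delimiter and optionally captures a second,
# identical character that follows it immediately.
pattern = re.compile(r":(:)?|\s(\s)?")

# With capture groups, re.split interleaves the group values with the pieces
# of text: None for a group that did not take part in the match, and '' for
# the empty piece between two adjacent matches.
raw = pattern.split(text)

# Dropping the falsy entries (None and '') leaves the wanted tokens,
# including the captured ':' and ' ' from the doubled delimiters.
tokens = [piece for piece in raw if piece]
print(tokens)
```

The filter is safe here only because no real token is an empty string.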

Use the regex (?<=\d) |:(?=\d) to split:

>>> text = "s:11011 i:11010 ::110011  :110010 d:11000"
>>> result = re.split(r"(?<=\d) |:(?=\d)", text)
>>> result
['s', '11011', 'i', '11010', ':', '110011', ' ', '110010', 'd', '11000']

This will split on:

(?<=\d) followed by a literal space: a space, when there is a digit on its left. To check this I use a lookbehind assertion.

:(?=\d) a colon, when there is a digit on its right. To check this I use a lookahead assertion.
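The two rules above can be checked with a short sketch that lists which characters the pattern treats as delimiters. The names `splitter`, `delimiters` and `result` are illustrative, and the sketch assumes every value starts and ends with a digit, as in the sample text.

```python
import re

text = "s:11011 i:11010 ::110011  :110010 d:11000"
splitter = re.compile(r"(?<=\d) |:(?=\d)")

# Lookarounds are zero-width: the neighbouring digit is checked but not
# consumed, so each match is exactly one space or one colon.
delimiters = [(m.start(), m.group()) for m in splitter.finditer(text)]

# The first ':' of '::' has no digit to its right and the second space of
# the double space has no digit to its left, so both stay in the output.
result = splitter.split(text)
print(delimiters)
print(result)
```

Because the pattern has no capture groups, `re.split` returns only the pieces between delimiters and no filtering step is needed.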

Have a look at this pattern:

([a-z\:\s])\:(\d+)

It will give you the same elements you are expecting, as the two captured groups of each match. There is no need to use split; just access the matches returned by the regex engine.

Hope it helps!
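As a rough sketch of how the matches could be collected in Python, `re.findall` returns one tuple of groups per match, and the tuples can then be flattened. The escapes before ':' in the pattern are optional, so they are dropped here. The names `pairs` and `flat` are illustrative.

```python
import re
from itertools import chain

text = "s:11011 i:11010 ::110011  :110010 d:11000"

# Group 1 is the single key character (a lowercase letter, a colon or a
# whitespace character); group 2 is the run of digits after the separator.
pairs = re.findall(r"([a-z:\s]):(\d+)", text)

# Flatten the (key, value) tuples into one list, in order of appearance.
flat = list(chain.from_iterable(pairs))
print(flat)
```

This relies on every key being exactly one character, as in the sample text.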
