
How to split a string without removing the delimiter?

I know this question has been answered before, but I am still having difficulties even after trying several suggestions I found online. What I want is quite simple: split this string

"__label__2:somedata" 

or

"__label__43:somedata" 

and get

['__label__2:', 'somedata'] 

or

['__label__43:', 'somedata'].

Here is the code I have:

import re
line = "__label__2:somedata"
p = re.split("(__label__{1,2}:)", line)
print (p)

But this unfortunately prints

['__label__2:somedata']

What am I doing wrong here?

You need to add \d+ inside your regular expression, like so:

(__label__\d+:)

This also lets you capture any number of digits rather than having to list every possible value.
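As a minimal sketch of this suggestion: because the pattern is wrapped in a capturing group, re.split keeps the delimiter in the result, but it also emits an empty string for the (empty) text before the match. The helper name split_label is made up for illustration; it simply drops empty pieces.

```python
import re

def split_label(line):
    # The capturing group makes re.split keep the matched delimiter.
    parts = re.split(r"(__label__\d+:)", line)
    # The match sits at the start of the line, so the first piece is '';
    # filter out empty pieces to get the two-element list the question asks for.
    return [part for part in parts if part]

print(split_label("__label__2:somedata"))
print(split_label("__label__43:somedata"))
```

Filtering empty strings is one choice among several; slicing off the first element would also work if every line is known to start with a label.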

"(__label__{1,2}:)" is not doing what you think. {1,2} is a quantifier: it asks for 1 or 2 repeats of the element just before it (here, the final underscore of __label__), not for the characters 1 or 2.

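To match the character 1 or 2, use a character class, [12]. Note that this matches a single digit only, so a label such as 43 needs a digit pattern like \d{1,2} instead: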

import re

re.split('(__label__[12]:)', "__label__2:somedata")

output: ['', '__label__2:', 'somedata']
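To make the quantifier behaviour concrete, here is a small sketch (the sample strings are invented for the demonstration) showing what the original pattern accepts and why the [12] class does not cover a two-digit label:

```python
import re

# {1,2} binds to the underscore just before it, so this pattern means
# "__label_" followed by one or two underscores and then ":".
original = re.compile(r"__label__{1,2}:")

print(original.search("__label__:rest"))    # matches "__label__:"
print(original.search("__label___:rest"))   # matches "__label___:"
print(original.search("__label__2:rest"))   # None: no digit is allowed before ":"

# [12] matches exactly one character, either "1" or "2".
print(re.split(r"(__label__[12]:)", "__label__43:somedata"))  # no split happens
```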

You seem to hesitate between splitting the string and matching a part of it. Both are possible, but they are different and have different use cases.

  1. split:

    You just have to split on : and add the delimiter back to every part but the last:

     lst = line.split(':')
     mx = len(lst) - 1
     result = [s if i == mx else s + ':' for i, s in enumerate(lst)]

  2. match:

    You have to match the first part and, separately, the rest of the line:

     m = re.match(r'(__label__\d{1,2}:)(.*)', line)
     result = m.groups()

Use split if you expect more than two fields, and match if you want to control the pattern of the first one.
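One detail worth guarding against in the match approach: re.match returns None when the line does not start with a label, so calling .groups() directly would raise an AttributeError. A hedged sketch, where parse_line and LABEL_RE are illustrative names rather than anything from the answer:

```python
import re

# One or two digits after "__label__", then a colon, then the rest of the line.
LABEL_RE = re.compile(r"(__label__\d{1,2}:)(.*)")

def parse_line(line):
    m = LABEL_RE.match(line)
    if m is None:
        # The line does not start with a well-formed label.
        return None
    return list(m.groups())

print(parse_line("__label__43:somedata"))
print(parse_line("no label here"))
```

Returning None for malformed lines is an assumption about what the caller wants; raising an exception would be just as reasonable.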

You can use .partition :

>>> s="__label__2:somedata"
>>> t=s.partition(':')
>>> [t[0]+t[1], t[2]]
['__label__2:', 'somedata']

If you have a bunch, you can use a comprehension:

>>> cases = ("__label__2:somedata", "__label__43:somedata")
>>> [[t[0]+t[1], t[2]] for t in map(lambda s: s.partition(':'), cases)]
[['__label__2:', 'somedata'], ['__label__43:', 'somedata']]
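Two properties of str.partition are useful here: it splits only at the first occurrence of the separator, so colons inside the data survive, and it returns the whole string plus two empty strings when the separator is missing. A small sketch wrapping this up (split_keep_colon is a hypothetical helper name):

```python
def split_keep_colon(s):
    # partition returns (head, sep, tail); sep is '' if ':' was not found.
    head, sep, tail = s.partition(":")
    if not sep:
        # No delimiter present: return the input unchanged as a single field.
        return [s]
    return [head + sep, tail]

print(split_keep_colon("__label__2:somedata"))
print(split_keep_colon("__label__2:some:data"))
print(split_keep_colon("nodelimiter"))
```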
