
Split a string on a certain character only if it doesn't follow directly after another particular character

I have the following line of code, which splits the string data2 into a list on whitespace:

string_list = data2.split()

However, some of my data contains dates in the format "28, Dec". Here the above code splits on the white space between the day and the month, which I don't want. Is there a way to say "split on the white space, but not if it comes right after a comma"?

You need to use regular expressions.

>>> import re
>>> re.split(r'(?<!,) ', 'blah blah, blah')
['blah', 'blah, blah']

From the Python re module documentation:

(?<!...) Matches if the current position in the string is not preceded by a match for .... This is called a negative lookbehind assertion. Similar to positive lookbehind assertions, the contained pattern must only match strings of some fixed length. Patterns which start with negative lookbehind assertions may match at the beginning of the string being searched.
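As a small sketch of the two points in that quote (the sample string and variable names here are made up for illustration): a pattern that starts with a negative lookbehind can still match at the very beginning of the string, and Python's re module rejects a lookbehind whose width is not fixed.

```python
import re

text = 'Dec 28, Dec'  # hypothetical sample string

# The first "Dec" is at the start of the string, so nothing precedes it and
# the negative lookbehind succeeds; the second "Dec" follows ", " and is skipped.
starts = [m.start() for m in re.finditer(r'(?<!, )Dec', text)]
print(starts)  # [0]

# A lookbehind such as (?<!,\s*) has no fixed length, so compiling it fails.
try:
    re.compile(r'(?<!,\s*) ')
    variable_width_ok = True
except re.error:
    variable_width_ok = False
print(variable_width_ok)  # False
```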

Use re.split with a negative lookbehind expression:

re.split(r'(?<!,)\s','I went on 28, Dec')
Out[53]: ['I', 'went', 'on', '28, Dec']

You can split using a regular expression and utilize look-behind expressions to make sure that you don't split on a whitespace character that is preceded by a comma:

>>> import re
>>> s = 'foo bar 28, Dec bar baz'
>>> re.split(r'(?<!,)\s', s)
['foo', 'bar', '28, Dec', 'bar', 'baz']
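One difference from data2.split() may be worth noting: the patterns above split on every single whitespace character, so runs of spaces produce empty strings. A possible variant, assuming you also want runs of whitespace collapsed and leading/trailing whitespace ignored (the helper name split_keep_comma_space is hypothetical, not something from the question), could look like this:

```python
import re

def split_keep_comma_space(text):
    """Split on runs of whitespace, except runs that directly follow a comma."""
    # (?<!\s) forces the match to begin at the start of a whitespace run;
    # (?<!,) rejects the run when the character before it is a comma.
    return re.split(r'(?<!,)(?<!\s)\s+', text.strip())

print(re.split(r'(?<!,)\s', 'a  b'))                      # ['a', '', 'b']
print(split_keep_comma_space('  I went   on 28, Dec  '))  # ['I', 'went', 'on', '28, Dec']
```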

Sorry to revive this thread, but I was trying to decode SQLite cells and something seems odd to me. Let me explain. I'm trying to encode two different numbers into one cell by building a string with a 0 in between and then converting it to a number, for example: a=4 b=7 c=str(4)+'0'+str(7)

The problem arises when the first number is 10, so I'm using this: re.split('0([1-9])','1003'), which returns ['10','3','']

Why am I getting a list of length three when it should be just two?
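A likely explanation, sketched below: re.split includes the text of any capturing group in its result, and because the match '03' sits at the very end of the string, the piece after it is an empty string. Using a lookahead instead of a capturing group leaves the digit in the second piece. This only illustrates the split behaviour; it assumes neither number contains a 0 followed by a non-zero digit, otherwise the encoding itself is ambiguous.

```python
import re

# Capturing group: the captured '3' is returned as its own item, and the
# empty remainder after the match at the end of the string is appended.
with_group = re.split(r'0([1-9])', '1003')
print(with_group)      # ['10', '3', '']

# Lookahead: only the separator '0' is consumed, so '3' stays in the tail.
with_lookahead = re.split(r'0(?=[1-9])', '1003')
print(with_lookahead)  # ['10', '3']
```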
