
Regex to split a string on commas, but only if the comma is not between digits

How can I split the following string into separate parts?

Given string s = "Consumer notes, State Consumer Forum, Rs.50,000 penatly against ICICI,Andhra Pradesh"

I want the result to be: ["Consumer notes", "State Consumer Forum", "Rs.50,000 penatly against ICICI", "Andhra Pradesh"]

I am new to regex and can't work out a pattern for this.

Currently I am doing this:

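s = "Consumer notes, State Consumer Forum, Rs.50,000 penatly against ICICI,Andhra Pradesh"
# str.split() takes a literal separator, not a regex, so it splits on every comma,
# including the one inside "50,000"
result = set(w for w in s.split(','))
print(result)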

Result:
set(['Andhra Pradesh', ' Rs.50', 'Consumer notes', '000 penatly against ICICI', ' State Consumer Forum'])

This gives me five items, because it also splits the amount Rs.50,000 into two parts, which I don't want. How can I fix this?

In [1]: s = "Consumer notes, State Consumer Forum, Rs.50,000 penatly against ICICI,Andhra Pradesh"

In [2]: import re

In [3]: re.split(r'(?<!\d),(?!\d)',s)
Out[3]: 
['Consumer notes',
 ' State Consumer Forum',
 ' Rs.50,000 penatly against ICICI',
 'Andhra Pradesh']

You can use re.split(r'(?<!\d),\s*(?!\d)', s) to also remove the spaces that follow each comma.
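A minimal Python 3 sketch of that whitespace-stripping variant, run on the question's string. The helper name split_outside_numbers is hypothetical and not part of the answer.

```python
import re

# A comma with no digit on either side; \s* also consumes the spaces
# after the comma so they don't end up at the start of the next piece.
PATTERN = re.compile(r'(?<!\d),\s*(?!\d)')

def split_outside_numbers(text):
    """Hypothetical helper: split text on commas that are not next to digits."""
    return PATTERN.split(text)

s = "Consumer notes, State Consumer Forum, Rs.50,000 penatly against ICICI,Andhra Pradesh"
print(split_outside_numbers(s))
# ['Consumer notes', 'State Consumer Forum', 'Rs.50,000 penatly against ICICI', 'Andhra Pradesh']
```

One subtlety: \s* can backtrack. In a string like "a, 1" it gives the space back so that (?!\d) sees the space, the comma is still split, and the second piece keeps its leading space (' 1').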

You can use either

(?<!\d),|,(?!\d)

Or

,(?!(?<=\d.)\d)

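For reference, here is a small Python 3 sketch that applies both patterns with re.split to the question's string. Neither pattern strips the spaces after the commas, so the leading spaces remain, just as in the first answer's output.

```python
import re

s = "Consumer notes, State Consumer Forum, Rs.50,000 penatly against ICICI,Andhra Pradesh"

# Regex #1: a comma with no digit before it, OR a comma with no digit after it
regex1 = r'(?<!\d),|,(?!\d)'
# Regex #2: a comma, unless it has a digit immediately before AND after it
regex2 = r',(?!(?<=\d.)\d)'

print(re.split(regex1, s))
print(re.split(regex2, s))
# Both print:
# ['Consumer notes', ' State Consumer Forum', ' Rs.50,000 penatly against ICICI', 'Andhra Pradesh']
```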

Details

  • (?<!\d), - a comma not immediately preceded by a digit
  • | - or
  • ,(?!\d) - a comma not immediately followed by a digit
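To see what the alternation means in practice, this sketch runs the first pattern on a few made-up inputs that cover every digit/non-digit combination around the comma. The sample strings are illustrative assumptions. A comma is kept only when it has a digit on both sides.

```python
import re

regex1 = re.compile(r'(?<!\d),|,(?!\d)')

# Made-up inputs: non-digit/digit on each side of the comma
samples = ["a,b", "1,b", "a,2", "1,2"]
for text in samples:
    print(text, '->', regex1.split(text))
# a,b -> ['a', 'b']
# 1,b -> ['1', 'b']
# a,2 -> ['a', '2']
# 1,2 -> ['1,2']
```

By contrast, the first answer's (?<!\d),(?!\d) requires a non-digit on both sides, so it would also leave "1,b" and "a,2" unsplit.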

The first pattern is not very efficient because of 1) the alternation and 2) the lookbehind at the start of the pattern, which makes the regex engine check it at every position in the string.

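  • , - a comma that is...
  • (?!(?<=\d.)\d) - ...not between two digits. The negative lookahead is checked right after the comma: (?<=\d.) requires a digit followed by any one char (that char is always the comma itself, so . and , would work the same here), and \d requires a digit right after the comma. The comma is rejected only when both conditions hold.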
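The remark that . and , work the same inside the lookbehind can be checked directly. This sketch compiles both spellings of the second pattern and compares the comma positions each one accepts in the question's string.

```python
import re

s = "Consumer notes, State Consumer Forum, Rs.50,000 penatly against ICICI,Andhra Pradesh"

# Inside the lookahead the engine stands just after the comma, so the
# "any one char" matched by "." in (?<=\d.) is always that comma itself.
with_dot = re.compile(r',(?!(?<=\d.)\d)')
with_comma = re.compile(r',(?!(?<=\d,)\d)')

# Start positions of the commas each pattern accepts as split points
dot_hits = [m.start() for m in with_dot.finditer(s)]
comma_hits = [m.start() for m in with_comma.finditer(s)]
print(dot_hits == comma_hits)  # True

# The comma inside "50,000" is not among the accepted positions
print(s.index("50,000") + 2 in dot_hits)  # False
```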

The second pattern is much more efficient: because it starts with a literal comma, the regex engine only needs to run the lookaround checks at the commas in the text.
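To check the efficiency claim on your own machine, you could time both patterns with timeit, as in the rough sketch below. The input size (the question's string repeated 1000 times) and the repeat count are arbitrary assumptions. Timings depend on the machine, the Python version, and the regex engine, so no numbers are given here.

```python
import re
import timeit

# Arbitrary benchmark input: the question's string repeated many times
s = "Consumer notes, State Consumer Forum, Rs.50,000 penatly against ICICI,Andhra Pradesh"
text = s * 1000

regex1 = re.compile(r'(?<!\d),|,(?!\d)')
regex2 = re.compile(r',(?!(?<=\d.)\d)')

# Make sure both patterns split identically before comparing their speed
assert regex1.split(text) == regex2.split(text)

for name, rx in (("regex #1", regex1), ("regex #2", regex2)):
    # The lambda is called inside this iteration, so it uses the current rx
    seconds = timeit.timeit(lambda: rx.split(text), number=100)
    print(f"{name}: {seconds:.3f} s for 100 splits")
```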

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this content, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
© 2020-2024 STACKOOM.COM