How can I split strings while ignoring the portion in parentheses in Python?

Question

I have a string as follows: 'Shri Ram Janki Impex Pvt. Ltd. (Lucknow, UP)'

I want to split it into ['Shri', 'Ram', 'Janki', 'Impex', 'Pvt.', 'Ltd.', '(Lucknow, UP)']

I have used .split(), and it returns the following: ['Shri', 'Ram', 'Janki', 'Impex', 'Pvt.', 'Ltd.', '(Lucknow,', 'UP)']

I also tried a regex, with c holding the string above, but it returns the whole string as a single element:

import re
re.split(r'\s+(?=")', c.strip())

['Shri Ram Janki Impex Pvt. Ltd. (Lucknow, UP)']

I want to do it in Python 3.

Answer 1

You can add a negative lookahead to your regex:

>>> import re
>>> s = 'Shri Ram Janki Impex Pvt. Ltd. (Lucknow, UP)'
>>> re.split(r'\s+(?!\w+\))', s)
['Shri', 'Ram', 'Janki', 'Impex', 'Pvt.', 'Ltd.', '(Lucknow, UP)']

This splits on whitespace only if it isn't immediately followed by a run of word characters ending in ).

If you want to apply this to a DataFrame column, I'd compile the regex first and then apply it with map:

splitter = re.compile(r'\s+(?!\w+\))')
df['my_column'] = df['my_column'].map(splitter.split)
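
One limitation worth noting: the lookahead only protects a space that is directly followed by a single word and a closing parenthesis, so a longer parenthesised part such as '(New Delhi, India)' would still be split at its first inner space. As a possible alternative that is not part of the original answer, the sketch below matches tokens instead of splitting on separators: a token is either a whole parenthesised group or a run of non-whitespace characters. The names TOKEN and split_keep_parens are hypothetical, and the pattern assumes the parentheses are balanced, not nested, and start at a word boundary.

```python
import re

# Hypothetical helper, not from the original answer.
# First alternative: a whole "(...)" group with no nested parentheses.
# Second alternative: any run of non-whitespace characters.
TOKEN = re.compile(r'\([^)]*\)|\S+')


def split_keep_parens(text):
    """Return whitespace-separated tokens, keeping each (...) group intact."""
    return TOKEN.findall(text)


print(split_keep_parens('Shri Ram Janki Impex Pvt. Ltd. (Lucknow, UP)'))
print(split_keep_parens('Acme Traders (New Delhi, India)'))
```

Because it is a plain function, it should drop into the same map call, for example df['my_column'].map(split_keep_parens).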

Answer 2

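Maybe you can do it in two parts: split off everything from the first ( onward, then split only the part before it on whitespace. Using partition avoids depending on how many periods the name contains:

s = 'Shri Ram Janki Impex Pvt. Ltd. (Lucknow, UP)'
# Separate the text before the first '(' from the parenthesised remainder.
head, paren, tail = s.partition('(')
head.split() + ([paren + tail] if paren else [])

Output:

['Shri', 'Ram', 'Janki', 'Impex', 'Pvt.', 'Ltd.', '(Lucknow, UP)']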
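
The approaches above assume a single, flat parenthesised part. If the input might contain nested parentheses, a small character scanner that tracks nesting depth is one option. This is only a sketch under that assumption: split_outside_parens is a hypothetical name, and with an unclosed ( the rest of the string ends up in one token.

```python
def split_outside_parens(text):
    """Split on whitespace, except whitespace inside (possibly nested) parentheses."""
    tokens, current, depth = [], [], 0
    for ch in text:
        if ch == '(':
            depth += 1
        elif ch == ')' and depth > 0:
            depth -= 1
        if ch.isspace() and depth == 0:
            # Token boundary: flush whatever has been collected so far.
            if current:
                tokens.append(''.join(current))
                current = []
        else:
            current.append(ch)
    # Flush the final token, if any.
    if current:
        tokens.append(''.join(current))
    return tokens


print(split_outside_parens('Shri Ram Janki Impex Pvt. Ltd. (Lucknow, UP)'))
print(split_outside_parens('A (B (C D) E) F'))
```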
