
How can I split a string while ignoring the portion in parentheses in Python?

I have a string as follows: 'Shri Ram Janki Impex Pvt. Ltd. (Lucknow, UP)'

I want to split it into ['Shri', 'Ram', 'Janki', 'Impex', 'Pvt.', 'Ltd.', '(Lucknow, UP)']

I tried .split(), but it also splits inside the parentheses: ['Shri', 'Ram', 'Janki', 'Impex', 'Pvt.', 'Ltd.', '(Lucknow,', 'UP)']

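I also tried a regex, but it returns the whole string unsplit:

import re
c = 'Shri Ram Janki Impex Pvt. Ltd. (Lucknow, UP)'
re.split(r'\s+(?=")', c.strip())

['Shri Ram Janki Impex Pvt. Ltd. (Lucknow, UP)']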

I want to do this in Python 3.

You can add a negative lookahead to your regex:

>>> import re
>>> s = 'Shri Ram Janki Impex Pvt. Ltd. (Lucknow, UP)'
>>> re.split(r'\s+(?!\w+\))', s)
['Shri', 'Ram', 'Janki', 'Impex', 'Pvt.', 'Ltd.', '(Lucknow, UP)']

This splits on whitespace only when it is not followed by a word ending in ).
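
Note that the lookahead only protects the space before the last word inside the parentheses, so a value such as '(New Delhi, India)' would still be split after '(New'. A minimal sketch of an alternative, assuming the parentheses are balanced, not nested, and start their own token, is to match the tokens with re.findall instead of splitting on separators. The names TOKEN and tokenize are made up for illustration:

```python
import re

# Either a whole "(...)" group or a run of non-whitespace characters.
# Assumption: parentheses are balanced, not nested, and preceded by whitespace.
TOKEN = re.compile(r'\([^)]*\)|\S+')

def tokenize(text):
    """Split on whitespace, keeping each parenthesised group as one token."""
    return TOKEN.findall(text)

print(tokenize('Shri Ram Janki Impex Pvt. Ltd. (Lucknow, UP)'))
print(tokenize('Acme Traders (New Delhi, India)'))
```

Because the first alternative is tried before \S+, any number of spaces inside the parentheses stays within a single token.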

If you want to apply this to a dataframe column, I'd compile the regex first and then apply it with map:

splitter = re.compile(r'\s+(?!\w+\))')
df['my_column'] = df['my_column'].map(splitter.split)
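
As a rough sketch of what map does here, without requiring pandas, the compiled pattern's split method is simply called once per value. The sample values and the split_cell helper below are made up for illustration. The helper passes non-string cells through unchanged, on the assumption that the column may contain missing values, which splitter.split would otherwise reject with a TypeError:

```python
import re

splitter = re.compile(r'\s+(?!\w+\))')

def split_cell(value):
    """Split one cell; leave non-string values (e.g. None, NaN) untouched."""
    return splitter.split(value) if isinstance(value, str) else value

# Hypothetical column values standing in for df['my_column'].
column = ['Shri Ram Janki Impex Pvt. Ltd. (Lucknow, UP)', None, 'Acme (Pune, MH)']
result = [split_cell(v) for v in column]
print(result)
```

With pandas, the same helper could be passed to map in place of splitter.split, i.e. df['my_column'].map(split_cell).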

Maybe you can do it in two parts:

s ='Shri Ram Janki Impex Pvt. Ltd. (Lucknow, UP)'
s.split('(')[0].split() + s.split('.')[2:]

Output:

['Shri', 'Ram', 'Janki', 'Impex', 'Pvt.', 'Ltd.', ' (Lucknow, UP)']
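
The one-liner above relies on the string containing exactly two periods before the parenthesis, and it leaves a leading space on the last item. A slightly more robust sketch of the same two-part idea, assuming at most one parenthesised group at the end of the string, uses str.partition. The function name split_keep_parens is made up for illustration:

```python
def split_keep_parens(text):
    """Split on whitespace, keeping a trailing "(...)" group as one item."""
    # partition returns (before, separator, after); separator is '' if "(" is absent.
    head, paren, tail = text.partition('(')
    parts = head.split()
    if paren:  # a "(" was found, so re-attach it to the rest of the string
        parts.append(paren + tail.strip())
    return parts

s = 'Shri Ram Janki Impex Pvt. Ltd. (Lucknow, UP)'
print(split_keep_parens(s))
```

Strings without any parentheses fall back to a plain whitespace split.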


 