I have a string as follows:
'Shri Ram Janki Impex Pvt. Ltd. (Lucknow, UP)'
I want to split it into:
['Shri', 'Ram', 'Janki', 'Impex', 'Pvt.', 'Ltd.', '(Lucknow, UP)']
I used .split(), but it also splits inside the parentheses and returns:
['Shri', 'Ram', 'Janki', 'Impex', 'Pvt.', 'Ltd.', '(Lucknow,', 'UP)']
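I also tried a regex, but it returns the whole string unsplit:
import re
c = 'Shri Ram Janki Impex Pvt. Ltd. (Lucknow, UP)'
re.split(r'\s+(?=")', c.strip())
['Shri Ram Janki Impex Pvt. Ltd. (Lucknow, UP)']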
I want to do this in Python 3.
You can add a negative lookahead to your regex:
>>> s = 'Shri Ram Janki Impex Pvt. Ltd. (Lucknow, UP)'
>>> re.split(r'\s+(?!\w+\))', s)
['Shri', 'Ram', 'Janki', 'Impex', 'Pvt.', 'Ltd.', '(Lucknow, UP)']
This splits on whitespace only if it isn't immediately followed by a word and a closing parenthesis, i.e. a word ending in ).
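Note that the lookahead only protects a space that is directly followed by a single word and a ), so a group such as '(New Delhi, India)' would still be split after 'New'. As a rough sketch of an alternative (not part of the answer above), you could match tokens instead of splitting. This assumes the parentheses are not nested and that each group is preceded by whitespace; the function name tokenize_keep_parens is made up for illustration.

```python
import re

# Hypothetical helper: match either a whole "( ... )" group or a run of
# non-whitespace characters. The parenthesised alternative is listed first
# so it takes priority whenever a token starts with "(".
TOKEN_RE = re.compile(r'\([^)]*\)|\S+')

def tokenize_keep_parens(text):
    """Return whitespace-separated tokens, keeping "( ... )" groups intact.

    Assumes parentheses are balanced and not nested.
    """
    return TOKEN_RE.findall(text)

s = 'Shri Ram Janki Impex Pvt. Ltd. (Lucknow, UP)'
print(tokenize_keep_parens(s))
```

Because it matches tokens rather than split points, this does not depend on how many words appear inside the parentheses.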
If you want to apply this to a dataframe column, I'd compile the regex first and then apply it with map:
splitter = re.compile(r'\s+(?!\w+\))')
df['my_column'] = df['my_column'].map(splitter.split)
Maybe you can do it in two parts:
s ='Shri Ram Janki Impex Pvt. Ltd. (Lucknow, UP)'
s.split('(')[0].split() + s.split('.')[2:]
Output (note that the last element keeps its leading space):
['Shri', 'Ram', 'Janki', 'Impex', 'Pvt.', 'Ltd.', ' (Lucknow, UP)']
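The s.split('.')[2:] part relies on the string having exactly two periods before the parenthesis, so it may break on other company names. A minimal sketch of the same two-part idea using str.partition, assuming there is at most one parenthesised group and that it comes last (the function name split_before_paren is hypothetical):

```python
def split_before_paren(text):
    """Split on whitespace up to the first "(", then keep the rest as one item.

    Assumes at most one parenthesised group, located at the end of the string.
    """
    # partition returns (head, separator, tail); separator is '' if "(" is absent
    head, sep, tail = text.partition('(')
    tokens = head.split()
    if sep:
        # Re-attach the "(" that partition removed; strip trailing whitespace
        tokens.append((sep + tail).strip())
    return tokens

s = 'Shri Ram Janki Impex Pvt. Ltd. (Lucknow, UP)'
print(split_before_paren(s))
```

This also avoids the leading space on the last element, since the whitespace before the ( stays in the part that is split.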