I've got a problem with the following Python script, which extracts some options from the text of a text area in an internal company web app.
import re
text = 'option one\noption two, option three, option four'
correct = 'option one, option two, option three, option four'
# Raw string so the regex engine, not Python, interprets \s
pattern = re.compile(r'(\s*[,]\s*)')
fixed = pattern.sub(', ', text)
print(fixed)
option one
option two, option three, option four
print(fixed.split(', '))
['option one\noption two', 'option three', 'option four']
This obviously fails to split up 'option one\noption two' into 'option one', 'option two'.
So the input could end up as
option one
option two, option three, option four
which would need to be converted to
option one, option two, option three, option four
It works fine if the delimiter is a comma, or a comma followed by a newline, but not if it's just a newline by itself.
Extend your character class from [,] to [,\n], maybe? Also, why don't you split on the regex directly, rather than search-and-replacing first and then splitting? re.split could come in handy for this: http://docs.python.org/library/re.html?highlight=re.split#re.split
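For what it's worth, here is a minimal sketch of splitting on the regex directly, assuming the same sample text as in the question. The names delimiter and options are just illustrative. The pattern has no capturing group, so re.split does not return the delimiters themselves.

```python
import re

text = 'option one\noption two, option three, option four'

# A comma or a newline, together with any whitespace around it,
# counts as a single delimiter.
delimiter = re.compile(r'\s*[,\n]\s*')

options = delimiter.split(text)
print(options)
```

Note that if the text can start or end with a delimiter (a trailing newline, say), the result will contain empty strings, so you may want to filter those out.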
Can you just try (\s*(,|\n)\s*)?
Or probably even better
(\s*[,\n]\s*)
...I always forget you can put \n in a character class...
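A quick sketch of that character-class pattern dropped into the original search-and-replace script, reusing the question's text and correct values. The outer capturing group is left out here since sub doesn't need it.

```python
import re

text = 'option one\noption two, option three, option four'
correct = 'option one, option two, option three, option four'

# [,\n] matches either a comma or a newline; the \s* on both sides
# absorbs any whitespace around the delimiter.
pattern = re.compile(r'\s*[,\n]\s*')

fixed = pattern.sub(', ', text)
print(fixed == correct)
print(fixed.split(', '))
```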
I got there without a regex:
print([x.strip() for x in text.replace('\n', ', ').split(', ')])
Result:
['option one', 'option two', 'option three', 'option four']
I'm not claiming this is a good answer for your use case. If you need to support extra delimiters, it means adding an extra .replace() call for each.
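If you do go the no-regex route with more delimiters, one way to avoid chaining .replace() calls by hand is to loop over them. This is only a sketch: split_options is a made-up helper name, and the semicolon is an assumed extra delimiter that isn't in the original question.

```python
def split_options(text, delimiters=('\n', ';')):
    """Turn each extra delimiter into a comma, then split and strip."""
    for delimiter in delimiters:
        text = text.replace(delimiter, ',')
    # Drop empty pieces left behind by doubled or trailing delimiters.
    return [part.strip() for part in text.split(',') if part.strip()]


# The ';' in this sample input is hypothetical.
print(split_options('option one\noption two, option three; option four'))
```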