I want to open a CSV file with `open()` and read it line by line. For some reason, I'm not using Pandas.
I want to replace each comma `,` with `_XXX_`, but not the commas inside double quotes `"`, because those commas are not separators. So I can't use:
string_ = string_.replace(',', '_XXX_')
How can I do this? Use a regex, maybe?
I've found "replace comma inside quotation" and "Python regex: find and replace commas between quotation marks", but I need to replace commas OUTSIDE the quotation marks.
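You may use `re.sub` with a simple `"[^"]*"` regex (or `(?s)"[^"\\]*(?:\\.[^"\\]*)*"` if you also need to handle escape sequences between the double quotes) to match strings between double quotes. Capture this pattern into Group 1, and match a comma in all other contexts with a second alternative. Then pass a callable as the replacement argument: it receives the match object and returns Group 1 unchanged when it matched, or the replacement for a bare comma otherwise. The examples here replace the comma with an empty string; pass `"_XXX_"` as the second argument of `replace()` to get the replacement from the question.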
import re

# Group 1 keeps quoted strings intact; a bare comma outside quotes is removed.
print(re.sub(r'("[^"]*")|,',
             lambda x: x.group(1) if x.group(1) else x.group().replace(",", ""),
             '1,2,"test,3,7","4, 5,6, ... "'))
# => 12"test,3,7""4, 5,6, ... "

# Same idea, but the quoted-string pattern also allows backslash escapes such as \".
print(re.sub(r'(?s)("[^"\\]*(?:\\.[^"\\]*)*")|,',
             lambda x: x.group(1) if x.group(1) else x.group().replace(",", ""),
             r'1,2,"test, \"a,b,c\" ,03","4, 5,6, ... "'))
# => 12"test, \"a,b,c\" ,03""4, 5,6, ... "
See the Python demo.
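For the replacement asked for in the question, the callable can return `_XXX_` for a bare comma instead of an empty string. This is a minimal sketch, assuming that no quoted field spans more than one line; the helper name `replace_outside_quotes` and the sample data are made up for illustration, and `io.StringIO` stands in for the real file.

```python
import io
import re

# Group 1 captures a whole double-quoted field; the bare comma is the fallback.
PATTERN = re.compile(r'("[^"]*")|,')

def replace_outside_quotes(line, repl="_XXX_"):
    """Replace commas that are not inside a double-quoted field."""
    # If group 1 matched, return the quoted field unchanged; otherwise
    # the match is a bare comma, so return the replacement text.
    return PATTERN.sub(lambda m: m.group(1) if m.group(1) else repl, line)

# io.StringIO stands in for open("data.csv") so the sketch runs without a file.
sample = io.StringIO('1,2,"test,3,7","4, 5,6, ... "\na,b,"c,d"\n')
for line in sample:
    print(replace_outside_quotes(line.rstrip("\n")))
# 1_XXX_2_XXX_"test,3,7"_XXX_"4, 5,6, ... "
# a_XXX_b_XXX_"c,d"
```

The simple pattern also copes with CSV-style doubled quotes (`""`) inside a field, because each adjacent quoted piece is matched in turn and left untouched.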
Regex details
`("[^"]*")|,`:
- `("[^"]*")` - Capturing group 1: a `"`, then any 0 or more chars other than `"`, and then a `"`
- `|` - or
- `,` - a comma

The other one is `(?s)("[^"\\]*(?:\\.[^"\\]*)*")|,`:
- `(?s)` - the inline version of the `re.S` / `re.DOTALL` flag
- `("[^"\\]*(?:\\.[^"\\]*)*")` - Group 1: a `"`, then any 0 or more chars other than `"` and `\`, then 0 or more sequences of a `\` and any one char followed with 0 or more chars other than `"` and `\`, and then a `"`
- `|` - or
- `,` - a comma.
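To see where the two patterns differ, the breakdown above can be written out with `re.VERBOSE` and both patterns run on an input containing `\"` escapes. This is only an illustrative sketch: the names `SIMPLE`, `ESCAPED` and `drop_commas` and the sample string are assumptions, not part of the original answer.

```python
import re

SIMPLE = re.compile(r'("[^"]*")|,')

# re.VERBOSE allows comments inside the pattern; re.DOTALL replaces the inline (?s).
ESCAPED = re.compile(r'''
    (               # Group 1: a double-quoted string
      "             #   opening quote
      [^"\\]*       #   0+ chars other than a quote or a backslash
      (?:           #   then 0+ repetitions of:
        \\.         #     a backslash plus any one char (an escape)
        [^"\\]*     #     0+ chars other than a quote or a backslash
      )*
      "             #   closing quote
    )
    | ,             # or a comma outside any quoted string
''', re.VERBOSE | re.DOTALL)

def drop_commas(pattern, text):
    # Keep quoted strings (group 1); a bare comma yields None, so "" is used.
    return pattern.sub(lambda m: m.group(1) or "", text)

text = r'1,"a \"b,c\" d",2'
print(drop_commas(SIMPLE, text))   # 1"a \"bc\" d"2   (inner comma wrongly removed)
print(drop_commas(ESCAPED, text))  # 1"a \"b,c\" d"2  (inner comma preserved)
```

The simple pattern treats the escaped `\"` as the end of the string, so the comma between `b` and `c` counts as being outside the quotes; the longer pattern consumes each backslash together with the character after it and keeps the whole field together.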