
Replace Comma Outside Double Quote - Python - Regex

I want to open a CSV file using open() and read it line by line. For certain reasons, I'm not using Pandas.

I want to replace each comma , with _XXX_, but without replacing commas inside double quotes ", because those commas are not field separators. So I can't simply use:

string_ = string_.replace(',', '_XXX_')

How can I do this? Use a regex, maybe?

I've found "replace comma inside quotation" and "Python regex: find and replace commas between quotation marks", but I need to replace commas OUTSIDE quotation marks.

You may use re.sub with a simple "[^"]*" regex (or (?s)"[^"\\]*(?:\\.[^"\\]*)*" if you also need to handle backslash-escaped sequences between the double quotes) to match strings between double quotes. Capture this pattern into Group 1, and match a comma in all other contexts with the second alternative. Then pass a callable as the replacement argument: it receives the match object, returns Group 1 unchanged when Group 1 matched, and otherwise returns whatever should stand in for the comma.

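import re

# Group 1 captures a double-quoted string; the second alternative matches a bare comma.
# The callable returns a quoted string unchanged and removes any other comma
# (return '_XXX_' instead of '' to insert a marker).
print( re.sub(r'("[^"]*")|,', 
    lambda x: x.group(1) if x.group(1) else x.group().replace(",", ""),
    '1,2,"test,3,7","4, 5,6, ... "') )
    # => 12"test,3,7""4, 5,6, ... "

# Same idea, but the quoted-string pattern also steps over backslash-escaped chars such as \"
print( re.sub(r'(?s)("[^"\\]*(?:\\.[^"\\]*)*")|,', 
    lambda x: x.group(1) if x.group(1) else x.group().replace(",", ""),
    r'1,2,"test, \"a,b,c\" ,03","4, 5,6, ... "') )
    # => 12"test, \"a,b,c\" ,03""4, 5,6, ... "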

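To get the _XXX_ marker asked for in the question, the same pattern can be wrapped in a small helper. This is only a sketch: the names FIELD_OR_COMMA and replace_outside_quotes are made up for illustration, and it assumes quoted fields contain no backslash-escaped quotes.

```python
import re

# Group 1 captures a double-quoted string; the second alternative is a bare comma.
FIELD_OR_COMMA = re.compile(r'("[^"]*")|,')


def replace_outside_quotes(line, replacement="_XXX_"):
    """Replace commas outside double quotes (hypothetical helper name)."""
    return FIELD_OR_COMMA.sub(
        # Keep quoted strings as they are; swap every other match (a comma).
        lambda m: m.group(1) if m.group(1) else replacement,
        line,
    )


print(replace_outside_quotes('1,2,"test,3,7","4, 5,6, ... "'))
# => 1_XXX_2_XXX_"test,3,7"_XXX_"4, 5,6, ... "
```

When reading the file with open(), the helper could be applied to each line inside the loop, for example `for line in f: line = replace_outside_quotes(line)`.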

Regex details

  • ("[^"]*")|, :
    • ("[^"]*") - Capturing group 1: a ", then zero or more chars other than ", and then a closing "
    • | - or
    • , - a comma

The second pattern is:

  • (?s) - the inline version of the re.S / re.DOTALL flag
  • ("[^"\\]*(?:\\.[^"\\]*)*") - Group 1: a ", then zero or more chars other than " and \, then zero or more sequences of a \ followed by any one char and then zero or more chars other than " and \, and finally a closing "
  • | - or
  • , - a comma
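The difference between the two patterns only shows up when a quoted string contains a backslash-escaped quote. The sketch below runs both on the same line; SIMPLE, ESCAPED and sub_commas are illustrative names, and the sample line is made up.

```python
import re

SIMPLE = re.compile(r'("[^"]*")|,')
ESCAPED = re.compile(r'(?s)("[^"\\]*(?:\\.[^"\\]*)*")|,')


def sub_commas(pattern, text, replacement="_XXX_"):
    """Keep Group 1 (a quoted string) as is; replace any other match (a comma)."""
    return pattern.sub(lambda m: m.group(1) or replacement, text)


# A raw string, so \" reaches the regex as a backslash followed by a quote.
line = r'1,"a \"b,c\" d",2'

# The simple pattern treats the escaped quote as the end of the string,
# so the comma between b and c is wrongly seen as "outside" the quotes.
print(sub_commas(SIMPLE, line))
# => 1_XXX_"a \"b_XXX_c\" d"_XXX_2

# The escape-aware pattern consumes \" as part of the quoted string.
print(sub_commas(ESCAPED, line))
# => 1_XXX_"a \"b,c\" d"_XXX_2
```

If the file escapes quotes by doubling them ("") rather than with a backslash, the simple pattern already copes, because the doubled quote just reads as two adjacent quoted strings.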
