
Regex to match any number AND any characters between quotes

I'm faced with this oddly formatted CSV, which contains an unescaped , character inside a quoted field:

   641,"Harstad/Narvik Airport, Evenes","Harstad/Narvik","Norway","EVE","ENEV",68.491302490234,16.678100585938,84,1,"E","Europe/Oslo","airport","OurAirports"  

I need to return a list like this:

[641,'Harstad/Narvik Airport Evenes', 'Harstad/Narvik', 'Norway', 'EVE', 'ENEV', 68.491302490234,16.678100585938,84,1, 'E', 'Europe/Oslo', 'airport', 'OurAirports']

I have two regexes that each match part of the string:

  • (\d+\.?\d*) matches numbers
  • (["'])(?:(?=(\\?))\2.)*?\1 matches any characters between two single or double quotes

Is there a way to merge the two into one pattern that returns a single result?

You may use this regex:

>>> import re
>>> s = '641,"Harstad/Narvik Airport, Evenes","Harstad/Narvik","Norway","EVE","ENEV",68.491302490234,16.678100585938,84,1,"E","Europe/Oslo","airport","OurAirports"'
>>> csvData = re.findall(r'"[^"\\]*(?:\\.[^"\\]*)*"|\d+(?:\.\d+)?', s)
>>> print(csvData)
['641', '"Harstad/Narvik Airport, Evenes"', '"Harstad/Narvik"', '"Norway"', '"EVE"', '"ENEV"', '68.491302490234', '16.678100585938', '84', '1', '"E"', '"Europe/Oslo"', '"airport"', '"OurAirports"']
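Note that the matches above are all strings, and the quoted fields still carry their quotes and the embedded comma. A minimal sketch of one way to get closer to the list in the question, assuming that numbers should become int or float and that commas inside quoted fields should simply be dropped (as the expected output suggests). The names TOKEN and parse_line are made up for this illustration, and the pattern is the same alternation with one capture group added per branch:

```python
import re

s = '641,"Harstad/Narvik Airport, Evenes","Harstad/Narvik","Norway","EVE","ENEV",68.491302490234,16.678100585938,84,1,"E","Europe/Oslo","airport","OurAirports"'

# Same alternation as in the answer, with a capture group per branch:
# group 1 = contents of a quoted field (without the quotes), group 2 = a number.
TOKEN = re.compile(r'"([^"\\]*(?:\\.[^"\\]*)*)"|(\d+(?:\.\d+)?)')

def parse_line(line):
    """Hypothetical helper: turn one line into a list of typed fields."""
    fields = []
    for m in TOKEN.finditer(line):
        quoted, number = m.group(1), m.group(2)
        if number is not None:
            # Assumption: a dot means float, otherwise int.
            fields.append(float(number) if '.' in number else int(number))
        else:
            # Assumption: commas inside quoted fields are unwanted and removed.
            fields.append(quoted.replace(',', ''))
    return fields

print(parse_line(s))
```

Checking `number is not None` rather than the truthiness of `quoted` keeps an empty quoted field ("") from being mistaken for a number match.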

RegEx Details:

  • "[^"\\]*(?:\\.[^"\\]*)*" : Match a quoted string, allowing escaped quotes or any other escaped character inside, so that e.g. "foo\"bar" is matched as a single element
  • | : OR
  • \d+(?:\.\d+)? : Match an integer or a decimal number
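The breakdown above can also be written directly into the pattern with re.VERBOSE, which ignores whitespace and allows comments inside the regex. This is only a sketch; the names PATTERN and sample are invented for the example, and the sample string is chosen to exercise the escaped-quote case:

```python
import re

# The same pattern as above, spread out and commented via re.VERBOSE.
PATTERN = re.compile(r'''
    "[^"\\]*(?:\\.[^"\\]*)*"   # quoted string; an escaped character stays inside it
    |                          # OR
    \d+(?:\.\d+)?              # integer or decimal number
''', re.VERBOSE)

# Hypothetical sample containing an escaped quote inside a quoted field.
sample = r'1,"foo\"bar",2.5'
tokens = PATTERN.findall(sample)
print(tokens)
```

Here the escaped quote does not end the field, so the quoted part comes back as one element rather than being split at the inner quote.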
