
Split based on commas but ignore commas within double-quotes

I am trying to split strings on commas while ignoring the commas inside double quotes. Then I need to add the resulting substrings to a list.

line = "DATA", "LT", "0.40", "1.25", "Sentence, which contain, 
commas", "401", "", "MN", "", "", "", "", ""

When I try to do it with

lineItems = line.split(",")

it splits on every comma, including the ones inside the quoted sentence.

Conversely, when I split with a regex, nothing is split at all: I get the whole line back as a single list element.

Is there any chance to get:

newlist  = ['DATA', 'LT', '0.40', '1.25', 'Sentence, which contain, 
    commas', '401', '', 'MN', '', '', '', '', '']

Thanks!

PS: I will have many similar rows, so I want to get the same kind of result for each of them by iterating.

You could use the built-in shlex module, like so:

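import shlex

line = '"DATA", "LT", "0.40", "1.25", "Sentence, which contain, commas", "401", "", "MN", "", "", "", "", ""'

# shlex.split removes the double quotes but leaves the separating comma
# attached to every token except the last, so strip one trailing comma if present.
newlist = [x[:-1] if x.endswith(",") else x for x in shlex.split(line)]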
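As an alternative to shlex, the standard csv module also understands quoted fields. The following is only a sketch, assuming each row is a single string whose fields are separated by a comma followed by optional spaces; the helper name split_quoted_row is hypothetical and not part of the original answer.

```python
import csv

def split_quoted_row(line):
    """Split one comma-separated row, keeping commas inside double quotes."""
    # skipinitialspace=True ignores the space after each delimiter, so the
    # opening quote is recognised as the start of a quoted field.
    reader = csv.reader([line], skipinitialspace=True)
    return next(reader)

line = '"DATA", "LT", "0.40", "1.25", "Sentence, which contain, commas", "401", "", "MN", "", "", "", "", ""'
newlist = split_quoted_row(line)
print(newlist)
```

Unlike stripping a trailing character from each token, this does not depend on where the comma ends up after tokenizing, so a non-empty last field is left intact.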

You mentioned that you tried to split a 'string' variable, so I assume you forgot to wrap the whole line in the appropriate quotes. Is the following helpful, assuming the double quotes are balanced?

import re

line = """ "DATA", "LT", "0.40", "1.25", "Sentence, which contain, 
commas", "401", "", "MN", "", "", "", "", "" """

l = re.findall(r'"([^"]*)"', line)

print(l)

Prints:

['DATA', 'LT', '0.40', '1.25', 'Sentence, which contain, \ncommas', '401', '', 'MN', '', '', '', '', '']
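Since the question mentions many similar rows, the same pattern can be compiled once and applied to each row in a loop. This is a minimal sketch under the same assumption of balanced double quotes with no escaped quotes inside a field; the helper name parse_rows and the two sample rows are made up for illustration.

```python
import re

# Matches one double-quoted field and captures the text between the quotes.
QUOTED_FIELD = re.compile(r'"([^"]*)"')

def parse_rows(rows):
    """Return one list of fields per input row."""
    return [QUOTED_FIELD.findall(row) for row in rows]

# Illustrative rows only; real rows would come from a file or another source.
rows = [
    '"DATA", "LT", "0.40", "Sentence, which contain, commas", ""',
    '"DATA", "MN", "1.25", "no commas here", "401"',
]

for fields in parse_rows(rows):
    print(fields)
```

Each row yields its own list, so the results can be appended to a larger list or processed one at a time.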
