Split based on commas but ignore commas within double-quotes

Question

I am trying to split strings on commas while ignoring the commas inside double quotes. I then need to add the resulting pieces to a list.

line = "DATA", "LT", "0.40", "1.25", "Sentence, which contain, 
commas", "401", "", "MN", "", "", "", "", ""

When I try to do it with

lineItems = line.split(",")

it splits on every comma, including the ones inside the double quotes.

When I use a regex to split instead, I get everything back as a single element of the list (it does not split at all).

Is there a way to get:

newlist  = ['DATA', 'LT', '0.40', '1.25', 'Sentence, which contain, 
    commas', '401', '', 'MN', '', '', '', '', '']

Thanks!

PS: I will have many similar rows, so I want to get the same kind of result for each of them in a loop.

Answer 1

You could use the built-in shlex module, like so:

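import shlex

line = '"DATA", "LT", "0.40", "1.25", "Sentence, which contain, commas", "401", "", "MN", "", "", "", "", ""'

# shlex splits on whitespace and removes the quotes, but the comma after each
# closing quote stays attached to the token, so drop that one trailing comma.
newlist = [x[:-1] if x.endswith(",") else x for x in shlex.split(line)]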
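For comparison, here is a minimal sketch that uses the standard-library csv module instead of shlex. It assumes each row is a single-line string shaped like the example above, with a space after every comma; the variable names are only illustrative.

```python
import csv

line = '"DATA", "LT", "0.40", "1.25", "Sentence, which contain, commas", "401", "", "MN", "", "", "", "", ""'

# csv.reader accepts any iterable of strings, so wrap the single row in a list.
# skipinitialspace=True ignores the blank after each comma, which lets the
# reader recognize the double quote that follows as the start of a quoted field.
newlist = next(csv.reader([line], skipinitialspace=True))

print(newlist)
```

Without skipinitialspace=True, the space before each opening quote would prevent the reader from treating the field as quoted, and the commas inside the sentence would split it.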

Answer 2

You mentioned that you tried to split a 'string' variable, so I assume you forgot to add the appropriate quotes. Is the following helpful, assuming balanced double quotes?

import re  # the standard-library re module is enough for this pattern

line = """ "DATA", "LT", "0.40", "1.25", "Sentence, which contain, 
commas", "401", "", "MN", "", "", "", "", "" """

l = re.findall(r'"([^"]*)"', line)

print(l)

Prints:

['DATA', 'LT', '0.40', '1.25', 'Sentence, which contain, \ncommas', '401', '', 'MN', '', '', '', '', '']
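Since the question mentions many similar rows, the same pattern could be applied in a loop. The following is only a sketch: split_quoted is a hypothetical helper name and the rows are made-up sample data, not taken from the question.

```python
import re

# Captures the text between a pair of double quotes; [^"]* also matches newlines.
QUOTED_FIELD = re.compile(r'"([^"]*)"')


def split_quoted(line):
    """Return the contents of every double-quoted field in line."""
    return QUOTED_FIELD.findall(line)


# Hypothetical sample rows, made up for illustration.
rows = [
    '"DATA", "LT", "0.40", "Sentence, which contain, commas", ""',
    '"DATA", "GT", "1.25", "No commas here", "MN"',
]

results = [split_quoted(row) for row in rows]
for fields in results:
    print(fields)
```

This relies on every field being wrapped in balanced double quotes. Unquoted fields would be skipped, and quotes escaped inside a field are not handled.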

2 answers

solution1
4 2022-08-02 09:50:23

solution2
2 ACCEPTED 2022-08-02 09:50:11
