Reading and splitting a .csv file that contains quoted strings with commas in them

I have a .csv file that looks something like this:

1,2,"a,b",3
4,"c,d",5,6

I am reading it and storing it in an array like this:

with open(filename, 'r') as f:
    data = f.readlines()
data = [line.split(',') for line in data]

This results in an array like this:

[['1','2','"a','b"','3'],['4','"c','d"','5','6']]

However, I would like to keep items enclosed in double quotes, such as "a,b", as a single element of the data array (which is how Excel opens them), like this:

[[1,2,'a,b',3],[4,'c,d',5,6]]

Is there an easy way to achieve this in Python?

Edit: preferably without using the csv module, if possible.

You should use the csv module:

import csv

with open('test.csv') as f:
    reader = csv.reader(f)
    
    for row in reader:
        print(row)

Output:

['1', '2', 'a,b', '3']
['4', 'c,d', '5', '6']

Or, if you don't want to read the rows lazily and want them all in a single list, as in your question, you can simply do:

with open('test.csv') as f:
    reader = csv.reader(f)
    data = list(reader)

print(data)        
# [['1', '2', 'a,b', '3'], ['4', 'c,d', '5', '6']]   
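
Note that csv.reader returns every field as a string, while the desired output in the question has integers for the numeric fields. A minimal sketch of one way to convert them afterwards; to_int_if_possible is a hypothetical helper name, and io.StringIO only stands in for the opened file so the example is self-contained:

```python
import csv
import io

def to_int_if_possible(field):
    """Return int(field) when the text is an integer, else the text unchanged."""
    try:
        return int(field)
    except ValueError:
        return field

# io.StringIO stands in for the opened file (assumption for this sketch).
sample = io.StringIO('1,2,"a,b",3\n4,"c,d",5,6\n')
data = [[to_int_if_possible(field) for field in row]
        for row in csv.reader(sample)]

print(data)
# [[1, 2, 'a,b', 3], [4, 'c,d', 5, 6]]
```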

Using the csv module:

import csv

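with open('test.csv') as file:
    reader = csv.reader(file)
    # Build the list inside the with block: the reader pulls rows lazily,
    # so the file must still be open while it is consumed.
    data = [row for row in reader]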
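
One detail worth noting with this approach: csv.reader reads rows lazily from the file object, so the rows have to be consumed while the file is still open. The csv documentation also recommends opening the file with newline=''. A small self-contained sketch, where the temporary file is an assumption that only stands in for test.csv:

```python
import csv
import os
import tempfile

# Write a throwaway file so the sketch is self-contained (stand-in for test.csv).
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False, newline='') as tmp:
    tmp.write('1,2,"a,b",3\n4,"c,d",5,6\n')
    path = tmp.name

# newline='' lets the csv module handle line endings itself.
with open(path, newline='') as f:
    data = list(csv.reader(f))  # consume the reader while the file is open

os.remove(path)
print(data)
# [['1', '2', 'a,b', '3'], ['4', 'c,d', '5', '6']]
```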

If you don't want to use the csv module, this function returns the desired output for the sample data above:

def function(file_name):
    with open(file_name, 'r') as file:
        file_read = file.readlines()
        raw_data = [line.split(',') for line in file_read]

        file_data = list()
        place_0 = 0
        place_1 = 0
        ext_item = str()
        added = list()
        pre_final_list = list()
        pre_pure_list = list()
        pure_data = str()
        final_list = list()

        for List in raw_data:
            for k, v in enumerate(List):
                List[k] = v.rstrip()
        
        for line in raw_data:
            if line == ['']:
                continue
            file_data.append(line)

        for line in file_data:
            for key, value in enumerate(line):
                if '"' in value[0] and '"' in value[-1]:
                    continue
                if '"' in value[0]:
                    place_0 = key
                if '"' in value[-1]:
                    place_1 = key
                if place_1 != 0:
                    for ind in range(place_0, place_1+1):
                        added.append(line[ind])
                    for e_item in added:
                        if e_item == added[-1]:
                            ext_item += e_item
                        else:
                            ext_item += e_item + ','
                    line[place_0] = ext_item
                    for r_item_index in range(place_0+1, place_1+1):
                        line[r_item_index] = None
                    place_0 = 0
                    place_1 = 0
                    ext_item = str()
                    added = list()

        for line in file_data:
            for value in line:
                try:
                    value = int(value)
                except (ValueError, TypeError):
                    # Non-numeric text, or a None placeholder left by the merge step.
                    pass
                if value == '\n':
                    continue
                if value is not None:
                    pre_pure_list.append(value)
            pre_final_list.append(pre_pure_list)
            pre_pure_list = list()
        

        for List in pre_final_list:
            for key, item in enumerate(List):
                if type(item) is int or '"' not in item:
                    continue
                for string in item:
                    if string == '"':
                        continue
                    pure_data += string
                List[key] = pure_data
                pure_data = str()
            final_list.append(List)

        return final_list
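
For comparison, a shorter way to do this without the csv module is to scan each line character by character and track whether the current position is inside double quotes. This is only a minimal sketch: split_csv_line is a hypothetical helper name, and it assumes fields never contain escaped quotes ("") or embedded newlines, both of which the csv module handles.

```python
def split_csv_line(line):
    """Split one CSV line on commas that are outside double quotes."""
    fields = []
    current = []          # characters of the field being built
    in_quotes = False
    for ch in line.rstrip('\r\n'):
        if ch == '"':
            in_quotes = not in_quotes  # toggle state; the quote itself is dropped
        elif ch == ',' and not in_quotes:
            fields.append(''.join(current))  # unquoted comma ends the field
            current = []
        else:
            current.append(ch)
    fields.append(''.join(current))  # last field has no trailing comma
    return fields

rows = [split_csv_line(line) for line in ['1,2,"a,b",3\n', '4,"c,d",5,6\n']]
print(rows)
# [['1', '2', 'a,b', '3'], ['4', 'c,d', '5', '6']]
```

The fields come back as strings, so numeric values would still need an int() conversion as a separate step.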
