
How do I properly read a badly formatted string in a CSV file?

One column in my CSV file contains an ambiguously quoted string. Because of that, each row is read as a list of 6 values instead of 5.

Code:

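import csv

# Python 3: open in text mode with newline='', as the csv module expects.
with open('test.csv', newline='') as f:
    csv_data = csv.reader(f)
    for row in csv_data:
        print(row)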

I tried replacing " with a space to at least get a normal string without any quotes, as shown below:

for row in csv_data:
    print([r.replace('"', ' ') for r in row])  # This didn't work as expected.

Input:

A row in the CSV file looks like this:

1,2,"text1", "Sample text ""present" in csv, as this",5

"Sample text "present" in csv, as this" # Error due to this value.

Output:

['1', '2', 'text1', 'Sample text present" in csv', 'as this', 5]

Expected output:

['1', '2', 'text1', 'Sample text "present" in csv, as this', 5]
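The split can be reproduced without a file by feeding the sample row to csv.reader from memory. This is only a minimal sketch: the variable names are made up for illustration, and the row is copied from the input above. With the default settings, the space after the comma means the opening quote is not at the start of the field, so it is treated as ordinary text and the embedded comma splits the field. Turning on skipinitialspace alone does not appear to help either, because of the unbalanced quote after "present".

```python
import csv
import io

# The sample row from the question, held in memory instead of test.csv.
line = '1,2,"text1", "Sample text ""present" in csv, as this",5\n'

# Default dialect: the quote after ", " is not at the start of the field.
default_row = next(csv.reader(io.StringIO(line)))

# Skipping the space after the delimiter still leaves an unbalanced quote.
skipped_row = next(csv.reader(io.StringIO(line), skipinitialspace=True))

print(len(default_row), default_row)
print(len(skipped_row), skipped_row)
```

Both readers should report six fields rather than five, which is why replacing the quotes after the row has already been split cannot repair it.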

This is almost embarrassingly hacky, but it seems to work, at least on the sample input shown in your question. It post-processes each row read by csv.reader, tries to detect fields that were split incorrectly because of the bad formatting, and then joins them back together.

import csv

def read_csv(filename):
    # Text mode with newline='' is what the csv module expects on Python 3.
    with open(filename, newline='') as file:
        # quotechar=None turns quote handling off, so every comma splits a field.
        for row in csv.reader(file, skipinitialspace=True, quotechar=None):
            newrow = []
            i = 0
            while i < len(row):
                a = row[i]
                b = row[i + 1] if i + 1 < len(row) else ''
                # Detect bad formatting: a quoted field that was split in two.
                if (a.startswith('"') and not a.endswith('"')
                        and not b.startswith('"') and b.endswith('"')):
                    # Join the misread fields back together.
                    newrow.append(', '.join((a, b)))
                    i += 2  # Both pieces are consumed.
                else:
                    newrow.append(a)
                    i += 1
            # Collapse doubled quotes and drop the enclosing ones.
            yield [field.replace('""', '"').strip('"') for field in newrow]

for row in read_csv('fmt_test2.csv'):
    print(row)

Output:

['1', '2', 'text1', 'Sample text "present" in csv, as this', '5']
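If the quoted text can contain more than one comma, the same idea can be extended to collect pieces until one ends with a closing quote. The following is a hedged sketch, not a general CSV repair tool: repair_row is a hypothetical helper name, quoting=csv.QUOTE_NONE is used here as another way to switch quote handling off, and the code assumes that every comma inside the text was followed by exactly one space and that no piece in the middle of the text happens to end with a quote.

```python
import csv
import io

def repair_row(row):
    """Re-join pieces of a quoted field that was split at embedded commas.

    Hypothetical helper, not part of the csv module.
    """
    repaired = []
    pending = []  # Pieces of a quoted field still waiting for a closing quote.
    for piece in row:
        if pending:
            pending.append(piece)
            if piece.endswith('"'):
                repaired.append(', '.join(pending))
                pending = []
        elif piece.startswith('"') and not (len(piece) > 1 and piece.endswith('"')):
            # Opening quote without a matching closing quote: start collecting.
            pending.append(piece)
        else:
            repaired.append(piece)
    # Unterminated quote: keep the leftover pieces instead of dropping them.
    repaired.extend(pending)
    # Collapse doubled quotes and drop the enclosing ones.
    return [field.replace('""', '"').strip('"') for field in repaired]

# The sample row from the question, read from memory with quoting disabled.
line = '1,2,"text1", "Sample text ""present" in csv, as this",5\n'
reader = csv.reader(io.StringIO(line), skipinitialspace=True,
                    quoting=csv.QUOTE_NONE)
rows = [repair_row(row) for row in reader]
print(rows[0])
```

Working on a plain list keeps the repair step separate from file handling, so it can be checked on hand-written rows such as ['"a', 'b', 'c"', 'd'] before being run on a real file.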
