
How can I remove empty double quotes from a CSV file in Python?

How do I remove empty double quotes from my CSV file using Python?

Here is what the file currently looks like:

"text","more text","","other text","","text"

Here is what I want it to look like:

"text","more text",,"other text",,"text"

I think the best solution is to use the quotechar option of csv.reader, then filter out the empty fields:

import csv

with open('test.csv', newline='') as csvf:
    for row in csv.reader(csvf, delimiter=',', quotechar='"'):
        row = filter(lambda v: v, row)
        # Now row is just an iterator containing non-empty strings
        # You can use it as you please, for example: 
        print(', '.join(row))
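Keep in mind that filtering removes the empty fields outright, so every value after an empty field shifts one column to the left. Here is a quick check on the sample line from the question. It uses io.StringIO in place of test.csv so the snippet runs on its own:

```python
import csv
import io

# The sample line from the question, held in memory instead of test.csv
sample = io.StringIO('"text","more text","","other text","","text"\n')

for row in csv.reader(sample, delimiter=',', quotechar='"'):
    print(len(row))                  # all six fields are parsed
    kept = list(filter(None, row))   # filter(None, ...) keeps only truthy values
    print(len(kept), kept)           # only four fields survive
```

If column positions matter, replacing the empty fields (as below) is safer than dropping them.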

If, instead of removing empty fields, you need to replace them with a given value (like None):

import csv

def read(file, placeholder=None):
    with open(file, newline='') as csvf:
        for row in csv.reader(csvf, delimiter=',', quotechar='"'):
            yield [v if v else placeholder for v in row]

for row in read('test.csv'):
    pass # Do something with row
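To try the placeholder variant without a file on disk, here is a minimal sketch that takes any iterable of lines instead of a filename. `read_rows` is a hypothetical name used only for this illustration:

```python
import csv
import io

def read_rows(lines, placeholder=None):
    """Yield each parsed row, swapping empty fields for `placeholder`.

    Hypothetical helper: same logic as read() above, but it accepts any
    iterable of lines (an open file, io.StringIO, a list of strings).
    """
    for row in csv.reader(lines, delimiter=',', quotechar='"'):
        yield [v if v else placeholder for v in row]

sample = io.StringIO('"text","more text","","other text","","text"\n')
rows = list(read_rows(sample))
print(rows)  # [['text', 'more text', None, 'other text', None, 'text']]
```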

If, for example, you need to print it to stdout with surrounding double quotes (admittedly a contrived example):

for row in read('test.csv'):
    print(', '.join(f'"{v}"' if v else '' for v in row))
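Building on that idea, here is a minimal sketch that writes rows in exactly the format the question asks for: every non-empty field quoted, empty fields left bare. Unlike the one-liner above, it doubles any embedded `"`, which is the standard CSV escape. The function names and the file names in the usage comment are hypothetical:

```python
import csv
import io

def quote_field(value):
    """Quote a non-empty field (doubling embedded quotes); leave empty ones bare."""
    if not value:
        return ''
    return '"' + value.replace('"', '""') + '"'

def rewrite_csv(src, dst):
    """Copy CSV rows from file object src to dst, writing empty fields unquoted."""
    for row in csv.reader(src):
        dst.write(','.join(quote_field(v) for v in row) + '\n')

# Usage with real files (hypothetical names):
# with open('test.csv', newline='') as src, open('out.csv', 'w', newline='') as dst:
#     rewrite_csv(src, dst)

src = io.StringIO('"text","more text","","other text","","text"\n')
dst = io.StringIO()
rewrite_csv(src, dst)
print(dst.getvalue(), end='')  # "text","more text",,"other text",,"text"
```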

You can try:

>>> s=""""text","more text","","other text","","text" """
>>> s
'"text","more text","","other text","","text" '
>>> s.replace('""','')
'"text","more text",,"other text",,"text" '
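One caveat: `""` is also how CSV escapes a literal quote inside a quoted field, so a blanket str.replace can corrupt such values. A small check, using a hypothetical line whose first field contains `say "hi"`:

```python
import csv
import io

line = '"say ""hi""",""'
fixed = line.replace('""', '')

# Parse both versions with the csv module to compare the field values
before = next(csv.reader(io.StringIO(line)))
after = next(csv.reader(io.StringIO(fixed)))
print(before)  # ['say "hi"', '']
print(after)   # ['say hi', '']
```

If your fields never contain quote characters, the replace should be safe; otherwise, parsing with the csv module is the more reliable route.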

A combination of a lambda function and some pandas functionality can also do this. Once your DataFrame is loaded, you will get something like:

[Image: DataFrame before processing]

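Then you just need to write a lambda function:

replacer = lambda x: x.replace('""', '')
# df.apply() calls replacer once per column (a Series); Series.replace swaps
# cells whose entire value is '""' for an empty string
df = df.apply(replacer)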

This does the operation you are looking for and gives you:

[Image: DataFrame after applying replacer]

Then just use df.to_csv(filepathAsStr, index=False) to save the changes to disk (index=False stops pandas from writing the row index as an extra column), or continue with whatever operations you need. Note that df.apply() does not run in parallel: it simply calls the function once per column, so it is not inherently faster than a simple str.replace or any other serial approach.
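It may also be worth checking whether the replacement is needed at all: with its default quoting, pandas.read_csv already strips the quote characters while parsing. A minimal sketch, assuming the file has no header row; `keep_default_na=False` keeps the empty fields as empty strings instead of NaN:

```python
import io

import pandas as pd

sample = io.StringIO('"text","more text","","other text","","text"\n')

# header=None: treat the first line as data; dtype=str: keep every cell as text
df = pd.read_csv(sample, header=None, dtype=str, keep_default_na=False)
print(df.iloc[0].tolist())  # ['text', 'more text', '', 'other text', '', 'text']
```

Be aware that to_csv uses minimal quoting by default, so writing this DataFrame back would leave the non-empty fields unquoted as well.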
