How can I use Python to change the delimiter of a CSV file while also stripping the new delimiter from the fields?

I receive a well-formatted CSV file, with double quotes around text fields that contain commas.

Alas, I need to load it into SQL Server, which, as far as I have learned (please tell me how I am wrong here) cannot handle quote-enclosed fields that contain the delimiter.

So, I would like to write a Python script which will (a) convert the file to pipe-delimited, and (b) strip whatever pipes exist in the fields. My sense is that commas are more common, so I'd like to keep them; I also have some numeric fields that might, at least in the future, contain commas.

Here is the code that I have so far to do (a):

import csv
import sys

source_file=sys.argv[1]
good_file=sys.argv[2]
bad_file=sys.argv[3]

with open(source_file, 'r') as csv_file:
    csv_reader = csv.DictReader(csv_file)

    with open(good_file, 'w') as new_file:
        csv_writer = csv.DictWriter(new_file, csv_reader.fieldnames, delimiter='|')
        headers = dict((n, n) for n in csv_reader.fieldnames)
        csv_writer.writerow(headers)
        for line in csv_reader:
            csv_writer.writerow(str.replace(line, '|', ' '))

How can I augment it to do (b)?

P.S. I am using Python 2.6, IIRC.

SQL Server can load the type of file you describe. The file can certainly be loaded with an SSIS package, and it can also be loaded with the SQL Server bcp utility. Writing a Python script would not be the way to go, since it introduces another technology into the mix when none is needed (just IMHO). SQL Server is equipped to handle exactly what you want to do.

SSIS is pretty straightforward. For bcp, do not use the -t option (which specifies one field terminator for the entire file); use a format file instead. With a format file, you can customize each field's terminator. For the fields that are quoted, you'll want to use a custom delimiter that includes the quote character. See this post, or many others like it, which detail how to use bcp with files that have delimiters and quoted fields to hide delimiters that might appear in the data.

SQL Server BCP Export where comma in SQL field
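
If the Python route is still preferred, part (b) of the question could be handled along the following lines. This is only a minimal sketch, not a tested drop-in for the original script: it assumes Python 3 (the question mentions 2.6, where the files would be opened in binary mode instead), it uses csv.reader rather than DictReader, and the function name convert_to_pipe_delimited and its replacement parameter are made up for illustration. The main change from the question's code is that the pipe is replaced in each field separately, because str.replace cannot be applied to a whole row.

```python
import csv
import io


def convert_to_pipe_delimited(source, target, replacement=" "):
    """Copy comma-delimited rows from source to target as pipe-delimited rows.

    Any pipe already present inside a field is replaced with `replacement`
    (hypothetical parameter) so it cannot be mistaken for the new delimiter.
    """
    # csv.reader understands double-quoted fields that contain commas.
    reader = csv.reader(source)
    writer = csv.writer(target, delimiter="|", lineterminator="\n")
    for row in reader:
        # Replace pipes field by field; the header row passes through unchanged
        # unless a column name itself contains a pipe.
        writer.writerow([field.replace("|", replacement) for field in row])


# Small in-memory demonstration with made-up data.
sample = io.StringIO('id,name,note\n1,"Smith, John",a|b\n')
out = io.StringIO()
convert_to_pipe_delimited(sample, out)
print(out.getvalue(), end="")
# id|name|note
# 1|Smith, John|a b
```

For real files on Python 3, both would be opened with newline="" and passed in place of the StringIO objects. Note that the writer keeps its default quoting, so a field containing a double quote or a line break would still be written inside quotes; whether that is acceptable depends on how the file is loaded afterwards.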
