How can I use Python to change the delimiter of a CSV file while also stripping the new delimiter out of the fields?

I receive a well-formatted CSV file, with double quotes around text fields that contain commas.

Alas, I need to load it into SQL Server, which, as far as I have learned (please tell me if I am wrong here), cannot handle quote-enclosed fields that contain the delimiter.
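The quoting matters because a plain split on commas cannot tell a delimiter from a comma inside quotes. A minimal sketch, assuming Python 3 and a made-up sample row, comparing a naive `str.split` with the standard `csv` module:

```python
import csv
import io

# Hypothetical sample: the second field is quoted because it contains a comma.
sample = 'id,name,amount\n1,"Smith, John",100\n'

# Naive splitting breaks the quoted field apart.
naive = [line.split(',') for line in sample.splitlines()]

# csv.reader honours the quotes and keeps "Smith, John" as one field.
parsed = list(csv.reader(io.StringIO(sample)))

print(naive[1])   # ['1', '"Smith', ' John"', '100']
print(parsed[1])  # ['1', 'Smith, John', '100']
```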

So I would like to write a Python script that will (a) convert the file to pipe-delimited and (b) strip any pipes that exist in the fields. My sense is that commas are more common in the data, so I'd like to keep them; I also have some numeric fields that might, at least in the future, contain commas.
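Steps (a) and (b) can be sketched as a single pass with `csv.reader` and `csv.writer`. This assumes Python 3 and text already in memory; the function name `to_pipe_delimited` and the choice of a space as the replacement character are illustrative, not requirements:

```python
import csv
import io

def to_pipe_delimited(text, replacement=' '):
    """Re-emit CSV text as pipe-delimited text.

    (a) fields are written with '|' as the delimiter;
    (b) any '|' already inside a field is replaced, so commas in the
        data (including numbers like "1,234") survive untouched.
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter='|', lineterminator='\n')
    for row in csv.reader(io.StringIO(text)):
        writer.writerow([field.replace('|', replacement) for field in row])
    return out.getvalue()

sample = 'id,note,amount\n1,"a|b, c","1,234"\n'
print(to_pipe_delimited(sample), end='')
# id|note|amount
# 1|a b, c|1,234
```

Note that `csv.writer` would still quote a field containing a double quote or a line break; the sample avoids those cases.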

Here is the code I have for (a):

import csv
import sys

source_file = sys.argv[1]
good_file = sys.argv[2]
bad_file = sys.argv[3]  # not used yet

with open(source_file, 'r') as csv_file:
    csv_reader = csv.DictReader(csv_file)

    with open(good_file, 'w') as new_file:
        csv_writer = csv.DictWriter(new_file, csv_reader.fieldnames, delimiter='|')
        # Python 2.6 has no DictWriter.writeheader(), so write the header row by hand
        headers = dict((n, n) for n in csv_reader.fieldnames)
        csv_writer.writerow(headers)
        for line in csv_reader:
            # line is a dict, not a str, so str.replace(line, '|', ' ') raises a
            # TypeError; writing the row unchanged does step (a) only
            csv_writer.writerow(line)

How can I augment it to do (b)?
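One way to do (b) is to clean each row dict before handing it to `DictWriter`. A minimal sketch, assuming every row has the same number of fields as the header; `strip_pipes` and `convert` are hypothetical helper names. The helpers avoid dict comprehensions and `writeheader()` so they should also run on Python 2.6, while the in-memory usage at the end assumes Python 3:

```python
import csv
import io

def strip_pipes(row, replacement=' '):
    """Return a copy of a DictReader row with every '|' in its values replaced.

    dict() over a generator is used instead of a dict comprehension,
    which Python 2.6 does not support.
    """
    return dict((key, value.replace('|', replacement) if value is not None else value)
                for key, value in row.items())

def convert(csv_file, new_file):
    """Copy comma-delimited rows to pipe-delimited rows: step (a) plus step (b)."""
    csv_reader = csv.DictReader(csv_file)
    csv_writer = csv.DictWriter(new_file, csv_reader.fieldnames, delimiter='|')
    # Header written by hand, as in the question (no writeheader() on 2.6).
    csv_writer.writerow(dict((n, n) for n in csv_reader.fieldnames))
    for line in csv_reader:
        csv_writer.writerow(strip_pipes(line))

# Usage with in-memory files (Python 3); real code would pass open file objects.
src = io.StringIO('id,note\n1,"x|y, z"\n')
dst = io.StringIO()
convert(src, dst)
print(repr(dst.getvalue()))  # 'id|note\r\n1|x y, z\r\n'
```

With the question's variables, call `convert(csv_file, new_file)` inside the two `with` blocks. On Python 2 the csv module expects files opened in binary mode (`'rb'` / `'wb'`); on Python 3 open them with `newline=''`.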

P.S. I am using Python 2.6, IIRC.

SQL Server can load the type of file you describe. It can certainly be loaded with an SSIS package, and it can also be loaded with the SQL Server bcp utility. Writing a Python script would not be the way to go (it introduces another technology into the mix when one isn't needed; just IMHO). SQL Server is equipped to handle exactly what you want to do.

SSIS is pretty straightforward. For BCP, don't use the -t option (which specifies a single field terminator for the entire file); use a format file instead. A format file lets you set the terminator for each field individually, so for the quoted fields you can use a custom terminator that includes the quote characters. See this post, or the many others like it, for details on using BCP with delimited files whose quoted fields hide delimiters that appear in the data.

SQL Server BCP Export where comma in SQL field
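To illustrate the per-field terminators described above, here is a sketch that builds the text of a non-XML bcp format file. It assumes rows end in CRLF and that each column is either always or never wrapped in double quotes; `bcp_format_file`, the `10.0` version string, and the `opening_quote` dummy field name are illustrative choices. Writing the format file by hand works just as well, so check the result against the bcp format file documentation for your SQL Server version.

```python
def bcp_format_file(columns, version='10.0'):
    """Build the text of a non-XML bcp format file for a comma-delimited file.

    columns: list of (column_name, quoted) pairs in file order, where quoted
    says whether that column is always wrapped in double quotes.
    Assumes rows end in CRLF and that quoting is consistent per column.
    """
    quote = '\\"'          # a double quote, escaped as a format file expects
    row_end = '\\r\\n'     # CRLF row terminator, escaped the same way
    fields = []            # (terminator, server_column_order, name)
    if columns and columns[0][1]:
        # Dummy field that swallows the opening quote of the first column;
        # server column order 0 means "do not load this field".
        fields.append((quote, 0, 'opening_quote'))
    for i, (name, quoted) in enumerate(columns):
        last = i == len(columns) - 1
        term = (quote if quoted else '') + (row_end if last else ',')
        if not last and columns[i + 1][1]:
            term += quote  # also swallow the next column's opening quote
        fields.append((term, i + 1, name))
    lines = [version, str(len(fields))]
    for order, (term, server_order, name) in enumerate(fields, 1):
        # host order, type, prefix length, data length (0 for delimited data),
        # terminator, server column order, server column name, collation ("")
        lines.append('%d\tSQLCHAR\t0\t0\t"%s"\t%d\t%s\t""'
                     % (order, term, server_order, name))
    return '\n'.join(lines) + '\n'

print(bcp_format_file([('id', False), ('name', True), ('amount', False)]))
```

The resulting file would be passed to bcp with `-f`; if the data file has a header row, `-F 2` starts the load at the second row.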

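Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.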

Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM