
Writing with Python's built-in .csv module

[Please note that this is a different question from the already answered "How to replace a column using Python's built-in .csv writer module?"]

I need to do a find and replace (specific to one column of URLs) in a huge Excel .csv file. Since I'm in the beginning stages of trying to teach myself a scripting language, I figured I'd try to implement the solution in Python.

I'm having trouble when I try to write back to a .csv file after making a change to the contents of an entry. I've read the official csv module documentation about how to use the writer, but there isn't an example that covers this case. Specifically, I am trying to get the read, replace, and write operations accomplished in one loop. However, it seems that one cannot use the same 'row' reference both as the for loop's variable and as the parameter for writer.writerow(). So, once I've made the change in the for loop, how should I write back to the file?

edit: I implemented the suggestions from S. Lott and Jimmy, but I still get the same result.

edit #2: I added the "rb" and "wb" to the open() functions, per S. Lott's suggestion.

import csv

#filename = 'C:/Documents and Settings/username/My Documents/PALTemplateData.xls'

csvfile = open("PALTemplateData.csv","rb")
csvout = open("PALTemplateDataOUT.csv","wb")
reader = csv.reader(csvfile)
writer = csv.writer(csvout)

changed = 0;

for row in reader:
    row[-1] = row[-1].replace('/?', '?')
    writer.writerow(row)                  #this is the line that's causing issues
    changed=changed+1

print('Total URLs changed:', changed)

edit: For your reference, this is the new full traceback from the interpreter:

Traceback (most recent call last):
  File "C:\Documents and Settings\g41092\My Documents\palScript.py", line 13, in <module>
    for row in reader:
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)

You cannot read and write the same file.

source = open("PALTemplateData.csv","rb")
reader = csv.reader(source , dialect)

target = open("AnotherFile.csv","wb")
writer = csv.writer(target , dialect)

The normal approach to ALL file manipulation is to create a modified COPY of the original file. Don't try to update files in place. It's just a bad plan.


Edit

In the lines

source = open("PALTemplateData.csv","rb")

target = open("AnotherFile.csv","wb")

The "rb" and "wb" are absolutely required. Every time you omit them, you open the file in the wrong mode.

You must use "rb" to read a .CSV file. There is no choice with Python 2.x. With Python 3.x, you can omit the mode, but use "r" explicitly to make it clear.

You must use "wb" to write a .CSV file. There is no choice with Python 2.x. With Python 3.x, you must use "w".


Edit

It appears you are using Python 3. You'll need to drop the "b" from "rb" and "wb".

Read this: http://docs.python.org/3.0/library/functions.html#open
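
Put together for Python 3, the read, replace, and write loop from the question might look like the sketch below. The function name fix_urls and its return value are illustrative assumptions, not part of the original script. The newline="" argument follows the csv module documentation's recommendation for files passed to csv.reader and csv.writer.

```python
import csv

def fix_urls(src_path, dst_path):
    """Copy src_path to dst_path, replacing '/?' with '?' in the last column.

    Returns the number of rows whose last column actually changed.
    """
    changed = 0
    # Text mode ("r"/"w") is what the Python 3 csv module expects;
    # newline="" lets the csv module handle line endings itself.
    with open(src_path, "r", newline="") as src, \
         open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        for row in reader:
            if row:  # blank lines come back as empty lists
                fixed = row[-1].replace("/?", "?")
                if fixed != row[-1]:
                    changed += 1
                row[-1] = fixed
            # The same row object can be modified and then written out.
            writer.writerow(row)
    return changed
```

Calling fix_urls("PALTemplateData.csv", "PALTemplateDataOUT.csv") would then write the corrected copy and report how many URLs were modified, rather than counting every row as in the original loop.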

Opening CSV files in binary mode is just wrong in Python 3. CSV files are normal text files, so you need to open them with

source = open("PALTemplateData.csv","r")
target = open("AnotherFile.csv","w")

The error

_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)

occurs because you are opening the files in binary mode.
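
A small way to see the difference without touching any files is to feed csv.reader an in-memory text stream and an in-memory binary stream. This is only an illustrative sketch; the helper name first_row is an assumption, and the exact wording of the error message may vary between Python versions.

```python
import csv
import io

def first_row(stream):
    """Return the first CSV row from stream, or the csv.Error message on failure."""
    try:
        return next(csv.reader(stream))
    except csv.Error as exc:
        return str(exc)

# A text stream yields str lines, which the Python 3 csv module accepts.
text_result = first_row(io.StringIO("a,b\r\n"))

# A binary stream yields bytes lines, which the csv module rejects.
bytes_result = first_row(io.BytesIO(b"a,b\r\n"))

print(text_result)
print(bytes_result)
```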

When I was opening Excel CSV files with Python, I used something like:

try:    # checking if file exists
    f = csv.reader(open(filepath, "r", encoding="cp1250"), delimiter=";", quotechar='"')
except IOError:
    f = []

for record in f:
    pass  # do something with record

and it worked rather fast (I was opening two CSV files of about 10 MB each, though I did this with Python 2.6, not the 3.0 version).
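
If the file also has to be written back in the same Excel-style format, the same delimiter, quote character, and encoding can be passed to the writer. The following is a minimal Python 3 sketch under the assumption that the file is semicolon-delimited and cp1250-encoded, as in the snippet above; the names EXCEL_OPTIONS and copy_excel_csv are hypothetical.

```python
import csv

# Assumed format: semicolon-delimited with double-quote quoting.
EXCEL_OPTIONS = {"delimiter": ";", "quotechar": '"'}

def copy_excel_csv(src_path, dst_path, encoding="cp1250"):
    """Copy a semicolon-delimited CSV file row by row; return the row count."""
    rows = 0
    with open(src_path, "r", encoding=encoding, newline="") as src, \
         open(dst_path, "w", encoding=encoding, newline="") as dst:
        writer = csv.writer(dst, **EXCEL_OPTIONS)
        for record in csv.reader(src, **EXCEL_OPTIONS):
            # A real script would modify record here before writing it.
            writer.writerow(record)
            rows += 1
    return rows
```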

There are a few modules for working with Excel CSV files from within Python; pyExcelerator is one of them.

The problem is that you're trying to write to the same file you're reading from. Write to a different file, then delete the original and rename the new file.
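
One way to sketch this in Python 3 is to write to a temporary file in the same directory and then move it over the original. Note that this uses os.replace, which overwrites the destination in a single step, instead of a separate delete followed by a rename. The function name rewrite_in_place and the transform callback are illustrative assumptions.

```python
import csv
import os
import tempfile

def rewrite_in_place(path, transform):
    """Rewrite the CSV file at path, applying transform(row) to every row.

    The rows are written to a temporary file first; the original is only
    replaced once the whole file has been written successfully.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(suffix=".csv", dir=directory)
    try:
        with os.fdopen(fd, "w", newline="") as tmp, \
             open(path, "r", newline="") as src:
            writer = csv.writer(tmp)
            for row in csv.reader(src):
                writer.writerow(transform(row))
        # Both files are closed here, so the swap is safe on Windows too.
        os.replace(tmp_path, path)
    except BaseException:
        # Leave the original untouched and clean up the partial copy.
        os.remove(tmp_path)
        raise
```

For the question's case, transform could be a small function that returns the row with '/?' replaced by '?' in its last column.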
