Inline CSV File Editing with Python

Can I modify a CSV file inline using Python's csv library, or a similar technique?

Currently, I am processing a file and updating the first column (a name field) to change its formatting. A simplified version of my code looks like this:

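import csv

# Read the original file and write the reformatted rows to a separate output file
with open('tmpEmployeeDatabase-out.csv', 'w', newline='') as csvOutput:
    writer = csv.writer(csvOutput, delimiter=',', quotechar='"')

    with open('tmpEmployeeDatabase.csv', 'r', newline='') as csvFile:
        reader = csv.reader(csvFile, delimiter=',', quotechar='"')

        for row in reader:
            row[0] = row[0].title()  # e.g. "john smith" -> "John Smith"
            writer.writerow(row)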

This approach works, but I am curious whether I can do an inline edit so that I'm not duplicating the file.

I've tried the following, but it appends the new records to the end of the file instead of replacing them.

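import csv

with open('tmpEmployeeDatabase.csv', 'r+') as csvFile:
    reader = csv.reader(csvFile, delimiter=',', quotechar='"')
    writer = csv.writer(csvFile, delimiter=',', quotechar='"')

    for row in reader:
        row[0] = row[0].title()
        # Reads are buffered, so the file position has already moved past this row
        # (for a small file, to the end); writerow() does not overwrite the row
        writer.writerow(row)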

No, you should not attempt to write to the file you are currently reading from. You can do it if you keep seeking back after reading a row, but it is not advisable, especially if you are writing back more data than you read.
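To see why writing back more data than you read is risky, here is a minimal sketch that uses io.StringIO as a stand-in for a file opened in 'r+' mode. That substitution is an assumption made to keep the example self-contained; with a real file and csv.reader you would additionally have to work around read-ahead buffering, since tell() is disabled while iterating over a text file.

```python
import io

# io.StringIO stands in for a text file opened in 'r+' mode: it is seekable,
# and writing at a position overwrites the characters already there.
f = io.StringIO("al,1\nbo,2\n")

start = f.tell()                            # remember where the first row begins
first = f.readline()                        # "al,1\n" (5 characters)
f.seek(start)                               # seek back to the start of that row
f.write(first.replace("al", "Alfred", 1))   # write "Alfred,1\n" (9 characters)

# The 4 extra characters have overwritten the second row's data,
# leaving only its trailing newline.
print(repr(f.getvalue()))                   # 'Alfred,1\n\n'
```

Nothing raises an error; the second record is silently destroyed.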

The canonical method is to write to a new, temporary file and move that into place over the old file you read from.

from tempfile import NamedTemporaryFile
import shutil
import csv

filename = 'tmpEmployeeDatabase.csv'
# delete=False keeps the temporary file on disk after it is closed, so it can be moved
tmpFile = NamedTemporaryFile('w+t', newline='', delete=False)

with open(filename, 'r', newline='') as csvFile, tmpFile:
    reader = csv.reader(csvFile, delimiter=',', quotechar='"')
    writer = csv.writer(tmpFile, delimiter=',', quotechar='"')

    for row in reader:
        row[0] = row[0].title()  # reformat the name in the first column
        writer.writerow(row)

# Both files are closed at this point; move the rewritten copy over the original
shutil.move(tmpFile.name, filename)
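A possible refinement, sketched here as my own assumption rather than as part of the answer: NamedTemporaryFile creates its file in the system temp directory by default, which may sit on a different filesystem, in which case shutil.move falls back to copying. Creating the temporary file next to the original and finishing with os.replace keeps the last step a rename. The helper name rewrite_csv_in_place and its transform parameter are hypothetical.

```python
import csv
import os
import shutil
from tempfile import NamedTemporaryFile

def rewrite_csv_in_place(filename, transform):
    """Replace filename with a copy in which every row has been passed through transform."""
    directory = os.path.dirname(os.path.abspath(filename))
    # Create the temporary file in the same directory as the original so that
    # os.replace() below is a rename on one filesystem rather than a copy.
    tmp = NamedTemporaryFile('w', newline='', dir=directory, suffix='.tmp', delete=False)
    try:
        with open(filename, 'r', newline='') as src, tmp:
            writer = csv.writer(tmp)
            for row in csv.reader(src):
                writer.writerow(transform(row))
        # The temporary file is created readable/writable by its owner only;
        # copy the original file's permission bits before swapping it in.
        shutil.copymode(filename, tmp.name)
        os.replace(tmp.name, filename)
    except BaseException:
        # Don't leave a half-written temporary file behind on failure.
        tmp.close()
        if os.path.exists(tmp.name):
            os.unlink(tmp.name)
        raise
```

For the question's task this would be called as rewrite_csv_in_place('tmpEmployeeDatabase.csv', lambda row: [row[0].title()] + row[1:]).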

I've used the tempfile and shutil modules here to make the task easier.

There is no underlying system call for inserting data into a file. You can overwrite, you can append, and you can replace. But inserting data into the middle means reading and rewriting the entire file from the point of your edit to the end.
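To make that concrete, here is a minimal sketch of what "inserting" amounts to at the file level: read everything after the edit point, then write the new bytes followed by the old tail. insert_into_file is a hypothetical helper, and it assumes the tail fits in memory.

```python
def insert_into_file(path, offset, data):
    """Insert the bytes `data` at byte position `offset` of the file at `path`."""
    with open(path, 'r+b') as f:
        f.seek(offset)
        tail = f.read()    # everything from the edit point to the end of the file
        f.seek(offset)
        f.write(data)      # overwrite from the edit point with the new bytes...
        f.write(tail)      # ...then rewrite the old tail after them
```

The file only grows here, so no truncate() call is needed; the cost is proportional to the size of the tail, not of the inserted data.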

As such, there are two ways to do this: either (a) slurp the entire file into memory, make your edits there, and then dump the result back to disk, or (b) open a temporary output file, write your results to it while you read the input file, and then replace the old file with the new one once you reach the end. One method uses more RAM; the other uses more disk space.
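Approach (a) can be sketched with the csv module alone. This is a minimal example assuming the file fits comfortably in memory; the function name title_case_names_in_memory is hypothetical. Unlike approach (b), a crash during the final write can leave the file truncated, because opening it in 'w' mode empties it first.

```python
import csv

def title_case_names_in_memory(filename):
    # (a) Slurp: read every row into memory first.
    with open(filename, 'r', newline='') as f:
        rows = list(csv.reader(f))

    # Edit in memory; skip blank lines, which csv.reader yields as empty lists.
    for row in rows:
        if row:
            row[0] = row[0].title()

    # Reopening in 'w' mode truncates the file; then write every row back.
    with open(filename, 'w', newline='') as f:
        csv.writer(f).writerows(rows)
```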

If you just want to modify a CSV file inline using Python, you can simply use pandas:

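import pandas as pd

df = pd.read_csv('yourfilename.csv')

# set the "name" column of the row with index label 1 (the second data row) to "Lebron James"
df.loc[1, 'name'] = "Lebron James"

# save under the same name; index=False stops pandas from writing the row index as an extra column
df.to_csv('yourfilename.csv', index=False)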
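Applied to the question's actual task (title-casing the first column), the pandas route might look like the sketch below. It assumes the file has no header row, since the question's csv code treats every row alike, and that all values should stay as text; title_case_first_column is a hypothetical name. Note that pandas also reads the whole file into memory and rewrites it, which is approach (a) above.

```python
import pandas as pd

def title_case_first_column(filename):
    # header=None: treat the first line as data, so columns are labelled 0, 1, 2, ...
    # dtype=str and keep_default_na=False keep values such as "001" or "" as text.
    df = pd.read_csv(filename, header=None, dtype=str, keep_default_na=False)

    # Title-case every value in the first column.
    df[0] = df[0].str.title()

    # Write back without the index or a header line, matching the input layout.
    df.to_csv(filename, header=False, index=False)
```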

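Statement: The technical posts on this site follow the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.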

 