Parsing a huge CSV file with problematic software version numbers: how to quickly format 5 million rows

Question

Here's a sample of my data:

from io import StringIO

data = StringIO("""software,version
Visual C++ Minimum Runtime,11.0.61030
Visual C++ Minimum Runtime,11.0.61030
Visual C++ Minimum Runtime,11.0.61030.0.0.0.0""")

Notice that the version number in the last record has a trailing 0.0.0.0.

How can I keep only the first three components of the version (xx.yy.zz) and strip everything after them?

For example, Visual C++ Minimum Runtime,11.0.61030.0.0.0.0 should be truncated to:

"Visual C++ Minimum Runtime,11.0.61030"

Is there an efficient way to accomplish this?

Answer 1

You could use a generator to read the file row by row and write the truncated rows to a new file, so the 5 million rows never have to fit in memory at once. For example:

import csv

filename = "foo.csv"

def get_row(filename):
    # Stream the file lazily, one row at a time (Python 3: text mode, newline="").
    with open(filename, newline="") as csvfile:
        for row in csv.reader(csvfile):
            yield row

with open("truncated.csv", "w", newline="") as truncatedcsv:
    writer = csv.writer(truncatedcsv, delimiter=",")
    for row in get_row(filename):
        truncated_row = row  # placeholder: apply your truncation logic here
        writer.writerow(truncated_row)
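
For the truncation step left open above, one possible sketch is to split the version on dots and keep only the first three parts. The helper name truncate_version and the assumption that the version sits in the second column are illustrative, not part of the original answer.

```python
def truncate_version(row, parts=3):
    """Return a new row whose version (assumed to be column 1) keeps only
    the first `parts` dot-separated components."""
    if len(row) < 2:
        return row  # leave blank or malformed rows untouched
    software, version = row[0], row[1]
    # maxsplit=parts bounds the work: everything past the first `parts`
    # components ends up in a single tail element, which is then dropped.
    head = version.split(".", parts)[:parts]
    return [software, ".".join(head)] + row[2:]
```

Inside the loop this would be called as truncated_row = truncate_version(row). The header row passes through unchanged because "version" contains no dots.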

Don't forget to rename the new file and delete the old one.不要忘记重命名新文件并删除旧文件。
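
The rename step might look like the following, assuming Python 3. os.replace overwrites an existing destination in one call, so a separate delete of the old file is not needed. The function name swap_in is hypothetical.

```python
import os

def swap_in(new_path, old_path):
    """Move new_path over old_path; the previous contents of old_path are discarded."""
    # os.replace overwrites the destination if it already exists,
    # on both POSIX and Windows.
    os.replace(new_path, old_path)
```

With the file names used above, the call would be swap_in("truncated.csv", "foo.csv"). It is worth checking the truncated file before swapping, since the original is gone afterwards.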
