How can I improve this Python script for replacing records in a dbf file?

Question

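I have a dbf file with about 9 million records, roughly 2.5 GB in size. An 80-character field takes up much of that space, even though it only ever stores one of about 10 different strings. To reduce the file size, I want to replace the character field with an integer field, and later use a relational database to look up the full string when needed.

Currently I have the following Python script, which uses the dbf library (http://pythonhosted.org/dbf/). The script seems to work (I tested it on a smaller dbf file), but when I run it on the full dbf file it runs for several hours.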

import dbf

tabel = dbf.Db3Table('dataset.dbf')
tabel.open()

with tabel:
 tabel.add_fields('newfield N(2, 0)')
 for record in tabel:
     if record.oldfield == 'string_a                                                                        ':
         dbf.write(record, newfield=1)
     elif record.oldfield == 'string_b                                                                        ':
         dbf.write(record, newfield=2)
     elif record.oldfield == 'string_c                                                                        ':
         dbf.write(record, newfield=3)
     elif record.oldfield == 'string_d                                                                        ':
         dbf.write(record, newfield=4)
     elif record.oldfield == 'string_e                                                                        ':
         dbf.write(record, newfield=5)
     elif record.oldfield == 'string_f                                                                        ':
         dbf.write(record, newfield=6)
     elif record.oldfield == 'string_g                                                                        ':
         dbf.write(record, newfield=7)
     elif record.oldfield == 'string_h                                                                        ':
         dbf.write(record, newfield=8)
     elif record.oldfield == 'string_i                                                                        ':
         dbf.write(record, newfield=9)
     elif record.oldfield == 'string_j                                                                        ':
         dbf.write(record, newfield=10)
     else:
         dbf.write(record, newfield=0)

dbf.delete_fields('dataset.dbf', 'oldfield')

As you can probably tell from the code, I am new to both Python and the dbf library. Can this script be made to run more efficiently?

Answer 1

Adding a field and deleting a field will each first create a backup copy of the 2.5 GB file.
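For a sense of scale, here is a rough estimate built only from the figures in the question (about 9 million records, one C(80) field, replaced by an N(2,0) field). The constant and function names are made up for illustration, and the record count is approximate, so treat the result as a ballpark:

```python
# Back-of-the-envelope estimate; all figures come from the question and are approximate.
RECORDS = 9_000_000   # "about 9 million records"
OLD_WIDTH = 80        # bytes per record used by the 80-character field
NEW_WIDTH = 2         # bytes per record used by an N(2,0) field

def field_bytes(records, width):
    """Bytes a fixed-width field occupies across all records."""
    return records * width

old_bytes = field_bytes(RECORDS, OLD_WIDTH)
new_bytes = field_bytes(RECORDS, NEW_WIDTH)
saved_bytes = old_bytes - new_bytes

print(f"old field: {old_bytes / 1e6:.0f} MB")
print(f"new field: {new_bytes / 1e6:.0f} MB")
print(f"saved:     {saved_bytes / 1e6:.0f} MB")
```

So the field swap should save on the order of 0.7 GB, while the add-then-delete route copies the whole 2.5 GB file twice before any of that is gained.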

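Your best bet is to create a new dbf with the same structure as the original, except for those two fields (the old character field is dropped and the new numeric field is added). Then make the change as you copy each record across. Something like: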

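# lightly untested

import dbf

# codes taken from the question's if/elif chain; anything else maps to 0
codes = {
    'string_a': 1, 'string_b': 2, 'string_c': 3, 'string_d': 4, 'string_e': 5,
    'string_f': 6, 'string_g': 7, 'string_h': 8, 'string_i': 9, 'string_j': 10,
}

old_table = dbf.Table('old_table.dbf')
structure = old_table.structure()
# assumption: structure() entries are full specs such as 'oldfield C(80)',
# so the position is looked up by bare field name instead
old_field_index = old_table.field_names.index('oldfield')
structure = structure[:old_field_index] + structure[old_field_index+1:]
structure.append('newfield N(2,0)')
new_table = dbf.Table('new_name_here.dbf', structure)

with dbf.Tables(old_table, new_table):
    for rec in old_table:
        rec = list(rec)
        old_value = rec.pop(old_field_index)
        # strip the space padding so the 80-character value matches the dict keys
        rec.append(codes.get(old_value.strip(), 0))
        new_table.append(tuple(rec))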
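The question also mentions getting the full string back later through a relational database. One way that could look is sketched below using the standard-library sqlite3 module and an in-memory database. The table name `lookup`, its columns, and the helper functions are hypothetical; the code-to-string pairs are the ones from the question's if/elif chain.

```python
import sqlite3

# Codes from the question's if/elif chain: string_a -> 1 ... string_j -> 10.
CODES = {f"string_{letter}": code
         for code, letter in enumerate("abcdefghij", start=1)}

def build_lookup(conn):
    """Create a small code -> label table (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE lookup (code INTEGER PRIMARY KEY, label TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO lookup (code, label) VALUES (?, ?)",
        [(code, label) for label, code in CODES.items()],
    )

def full_label(conn, code):
    """Return the full string for an integer code, or None if it is unknown."""
    row = conn.execute(
        "SELECT label FROM lookup WHERE code = ?", (code,)
    ).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
build_lookup(conn)
print(full_label(conn, 3))
```

Code 0, which the script uses for unrecognized values, has no row here, so `full_label(conn, 0)` returns None rather than a string.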

Answer 1: 3, accepted, 2018-05-02 16:39:11
