Struggling to convert a DBF file to a Pandas DataFrame

Question

I am trying to work with the Canadian radio station DBF files that are published here: https://sms-sgs.ic.gc.ca/eic/site/sms-sgs-prod.nsf/eng/h_00015.html

Specifically, I want to read the fmstatio.dbf file into a Pandas DataFrame. I have tried the two commonly used DBF packages for Python.

With simpledbf ( https://pypi.org/project/simpledbf/ ), I only get the column names back when I call the dbf.to_dataframe() function.

I also tried dbf from PyPI ( https://pypi.org/project/dbf/ ). I am able to read the DBF file into a table:

import dbf

# Open read-only, print the table summary, then close.
table = dbf.Table(filename='/datadrive/canada/fmstatio.dbf')
table.open(dbf.READ_ONLY)
print(table)
table.close()

and I get the following information about the table:

   Table:         /datadrive/canada/fmstatio.dbf
    Type:          dBase III Plus
    Codepage:      ascii (plain ol' ascii)
    Status:        DbfStatus.READ_ONLY
    Last updated:  1921-12-07
    Record count:  8428
    Field count:   37
    Record length: 221

But when I try to convert it to a DataFrame, it fails:

oh_canada = pd.DataFrame(table)
table.close()

The error I get:

    data = fielddef[CLASS](decoder(data)[0])
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 4: ordinal not in range(128)

Does anyone have insight into the best way to go about working with this kind of DBF file in Pandas? Thanks in advance.

Answer 1

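The table claims to be "plain ol' ascii", but that is a lie. It contains an "e with an acute accent" (é, the byte 0xe9 in the error message), which is not surprising given the French content in a Canadian database. To work around this, you need to override the code page: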

table = dbf.Table(filename='/datadrive/canada/fmstatio.dbf', codepage=3)

The "3" selects the default Windows code page, CP1252. With that, I am able to read the file.
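To see why the code page matters, here is a small standard-library sketch. The sample bytes are made up for illustration and are not taken from fmstatio.dbf. The same byte 0xe9 is rejected by the ascii codec but decodes to é under cp1252:

```python
# Made-up sample value (not from the real file); 0xe9 is "é" in CP1252.
raw = b"Montr\xe9al"

try:
    raw.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    # Same kind of failure as in the traceback from the question.
    ascii_error = str(exc)

decoded = raw.decode("cp1252")

print(ascii_error)
print(decoded)
```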

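I am still not sure that pandas can import the table in the form it is delivered, which is an iterator of records. You may need to use export to convert it to a list first.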
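One possible way to do that conversion, as an untested sketch rather than a confirmed recipe: copy every record into a plain dict and hand the list to pandas. This does not use export. It assumes that dbf records can be indexed by field name and that the table exposes its column names as table.field_names. Both helper names below are hypothetical.

```python
def records_to_rows(records, field_names):
    """Copy each record into a plain dict keyed by field name."""
    return [{name: record[name] for name in field_names} for record in records]


def load_dbf_as_dataframe(path, codepage=3):
    """Hypothetical helper: read a DBF file into a DataFrame via the dbf package.

    Assumes records support record[field_name] and that the table has a
    field_names attribute; codepage=3 is the CP1252 override from above.
    """
    import dbf           # third-party: pip install dbf
    import pandas as pd  # third-party: pip install pandas

    table = dbf.Table(filename=path, codepage=codepage)
    table.open(dbf.READ_ONLY)
    try:
        field_names = list(table.field_names)
        rows = records_to_rows(table, field_names)
    finally:
        # Close the table even if reading a record fails.
        table.close()
    return pd.DataFrame(rows, columns=field_names)
```

With the file from the question, this would be called as oh_canada = load_dbf_as_dataframe('/datadrive/canada/fmstatio.dbf'). Copying into dicts holds all 8428 records in memory at once, which should be fine for a table of this size.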
