Problem opening DBF files in Python

Question

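I am trying to open several DBF files and convert them to a dataframe. Most of them work fine, but for one of the files I receive the error: "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 15: invalid start byte"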

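I have read about this error in some other topics, for example on opening csv and xlsx files. The suggested solution there was to include encoding = 'utf-8' in the part that reads the file. Unfortunately I have not found a solution for DBF files, and my knowledge of DBF files is very limited.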
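As a rough illustration of why adding encoding = 'utf-8' would probably not help here: the byte 0xf6 can never start a UTF-8 sequence, whereas single-byte code pages do assign it a character. The sketch below uses only the standard library. The sample bytes are made up for illustration and are not taken from the actual file, and cp1252 is only one possible candidate.

```python
# Hypothetical sample: "K" + 0xf6 + "ln", as a cp1252-encoded text field might store it.
raw = b"K\xf6ln"

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError as exc:
    utf8_ok = False
    print("utf-8 failed:", exc.reason)

# The same bytes decode cleanly once a matching single-byte code page is used.
as_cp1252 = raw.decode("cp1252")
print("cp1252 gives:", as_cp1252)
```

So the fix is usually to tell the reader which code page the file was written in, rather than to force UTF-8.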

What I have tried so far:

1)

import pandas as pd
from dbfread import DBF
dbf = DBF('file.DBF')
dbf = pd.DataFrame(dbf)

UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 8: character maps to <undefined>

2)

from simpledbf import Dbf5
dbf = Dbf5('file.DBF')
dbf = dbf.to_dataframe()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 15: invalid start byte

3)

# this block of code copied from https://gist.github.com/ryan-hill/f90b1c68f60d12baea81 
import pysal as ps

def dbf2DF(dbfile, upper=True): #Reads in DBF files and returns Pandas DF
    db = ps.table(dbfile) #Pysal to open DBF
    d = {col: db.by_col(col) for col in db.header} #Convert dbf to dictionary
    #pandasDF = pd.DataFrame(db[:]) #Convert to Pandas DF
    pandasDF = pd.DataFrame(d) #Convert to Pandas DF
    if upper == True: #Make columns uppercase if wanted 
        pandasDF.columns = map(str.upper, db.header) 
    db.close() 
    return pandasDF

dfb = dbf2DF('file.DBF')

AttributeError: module 'pysal' has no attribute 'open'

Finally, if I try to install the dbfpy module, I receive: SyntaxError: invalid syntax

Any suggestions on how to solve this?
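One way to narrow down the encoding is to test the two bytes named in the error messages above (0x81 and 0xf6) against a few single-byte code pages. This is only a sketch: the helper name try_codecs and the shortlist of candidates are assumptions, and a codec that decodes without error is not necessarily the right one.

```python
def try_codecs(raw, codecs):
    """Return {codec: decoded text, or None if that codec cannot decode raw}."""
    results = {}
    for name in codecs:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


# The two bytes from the tracebacks; the candidate list is an assumed shortlist.
problem_bytes = b"\x81\xf6"
candidates = ["utf-8", "cp1252", "cp437", "cp850", "latin-1"]

results = try_codecs(problem_bytes, candidates)
for codec, text in results.items():
    print(f"{codec:8} -> {text!r}")
```

In this run utf-8 and cp1252 both reject the bytes, while the DOS code pages and latin-1 accept them. Checking a few decoded field values by eye is still needed to decide which accepted codec gives sensible text.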

Answer 1

Try using my dbf library:

import dbf

table = dbf.Table('file.DBF')

Print it to see whether an encoding is recorded in the file:

print(table)    # Python 2: print table

One of my test tables looks like this:

    Table:         tempy.dbf
    Type:          dBase III Plus
    Codepage:      ascii (plain ol ascii)
    Status:        DbfStatus.CLOSED
    Last updated:  2019-07-26
    Record count:  1
    Field count:   2
    Record length: 31 
    --Fields--
      0) name C(20)
      1) desc M

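The important line is the Codepage line; it sounds like it is not set correctly for your DBF file. If you know what it should be, you can open the file with that codepage (temporarily):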

table = dbf.Table('file.DBF', codepage='...')

Or you can change it permanently (this updates the DBF file) with:

table.open()
table.codepage = dbf.CodePage('cp1252') # for example
table.close()
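To check what the file itself records, the code page marker can also be read directly from the DBF header with the standard library. The sketch below assumes the common dBase layout (record count at bytes 4-7, header and record lengths at bytes 8-11, language-driver ID at byte 29). The function name read_dbf_header and the small ID-to-codec table are illustrative and deliberately incomplete.

```python
import os
import struct
import tempfile

# Partial mapping of language-driver IDs to Python codec names (an assumption
# for this sketch; real files may use IDs that are not listed here).
LANGUAGE_DRIVERS = {
    0x01: "cp437",
    0x02: "cp850",
    0x03: "cp1252",
    0xC8: "cp1250",
    0xC9: "cp1251",
}


def read_dbf_header(path):
    """Return a few fields from the fixed 32-byte header of a DBF file."""
    with open(path, "rb") as fh:
        header = fh.read(32)
    if len(header) < 32:
        raise ValueError("file is too short to contain a DBF header")
    # Bytes 4-11, little-endian: record count (uint32), then header length
    # and record length (uint16 each).
    n_records, header_len, record_len = struct.unpack_from("<IHH", header, 4)
    driver_id = header[29]  # language-driver (code page) marker
    return {
        "version": header[0],
        "records": n_records,
        "header_length": header_len,
        "record_length": record_len,
        "driver_id": driver_id,
        "codec": LANGUAGE_DRIVERS.get(driver_id),  # None if unknown or unset
    }


# Demonstration on a synthetic header, so the sketch runs without a real file.
fake = bytearray(32)
fake[0] = 0x03                                  # version byte
struct.pack_into("<IHH", fake, 4, 1, 97, 31)    # 1 record, 97-byte header, 31-byte records
fake[29] = 0x03                                 # marker mapped to cp1252 in the table above

with tempfile.NamedTemporaryFile(suffix=".dbf", delete=False) as tmp:
    tmp.write(bytes(fake))
try:
    info = read_dbf_header(tmp.name)
finally:
    os.remove(tmp.name)

print(info)
```

If the codec comes back as None, the file carries no usable marker and the code page has to be supplied by hand, as in the codepage='...' call above.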

Answer 2

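For anyone this may help: I had to repair a corrupted .dbf file (so the data came from a .dbf and had to go back into a .dbf). My particular problem was that the dates throughout the .dbf were simply very wrong. I tried many ways of taking the file apart and reassembling it, and failed with many errors, before succeeding with the following: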

#Modify dbase3 file to recast null date fields as a default date and 
#reimport back into dbase3 file

import collections
import datetime
from typing import OrderedDict
import dbf as dbf1
from simpledbf import Dbf5
from dbfread import DBF, FieldParser
import pandas as pd
import numpy as np

#Default date to overwrite NaN values
blank_date = datetime.date(1900, 1, 1)

#Read in dbase file from Old Path and point to new Path
old_path = r"C:\...\ex.dbf"
new_path = r"C:\...\newex.dbf"

#Establish 1st rule for resolving corrupted dates
class MyFieldParser(FieldParser):
    def parse(self, field, data):
        try:
            return FieldParser.parse(self, field, data)
        except ValueError:
            return blank_date

#Collect the original .DBF data while stepping over any errors
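table = DBF(
    old_path,
    encoding=None,
    ignorecase=True,
    lowernames=False,
    parserclass=MyFieldParser,
    recfactory=collections.OrderedDict,
    load=False,
    raw=False,
    ignore_missing_memofile=False,
    char_decode_errors='ignore',
)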

#Grab the Header Name, Old School Variable Format, and number of characters/length for each variable
dbfh = Dbf5(old_path, codec='utf-8')
headers = dbfh.fields
hdct = {x[0]: x[1:] for x in headers}
hdct.pop('DeletionFlag')
keys = hdct.keys()

#Position of Type and Length relative to field name
ftype = 0
characters = 1

# Reformat and join all old school DBF Header fields in required format
fields = list()

for key in keys:
    ftemp = hdct.get(key)
    k1 = str(key)
    res1 = ftemp[ftype]
    res2 = ftemp[characters]
    if k1 == "decimal_field_name":
        fields.append(k1 + " " + res1 + "(" + str(res2) + ",2)")
    elif res1 == 'N':
        fields.append(k1 + " " + res1 + "(" + str(res2) + ",0)")
    elif res1 == 'D':
        fields.append(k1 + " " + res1)
    elif res1 == 'L':
        fields.append(k1 + " " + res1)
    else: 
        fields.append(k1 + " " + res1 + "(" + str(res2) + ")")


addfields = '; '.join(str(f) for f in fields)

#load the records of the .dbf into a dataframe
df = pd.DataFrame(iter(table))

#go ham reformatting date fields to ensure they are in the correct format
df['DATE_FIELD1'] = df['DATE_FIELD1'].replace(np.nan, blank_date)

df['DATE_FIELD1'] = pd.to_datetime(df['DATE_FIELD1'])


# eliminate further errors in the dataframe
df = df.fillna('0')

#intended to drop the added "record index" field from the dataframe
#(note: with inplace=False the result is discarded, so df is left unchanged)
df.set_index('existing_primary_key', inplace=False)


#initialize defaultdict and convert the dataframe into a .DBF appendable format
dd = collections.defaultdict(list)
records = df.to_dict('records',into=dd)

#create the new .DBF file
new_table = dbf1.Table(new_path, addfields)

#append the dataframe to the new .DBF file
new_table.open(mode=dbf1.READ_WRITE)

for record in records:
    new_table.append(record)

new_table.close()
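The core idea of MyFieldParser above, falling back to a placeholder when a date cannot be parsed, can be shown without any DBF library. This sketch assumes that a date field is stored as 8 bytes of YYYYMMDD text. The function name parse_dbf_date is hypothetical, and unlike the script above it also maps blank fields to the placeholder in the same step.

```python
import datetime

BLANK_DATE = datetime.date(1900, 1, 1)  # same placeholder as in the script above


def parse_dbf_date(raw, default=BLANK_DATE):
    """Parse an 8-byte YYYYMMDD field; return `default` if it is blank or invalid."""
    text = raw.decode("ascii", errors="replace").strip()
    try:
        return datetime.datetime.strptime(text, "%Y%m%d").date()
    except ValueError:
        # Covers empty fields, non-digit debris and impossible dates alike.
        return default


good = parse_dbf_date(b"20190726")
blank = parse_dbf_date(b"        ")
impossible = parse_dbf_date(b"20190231")  # 31 February does not exist

print(good, blank, impossible)
```

A placeholder keeps the column type consistent, at the cost of hiding which rows were damaged, so it may be worth logging the affected records before overwriting them.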

Answer 3

from simpledbf import Dbf5
dbf2 = Dbf5('/Users/.../TCAT_MUNICIPIOS.dbf', codec='latin')
df2 = dbf2.to_dataframe()
df2.head(3)
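A likely reason the 'latin' codec gets past the error is that Latin-1 assigns a character to every byte value, so decoding cannot fail. That does not guarantee the characters are the intended ones. The standard-library sketch below illustrates both points; the comparison with cp437 is just an example of a different code page.

```python
import codecs

# 'latin' is an alias of Python's Latin-1 codec.
same_codec = codecs.lookup("latin").name == codecs.lookup("latin-1").name

# Every byte from 0x00 to 0xff decodes, and the text encodes back unchanged.
all_bytes = bytes(range(256))
text = all_bytes.decode("latin-1")
round_trip = text.encode("latin-1")

# The same byte yields different characters under different code pages.
same_byte = {name: b"\xf6".decode(name) for name in ("latin-1", "cp437")}

print(same_codec, len(text), round_trip == all_bytes, same_byte)
```

If the resulting text shows odd symbols in place of accented letters, the file was probably written in another code page and a different codec value should be tried.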

Answer 4

Install the DBF library:

conda install DBF

import pandas as pd
from dbfread import DBF

db_in_dbf = DBF('path/database.dbf')  # this line loads the database
df = pd.DataFrame(db_in_dbf)  # this line converts it to a pandas DataFrame
