DBF - encoding cp1250

I have a dbf database encoded in cp1250, and I am reading it with the following code:

import csv
from dbfpy import dbf
import os
import sys

filename = sys.argv[1]
if filename.endswith('.dbf'):
    print "Converting %s to csv" % filename
    csv_fn = filename[:-4]+ ".csv"
    with open(csv_fn,'wb') as csvfile:
        in_db = dbf.Dbf(filename)
        out_csv = csv.writer(csvfile)
        names = []
        for field in in_db.header.fields:
            names.append(field.name)
        #out_csv.writerow(names)
        for rec in in_db:
            out_csv.writerow(rec.fieldData)
        in_db.close()
        print "Done..."
else:
  print "Filename does not end with .dbf"

The problem is that the resulting csv file is wrong: its encoding is ANSI and some characters are corrupted. Could you help me read the dbf file correctly?
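To illustrate the kind of corruption described here, the following minimal sketch (Python 3, standard library only) encodes a sample string as cp1250 and then decodes the same bytes with the wrong codec. The sample text and the choice of cp1252 as the "wrong" codec are assumptions made for illustration; the actual misreading in the question may differ.

```python
# Sample text with characters that map differently in cp1250 and cp1252.
# (Hypothetical data; not taken from the original dbf file.)
original = "Žluťoučký kůň"

# The bytes as they would be stored in a cp1250-encoded file.
raw = original.encode("cp1250")

# Decoding with the matching codec restores the text.
correct = raw.decode("cp1250")

# Decoding with a different single-byte codec yields corrupted characters.
# errors="replace" is needed because some byte values are undefined in cp1252.
wrong = raw.decode("cp1252", errors="replace")

print(correct)
print(wrong)
print(correct == original, wrong == original)
```

Both results have the same number of characters because both codecs are single-byte; only the mapping of the upper byte values differs, which is why the damage shows up only on accented letters.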

EDIT 1

I tried different code, using simpledbf from https://pypi.python.org/pypi/simpledbf/0.2.4, but it fails with an error.

Source 2:

from simpledbf import Dbf5
import os
import sys

dbf = Dbf5('test.dbf', codec='cp1250');
dbf.to_csv('junk.csv');

Output:

python program2.py
Traceback (most recent call last):
  File "program2.py", line 5, in <module>
    dbf = Dbf5('test.dbf', codec='cp1250');
  File "D:\ProgramFiles\Anaconda\lib\site-packages\simpledbf\simpledbf.py", line 557, in __init__
    assert terminator == b'\r'
AssertionError

I really don't know how to solve this problem.

Try using my dbf library:

import dbf
with dbf.Table('test.dbf') as table:
    dbf.export(table, 'junk.csv')

I wrote simpledbf. The line that is causing you problems came from some testing I was doing when developing the module. First of all, you might want to update your installation, as 0.2.6 is the most recent version. Then you can try removing that particular line (#557) from the file "D:\ProgramFiles\Anaconda\lib\site-packages\simpledbf\simpledbf.py". If that doesn't work, you can ping me at the GitHub repo for simpledbf, or you could try Ethan's suggestion for the dbf module.
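For readers who want to look at the file before editing the library, here is a minimal sketch for locating the header terminator byte. It assumes the common dBase III layout: a 32-byte file header with the header length stored at bytes 8-9, followed by 32-byte field descriptors and a single 0x0D (`b'\r'`) terminator. Whether this matches exactly what simpledbf reads at line 557 is an assumption, and the helper names and the synthetic sample header are hypothetical.

```python
import struct

FIELD_TERMINATOR = 0x0D  # the byte value of b'\r'


def make_field(name, length):
    """Build one 32-byte character-field descriptor (synthetic test data)."""
    return name.ljust(11, b"\x00") + b"C" + b"\x00" * 4 + bytes([length]) + b"\x00" * 15


def build_sample_header(fields):
    """Build a synthetic dBase III style header for the given (name, length) pairs."""
    header_len = 32 + 32 * len(fields) + 1
    record_len = 1 + sum(length for _, length in fields)
    # version, date (YY, MM, DD), record count, header length, record length
    head = struct.pack("<BBBBIHH", 3, 24, 1, 1, 0, header_len, record_len)
    head = head.ljust(32, b"\x00")
    body = b"".join(make_field(name, length) for name, length in fields)
    return head + body + bytes([FIELD_TERMINATOR])


def find_field_terminator(data):
    """Walk the 32-byte descriptors and return the terminator offset, or None."""
    offset = 32
    while offset < len(data):
        if data[offset] == FIELD_TERMINATOR:
            return offset
        offset += 32
    return None


sample = build_sample_header([(b"NAME", 20), (b"CITY", 30)])
(declared_len,) = struct.unpack_from("<H", sample, 8)
terminator_at = find_field_terminator(sample)
print(declared_len, terminator_at)
```

On a real file, the same two values could be obtained from the first few kilobytes read in binary mode, for example `open('test.dbf', 'rb').read(4096)`. If the terminator is not at `declared_len - 1`, the header probably carries extra bytes after the field descriptors, which would be one possible explanation for a failing terminator check.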

You can decode and encode as necessary. dbfpy assumes strings are utf8 encoded, so you can decode each string from that encoding and then encode it again with the right encoding.

import csv
from dbfpy import dbf
import os
import sys

filename = sys.argv[1]
if filename.endswith('.dbf'):
    print "Converting %s to csv" % filename
    csv_fn = filename[:-4]+ ".csv"
    with open(csv_fn,'wb') as csvfile:
        in_db = dbf.Dbf(filename)
        out_csv = csv.writer(csvfile)
        names = []
        for field in in_db.header.fields:
            names.append(field.name)
        #out_csv.writerow(names)
        for rec in in_db:
            # Re-encode string fields; leave numbers and other types untouched.
            row = [i.decode('utf8').encode('cp1250') if isinstance(i, str) else i for i in rec.fieldData]
            out_csv.writerow(row)
        in_db.close()
        print "Done..."
else:
  print "Filename does not end with .dbf"
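The same decode-then-write idea can be sketched in Python 3, where text and bytes are separate types. This version does not use dbfpy: it assumes the records are already available as lists containing raw byte strings and numbers, and the helper name `rows_to_csv` and the sample rows are hypothetical.

```python
import csv
import io


def rows_to_csv(rows, out, source_encoding="cp1250"):
    """Write rows to a text stream, decoding any bytes values first."""
    writer = csv.writer(out)
    for row in rows:
        writer.writerow(
            [v.decode(source_encoding) if isinstance(v, bytes) else v for v in row]
        )


# Synthetic records standing in for what a dbf reader might return.
sample_rows = [
    ["Dvořák".encode("cp1250"), 42],
    ["Łódź".encode("cp1250"), 7],
]

# newline="" keeps the csv module's own line endings unchanged.
buffer = io.StringIO(newline="")
rows_to_csv(sample_rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file, the `StringIO` buffer could be swapped for `open(csv_fn, "w", encoding="utf-8", newline="")`, so that the output encoding is chosen in one place instead of being applied field by field.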
