简体   繁体   中英

Is there a way to replace and trim values in individual fields using the dbf to csv python library dbf by ethanfurman?

I am using the python library, dbf , by Ethan Furman to convert a number of dbf files to csv. It works extremely well for that. I would like to further edit some of the fields during the conversion process but am unsure how to do it. Specifically, I would like to replace string fields that contain only 1 or more blanks with empty strings, (eg. " " replace with "") and date fields that contain "00000000" with empty strings "". I would very much appreciate it if someone could describe how to edit the fields and write out the updated records during the conversion process. Obviously, I could write a simple secondary script to edit the csv files output during conversion but I would like to do it all in one step if possible. Here is the code I am using to convert the files:

import csv
import dbf
import os
import sys

folder=sys.argv[1]

for dirpath, dirnames, filenames in os.walk(folder):
    for filename in filenames:
        if filename.endswith('.DBF'):
            db=dbf.Table(filename, ignore_memos=True)
            db.open()
            csv_fn = filename[:-4]+ ".csv"
            dbf.export(db, filename=csv_fn, format='csv', header=True)

By default, when using a dbf table the data types returned are simple -- ie int , str , bool , datetime.datetime , etc. But you can make your own data types and have those used instead by specifying them in the default_data_types parameter:

db = dbf.Table(
        filename,
        ignore_memos=True,
        default_data_types={
            'C': my_white_space_stripping_data_type,
            'D': my_empty_date_str_data_type,
            },
        )

Fortunately, dbf comes with four enhanced data types already:

  • Char -- automatically strips trailing whitespace, and ignores trailing whitespace for comparisons

  • Logical -- supports True , False , and None ( None is returned when the field value is not true or false -- I've seen ? , ' ' , and other weird garbage)

  • Date -- supports an empty date, such as 00000000 , and displays them as ''

  • DateTime -- supports an empty date/time, and displays them as ''

Typically, if you're using one of the enhanced data types you probably want them all, so instead of the dictionary you can just pass a string:

db = dbf.Table(
        filename,
        ignore_memos=True,
        default_data_types='enhanced_data_types',
        )

Now, when a csv file is exported, trailing white-space is dropped, and empty date fields become '' .

Keep in mind that empty logical fields will become '?' instead of '' , so you may want the longer form of specifying a dict to default_data_types and only overriding C and D .

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM