
How to import a non-structured flat file into Python using pandas?

I'm trying to import a .db file into pandas. The file is ordered as:


Person 1

Characteristic 1: Value 

Characteristic 2: Value


Person 2

Characteristic 1: Value

Etc


I want to import the data into pandas and have the persons as rows with their different characteristics in the columns like this:


Person    Characteristic 1  Characteristic 2
Person 1  Value             Value
Person 2  Value             Value
Etc


I've looked around but only found advice for importing regular flat files, where the columns are already defined in the file itself.

Any help would be greatly appreciated.

Assumptions:

  • input is a line oriented text file
  • empty lines are to be ignored
  • lines containing no colon (':') declare a new record
  • lines containing colons declare attributes for the current record
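
As a rough illustration of these assumptions, each line falls into exactly one of three kinds. The helper name `classify` and its return labels are hypothetical and only serve to make the rules concrete:

```python
def classify(line):
    """Label a raw input line according to the assumptions listed above."""
    stripped = line.strip()
    if not stripped:
        return 'blank'        # empty lines are ignored
    if ':' in stripped:
        return 'attribute'    # "name: value" for the current record
    return 'record'           # no colon: starts a new record


for sample in ['Person 1', '', 'Characteristic 1: Value']:
    print(repr(sample), '->', classify(sample))
```

Note that under these rules a person name containing a colon would be misread as an attribute line, so the approach only works if names never contain one.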

pandas cannot read that file format directly, but plain Python can easily build a list of records that is then used to populate a DataFrame:

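import pandas as pd

records = []                # one dict per person
current = None              # the record currently being filled
fieldnames = ['Person']     # column names, in order of first appearance

with open('inputfile') as file:
    for line in file:
        line = line.strip()
        if len(line) != 0:            # ignore empty lines
            if ':' in line:           # a characteristic line
                if current is None:   # no person line has been seen yet
                    raise ValueError('attribute before any person: ' + line)
                attr, value = line.split(':', 1)   # split on the first colon only
                attr = attr.strip()
                current[attr] = value.strip()
                if attr not in fieldnames:
                    fieldnames.append(attr)
            else:                     # a person line starts a new record
                current = {'Person': line}
                records.append(current)

# keys missing from a record become NaN in the corresponding column
df = pd.DataFrame(columns=fieldnames, data=records)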

With your sample data, this gives the expected result:

     Person Characteristic 1 Characteristic 2
0  Person 1            Value            Value
1  Person 2            Value              NaN
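
The NaN for Person 2 appears because that record has no 'Characteristic 2' key, and pandas fills keys that are absent from a dict with NaN. A rough plain-Python picture of that alignment, using hard-coded records that mirror the sample (the name `aligned` is just for illustration, and None stands in for NaN):

```python
# Records as the parsing loop would build them for the sample data
# (hard-coded here for illustration).
records = [
    {'Person': 'Person 1', 'Characteristic 1': 'Value', 'Characteristic 2': 'Value'},
    {'Person': 'Person 2', 'Characteristic 1': 'Value'},
]
fieldnames = ['Person', 'Characteristic 1', 'Characteristic 2']

# Align every record to the full column list; absent keys become None,
# which plays the role that NaN plays in the DataFrame.
aligned = [[rec.get(name) for name in fieldnames] for rec in records]

for row in aligned:
    print(row)
```

This is only a sketch of the idea, not how pandas is implemented internally.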


 