How to import a non-structured flat file into Python using pandas?

Question

I'm trying to import a .db file into pandas. The file is ordered as:

Person 1

Characteristic 1: Value 

Characteristic 2: Value


Person 2

Characteristic 1: Value

Etc

I want to import the data into pandas and have the persons as rows with their different characteristics in the columns like this:

Person Characteristic 1 Characteristic 2

Person 1 Value Value

Person 2 Value Value

Etc

I've tried to look around but only found advice for importing normal flat files where the columns are already specified in the file before import.

Any help would be greatly appreciated.

Answer 1

Assumptions:

input is a line oriented text file
empty lines are to be ignored
lines containing no colon (':') declare a new record
lines containing colons declare attributes for the current record

pandas cannot read this file format directly, but plain Python can easily build a list of records (one dict per person) that is then used to build the DataFrame:

import pandas as pd

records = []                  # one dict per person
current = None                # the record currently being filled
fieldnames = ['Person']       # column names, in order of first appearance

with open('inputfile') as file:
    for line in file:
        line = line.strip()
        if len(line) != 0:            # ignore empty lines
            if ':' in line:           # a characteristic line
                # split on the first colon only, so values may contain colons
                attr, value = line.split(':', 1)
                attr = attr.strip()
                current[attr] = value.strip()
                if attr not in fieldnames:
                    fieldnames.append(attr)
            else:                     # a person line
                current = {'Person': line}
                records.append(current)

# characteristics missing for a person become NaN
df = pd.DataFrame(columns=fieldnames, data=records)

With your sample data, this gives the expected result:

     Person Characteristic 1 Characteristic 2
0  Person 1            Value            Value
1  Person 2            Value              NaN
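
To try the parsing logic without an input file, the same loop can be wrapped in a function that accepts any iterable of lines. This is only a sketch under the assumptions listed above: parse_records is a hypothetical helper name, the sample text is retyped from the question, and the explicit error for a characteristic line that appears before any person line is an addition that is not in the code above.

```python
import io


def parse_records(lines):
    """Parse 'Person' / 'Key: Value' lines into (records, fieldnames).

    Hypothetical helper: same rules as the loop above, plus an explicit
    error when a characteristic line appears before any person line.
    """
    records = []
    fieldnames = ['Person']
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:                      # ignore empty lines
            continue
        if ':' in line:                   # a characteristic line
            if current is None:
                raise ValueError(
                    f"characteristic line before any person line: {line!r}"
                )
            attr, value = line.split(':', 1)
            attr = attr.strip()
            current[attr] = value.strip()
            if attr not in fieldnames:
                fieldnames.append(attr)
        else:                             # a person line starts a new record
            current = {'Person': line}
            records.append(current)
    return records, fieldnames


# Sample input retyped from the question (an assumption about the real file).
sample = io.StringIO(
    "Person 1\n\n"
    "Characteristic 1: Value \n\n"
    "Characteristic 2: Value\n\n\n"
    "Person 2\n\n"
    "Characteristic 1: Value\n"
)

records, fieldnames = parse_records(sample)
print(fieldnames)
print(records)
```

The returned values can be passed to pd.DataFrame(columns=fieldnames, data=records) exactly as above. Because the function takes any iterable of lines, an open file object works as well as the in-memory sample.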
