I'm trying to import a .db file into pandas. The file is ordered as:
Person 1
Characteristic 1: Value
Characteristic 2: Value
Person 2
Characteristic 1: Value
Etc
I want to import the data into pandas and have the persons as rows with their different characteristics in the columns like this:
Person Characteristic 1 Characteristic 2
Person 1 Value Value
Person 2 Value Value
Etc
I've tried to look around but only found advice for importing normal flat files where the columns are already specified in the file before import.
Any help would be greatly appreciated.
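Assumptions:
- lines containing a colon (':') are characteristic lines that belong to the current record
- other non-empty lines (no colon) declare a new record

That is not a file format that pandas can process directly, but Python can easily build a list of records that will later feed a dataframe: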
import pandas as pd

records = []
current = None
fieldnames = ['Person']
with open('inputfile') as file:
    for line in file:
        line = line.strip()
        if len(line) != 0:                  # ignore empty lines
            if ':' in line:                 # a characteristic line
                # split on the first colon only, so values may contain ':'
                attr, value = line.split(':', 1)
                attr = attr.strip()
                # assumes a person line has already been seen
                current[attr] = value.strip()
                if attr not in fieldnames:  # keep columns in order of first appearance
                    fieldnames.append(attr)
            else:                           # a person line
                current = {'Person': line}
                records.append(current)
df = pd.DataFrame(columns=fieldnames, data=records)
With your sample data, it gives as expected:
     Person Characteristic 1 Characteristic 2
0  Person 1            Value            Value
1  Person 2            Value              NaN
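If you want to try the parsing logic without a file on disk, the same loop can be wrapped in a function and fed from a string. This is only a sketch under the assumptions listed above: the names `parse_records`, `to_dataframe` and `SAMPLE` are made up for illustration, and the explicit error for a characteristic line that appears before any person line is my own addition, not something the file format requires.

```python
import io

# Hypothetical sample mirroring the layout described in the question.
SAMPLE = """\
Person 1
Characteristic 1: Value
Characteristic 2: Value

Person 2
Characteristic 1: Value
"""


def parse_records(lines):
    """Return (records, fieldnames) from an iterable of text lines."""
    records = []
    fieldnames = ['Person']
    current = None
    for line in lines:
        line = line.strip()
        if not line:                        # ignore empty lines
            continue
        if ':' in line:                     # a characteristic line
            if current is None:
                raise ValueError(f'characteristic before any person: {line!r}')
            # split on the first colon only, so values may contain ':'
            attr, value = (part.strip() for part in line.split(':', 1))
            current[attr] = value
            if attr not in fieldnames:      # columns in order of first appearance
                fieldnames.append(attr)
        else:                               # a person line starts a new record
            current = {'Person': line}
            records.append(current)
    return records, fieldnames


def to_dataframe(records, fieldnames):
    """Build the DataFrame; missing characteristics become NaN."""
    import pandas as pd  # imported here so the parser itself has no pandas dependency
    return pd.DataFrame(records, columns=fieldnames)


records, fieldnames = parse_records(io.StringIO(SAMPLE))
```

Calling `to_dataframe(records, fieldnames)` should then give the same two-row frame as above. For a real file, pass the open file object to `parse_records` instead of the `io.StringIO` wrapper.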