How to import a non-structured flat file into Python using pandas?

I'm trying to import a .db file into pandas. The file is laid out as follows:


Person 1

Characteristic 1: Value 

Characteristic 2: Value


Person 2

Characteristic 1: Value

Etc.


I want to import the data into pandas with one row per person and the characteristics as columns, like this:


Person    Characteristic 1  Characteristic 2
Person 1  Value             Value
Person 2  Value             Value
Etc.


I've looked around but only found advice for importing regular flat files, where the columns are already specified in the file before import.

Any help would be greatly appreciated.

Assumptions:

  • the input is a line-oriented text file
  • empty lines are ignored
  • a line containing no colon (':') starts a new record
  • a line containing a colon declares an attribute of the current record

That is not a file format that pandas can process directly, but plain Python can easily build a list of records that is then fed to a DataFrame:

import pandas as pd

records = []                  # one dict per person, in file order
current = None                # the record currently being filled
fieldnames = ['Person']       # column order, extended as attributes appear

with open('inputfile') as file:
    for line in file:
        line = line.strip()
        if line:                      # ignore empty lines
            if ':' in line:           # a characteristic line
                # split on the first colon only, so values may contain ':'
                attr, value = line.split(':', 1)
                attr = attr.strip()
                current[attr] = value.strip()
                if attr not in fieldnames:
                    fieldnames.append(attr)
            else:                     # a person line starts a new record
                current = {'Person': line}
                records.append(current)

df = pd.DataFrame(columns=fieldnames, data=records)

With your sample data, it gives the expected result:

     Person Characteristic 1 Characteristic 2
0  Person 1            Value            Value
1  Person 2            Value              NaN
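For reuse and testing, the same parsing logic can be wrapped in a function that accepts any iterable of lines. What follows is a minimal sketch under the assumptions listed above, not a definitive implementation: the name parse_records, the sample values a, b and c, and the choice to raise ValueError when an attribute line appears before any person line are all illustrative and not part of the original answer.

```python
import io

import pandas as pd


def parse_records(lines):
    """Parse 'header / key: value' lines into (records, fieldnames).

    Hypothetical helper: the name and the error handling are assumptions.
    """
    records = []
    fieldnames = ["Person"]
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:                      # ignore empty lines
            continue
        if ":" in line:                   # attribute of the current record
            if current is None:
                raise ValueError(f"attribute before any record: {line!r}")
            # split on the first colon only, so values may contain ':'
            attr, value = (part.strip() for part in line.split(":", 1))
            current[attr] = value
            if attr not in fieldnames:
                fieldnames.append(attr)
        else:                             # a line without a colon starts a record
            current = {"Person": line}
            records.append(current)
    return records, fieldnames


# Made-up sample data in the layout described in the question.
sample = io.StringIO(
    "Person 1\n\nCharacteristic 1: a\n\nCharacteristic 2: b\n\n"
    "Person 2\n\nCharacteristic 1: c\n"
)
records, fieldnames = parse_records(sample)
df = pd.DataFrame(records, columns=fieldnames)
print(df)
```

Passing an open file object instead of the io.StringIO sample works the same way, since both yield one line per iteration. Attributes missing for a given person end up as NaN in the resulting DataFrame.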
