
Read table into dataframe in pandas

I have a file (with a .tbl extension) that contains a table. Its contents look like this:

Gibberish Gibberish Gibberish 
{Group}
Name = 'Messi'
Height = 170 cm
Weight = 72 kg
{End Group}
{Group}
Name = 'Ronaldo'
Height = 187 cm
Weight = 84 kg
{End Group}

How can I read this into a pandas DataFrame? I want to merge it with another file. I would like the output to look something like this:

      height   weight
messi   170      72
ronaldo 187      84

I looked at pandas read_table, but to no avail.

Any help is appreciated.
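On the read_table point: pandas' readers cannot interpret the {Group} ... {End Group} nesting, but read_csv (read_table's sibling) can still split every "key = value" line on the "=" sign, after which the records only need numbering and pivoting. A minimal sketch, assuming every record begins with a Name line; the inline sample string stands in for the .tbl file:

```python
import io

import pandas as pd

# Stand-in for the .tbl file from the question
sample = """Gibberish Gibberish Gibberish
{Group}
Name = 'Messi'
Height = 170 cm
Weight = 72 kg
{End Group}
{Group}
Name = 'Ronaldo'
Height = 187 cm
Weight = 84 kg
{End Group}
"""

# Split each line on '='; lines without '=' get NaN in the "value" column
kv = pd.read_csv(io.StringIO(sample), sep="=", header=None,
                 names=["key", "value"], skipinitialspace=True)
kv = kv.dropna(subset=["value"])
kv["key"] = kv["key"].str.strip()
# Strip whitespace, then the single quotes around the names
kv["value"] = kv["value"].str.strip().str.strip("'")

# Assumption: each record starts with a Name line, so a running count
# of Name lines gives every row its record number (1, 1, 1, 2, 2, 2)
kv["record"] = (kv["key"] == "Name").cumsum()

# One row per record, one column per key, then index by name
df = kv.pivot(index="record", columns="key", values="value").set_index("Name")
print(df)
```

The values remain strings such as "170 cm"; the numeric conversion is a separate step.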

I wrote a function to generalize this:

import re

import pandas as pd


def read_custom_table(filename,
                      rec_st_lim=r'\{',
                      rec_end_lim=r'\}',
                      k_v_sep=':',
                      item_sep=',',
                      index_col=None):
    """
    Read a text file, extract the records it contains and return
    them as a pandas DataFrame.

    Inputs
    ---------------
    filename: string containing the path of the file to read

    rec_st_lim: regular expression (1+ characters) marking the start
    of a single record; escape regex metacharacters such as '{'

    rec_end_lim: regular expression (1+ characters) marking the end
    of a single record

    k_v_sep: separator between a key and its value within a record

    item_sep: item separator; separates the key/value pairs

    index_col: name of the column to use as the index; default None,
    i.e. the index is a numerical range
    ----------------
    Output: df, a DataFrame whose columns are the keys found in the
    records, indexed by index_col when index_col is not None
    """
    # Non-greedy match between the delimiters; re.DOTALL lets '.'
    # match newlines, so a record may span several lines
    pattern = r"{}(.*?){}".format(rec_st_lim, rec_end_lim)

    # The with-block closes the file; no explicit f.close() is needed
    with open(filename) as f:
        records = re.findall(pattern, f.read(), re.DOTALL)

    rows = []
    for rec in records:
        row = {}
        for item in rec.split(item_sep):
            # Skip fragments without a key/value pair (e.g. blank lines)
            if k_v_sep not in item:
                continue
            # Split on the first separator only, so values may contain it
            key, value = item.split(k_v_sep, 1)
            row[key.strip()] = value.strip()
        rows.append(row)

    df = pd.DataFrame(rows)
    if index_col is not None:
        df = df.set_index(index_col)
    return df
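The heavy lifting in read_custom_table is a single non-greedy regular expression: (.*?) captures as little as possible between a start and an end delimiter, and re.DOTALL lets "." cross newlines so one record can span several lines. A small standalone check of that step, using an inline copy of the question's data:

```python
import re

text = """Gibberish Gibberish Gibberish
{Group}
Name = 'Messi'
Height = 170 cm
Weight = 72 kg
{End Group}
{Group}
Name = 'Ronaldo'
Height = 187 cm
Weight = 84 kg
{End Group}
"""

# Braces are regex metacharacters, so they are escaped in the pattern
pattern = r"\{Group\}(.*?)\{End Group\}"
records = re.findall(pattern, text, re.DOTALL)

for rec in records:
    # Each captured record still begins and ends with a newline
    print(repr(rec))

# Without re.DOTALL, '.' stops at '\n' and nothing matches
without_dotall = re.findall(pattern, text)

# A greedy (.*) runs from the first {Group} to the last {End Group},
# merging both records into one match
greedy = re.findall(r"\{Group\}(.*)\{End Group\}", text, re.DOTALL)
print(len(records), len(without_dotall), len(greedy))  # 2 0 1
```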

The function can be applied to the data in the OP's example as follows:

# Raw strings keep the regex escapes intact (no invalid-escape warning)
df = read_custom_table('debug.txt',
                       rec_st_lim=r'\{Group\}',
                       rec_end_lim=r'\{End Group\}',
                       k_v_sep='=',
                       item_sep='\n',
                       index_col='Name')
print(df)

The output will be:

           Height Weight
Name                    
'Messi'    170 cm  72 kg
'Ronaldo'  187 cm  84 kg
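The parsed values are still strings ('Messi' with its quotes, 170 cm with its unit), while the question asks for lowercase names and plain numbers. A minimal post-processing sketch, assuming each value is an integer followed by a unit; the frame is rebuilt inline with the same shape as the output above so the snippet runs on its own:

```python
import pandas as pd

# Same shape as the function's output: quoted names, values with units
df = pd.DataFrame(
    {"Name": ["'Messi'", "'Ronaldo'"],
     "Height": ["170 cm", "187 cm"],
     "Weight": ["72 kg", "84 kg"]}
).set_index("Name")

# Drop the quotes and lowercase the names, e.g. "'Messi'" -> "messi"
df.index = df.index.str.strip("'").str.lower()
df.index.name = None

# Keep the leading digits of each value and convert them to int
for col in ["Height", "Weight"]:
    df[col] = df[col].str.extract(r"(\d+)", expand=False).astype(int)

# Lowercase the column names to match the desired output
df.columns = df.columns.str.lower()
print(df)
```

The regex (\d+) drops any decimal part, so values such as 72.5 kg would need a pattern like (\d+(?:\.\d+)?) and a float conversion instead.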

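One way to do this is to process the lines as strings, convert the data into a list of dictionaries, and then turn that list into a DataFrame.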

Example:

import pandas as pd

stringVal = ''
# The input file contains the data posted in the question
with open("Path to inputfile", "r") as infile:
    for line in infile:
        # Join Name|Height|Weight of one record into a single line
        if line.startswith("Name"):
            stringVal += (line + "|").replace("\n", "").replace("'", "")
        if line.startswith("Height"):
            stringVal += (line + "|").replace("\n", "")
        if line.startswith("Weight"):
            stringVal += line.rstrip("\n") + "\n"

res = []
for record in stringVal.strip().split("\n"):
    if record:
        d = {}
        for field in record.split("|"):
            # Split on the first '=' only: "Height = 170 cm" -> key, value
            key, value = field.split("=", 1)
            d[key.strip()] = value.strip()
        res.append(d)

df = pd.DataFrame(res)
df = df.set_index('Name')
print(df)

Output:

         Height Weight
Name                  
Messi    170 cm  72 kg
Ronaldo  187 cm  84 kg
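The question's end goal is to merge this table with another file. Once both tables are DataFrames indexed by player name, DataFrame.join aligns them on the index labels rather than on row position. A minimal sketch; other and its other_value column are hypothetical stand-ins for the second file, whose layout the question does not show:

```python
import pandas as pd

# Parsed table, matching the second answer's output (quotes removed)
players = pd.DataFrame(
    {"Height": ["170 cm", "187 cm"], "Weight": ["72 kg", "84 kg"]},
    index=pd.Index(["Messi", "Ronaldo"], name="Name"),
)

# Hypothetical second table, deliberately in a different row order
other = pd.DataFrame(
    {"other_value": [1, 2]},
    index=pd.Index(["Ronaldo", "Messi"], name="Name"),
)

# how="left" keeps every row of players, in players' order
merged = players.join(other, how="left")
print(merged)
```

pd.merge(players, other, left_index=True, right_index=True, how="left") gives the same result; merge is the more flexible choice when the other file keys players by a regular column instead of the index.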


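Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.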

 