How to read a .txt file in transposed format into a pandas DataFrame

Question

I'm trying to read a dataset into a pandas dataframe. The dataset currently lives in a .txt file and looks like this:

name: hello_world
rating: 5
description: basic program

name: python
rating: 10
description: programming language

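As you can see, each line starts with the column name, followed by the data. The rows of the dataframe are separated by a blank line. Is there an easy way to read this kind of file into pandas, or do I just have to do it manually?

Thanks!

Edit: Thanks, everyone, for the help. The answer seems to be that yes, you have to do it manually. I've posted below how I did it manually, but I'm sure there are other, more efficient methods.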

Answer 1

data.txt:

name: hello_world
rating: 5
description: basic program

name: python
rating: 10
description: programming language

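Code:

import pandas as pd

# Read the non-blank lines, dropping the trailing newline from each.
with open('data.txt', 'rt') as fin:
    lst = [line.rstrip('\n') for line in fin if line.strip()]
print(lst)

# Soln 1: collect each field into its own list.
# Splitting on the first ':' only keeps values that contain a colon intact.
# The values keep the space that follows the colon; add .strip() to drop it.
d = dict()
d['name'] = [ele.split(':', 1)[1] for ele in lst if ele.startswith('name:')]
d['rating'] = [ele.split(':', 1)[1] for ele in lst if ele.startswith('rating:')]
d['description'] = [ele.split(':', 1)[1] for ele in lst if ele.startswith('description:')]
df = pd.DataFrame(data=d)
print(df)

# Soln 2: every three consecutive lines form one record.
data_tuples_lst = [(lst[i].split(':', 1)[1], lst[i+1].split(':', 1)[1], lst[i+2].split(':', 1)[1]) for i in range(0, len(lst), 3)]
df1 = pd.DataFrame(data=data_tuples_lst, columns=['name', 'rating', 'description'])
print(df1)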

Output:

['name: hello_world', 'rating: 5', 'description: basic program', 'name: python', 'rating: 10', 'description: programming language']
           name rating            description
0   hello_world      5          basic program
1        python     10   programming language
           name rating            description
0   hello_world      5          basic program
1        python     10   programming language

Answer 2

My take. Again, this is part of my learning pandas.

import pandas as pd
from io import StringIO

data = '''\
name: hello_world
rating: 5
description: basic program

name: python
rating: 10
description: programming language

name: foo
rating: 20
description: bar
'''
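# Prepend a header row so read_csv yields two named columns.
buffer = StringIO()
buffer.write('field: value\n')
buffer.write(data)
buffer.seek(0)

# Blank lines are skipped by default; skipinitialspace drops the space after ':'.
df = pd.read_csv(buffer, delimiter=':', skipinitialspace=True)

# After transposing, row 0 holds the field names and row 1 the values.
transposed = df.T

_, col_count = transposed.shape

# Each record spans three consecutive columns; turn each into a one-row frame.
x = []
for i in range(0, col_count, 3):
    tmp = transposed[[i, i + 1, i + 2]]
    columns = tmp.iloc[0]
    tmp = tmp[1:]
    tmp.columns = columns
    x.append(tmp)

out = pd.concat(x)
print(out.to_string(index=False))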

I'd really appreciate it if someone could show a better way to do this with pandas.
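For what it's worth, here is a minimal sketch of a more pandas-centric route. It rests on two assumptions that nobody in the thread has confirmed: every record begins with the same first field (name in the sample), and no value contains a colon. The helper name read_transposed is made up for illustration. The idea is to read the file as a tall two-column table, number the records, and pivot to wide form.

```python
import io

import pandas as pd

SAMPLE = """\
name: hello_world
rating: 5
description: basic program

name: python
rating: 10
description: programming language
"""


def read_transposed(source):
    """Hypothetical helper: read 'field: value' lines into a wide DataFrame."""
    # Tall format: one row per line; read_csv skips blank lines by default.
    tall = pd.read_csv(source, sep=":", header=None, names=["field", "value"],
                       skipinitialspace=True, dtype=str)
    # Assumption: the first field name opens every record, so each
    # reappearance of it marks the start of a new row.
    tall["record"] = (tall["field"] == tall["field"].iloc[0]).cumsum() - 1
    wide = tall.pivot(index="record", columns="field", values="value")
    # pivot sorts the columns; restore the order of first appearance.
    order = tall["field"].drop_duplicates().tolist()
    return wide[order].rename_axis(columns=None).reset_index(drop=True)


df = read_transposed(io.StringIO(SAMPLE))
print(df)
```

All values come back as strings because of dtype=str; a column such as rating would still need an explicit conversion if numbers are wanted.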

Answer 3

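Here is one way to work through the "sideways" dataset. This code has been edited to be more efficient than the previously posted answer.

Sample code: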

import pandas as pd
from collections import defaultdict

# Read the text file into a list.
with open('prog.txt') as f:
    text = [i.strip() for i in f]

# Split each non-blank line into a [key, value] pair at the first colon,
# so values that themselves contain ':' still unpack cleanly.
d = [i.split(':', 1) for i in text if i]
# Create a data container.
data = defaultdict(list)
# Store the data in a DataFrame-ready dict.
for k, v in d:
    data[k].append(v.strip())

# Load the DataFrame.
df = pd.DataFrame(data)

Output:

          name rating           description
0  hello_world      5         basic program
1       python     10  programming language
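One caveat with collecting values per column: if a record is missing a field, the lists end up with different lengths and the rows no longer line up. A possible variation, sketched here under the assumption that records are always separated by blank lines, is to build one dict per record instead. The function name parse_records is hypothetical.

```python
def parse_records(lines):
    """Return a list of dicts, one per blank-line-separated record."""
    records, current = [], {}
    for line in lines:
        line = line.strip()
        if not line:
            # A blank line closes the record collected so far.
            if current:
                records.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"expected 'key: value', got {line!r}")
        current[key.strip()] = value.strip()
    if current:  # the last record may not be followed by a blank line
        records.append(current)
    return records


sample = [
    "name: hello_world",
    "rating: 5",
    "description: basic program",
    "",
    "name: python",  # this record has no rating line
    "description: programming language",
]
records = parse_records(sample)
```

Passing the result to pd.DataFrame(records) gives one row per record, with NaN wherever a record lacks a field.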

Answer 4

I think you have to do it manually. If you check the I/O API in the pandas documentation (https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html), there is no way to define a custom reader.

Answer 5

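In case anyone comes here later, this is what I did. I simply converted the input file to a CSV (except that I used "|" as the delimiter, because the dataset contains strings). Thanks for everyone's input, but I forgot to mention that it's a 2 GB data file, so I didn't want to do anything intensive on my poor, overworked laptop.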

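# Stream the input line by line so the 2 GB file never has to fit in memory.
with open("in_file.txt", 'r', encoding='cp1252') as ifile, \
        open("out_file.csv", 'w', encoding='utf-8') as ofile:
    # Header row, so read_csv does not consume the first record as column names.
    # These names are assumed from the sample in the question.
    ofile.write('name|rating|description\n')
    for l in ifile:
        if l == '\n':
            # A blank line ends the current record.
            ofile.write('\n')
        else:
            # Keep everything after the first colon, minus the trailing newline.
            ofile.write(l.split(':', 1)[1].rstrip('\n') + '|')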

Then I loaded the dataframe using:

import pandas as pd
df = pd.read_csv('out_file.csv', sep="|", skipinitialspace=True, index_col=False)
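If the values might themselves contain the "|" character, a variation worth considering is to let Python's csv module do the writing, since it quotes such fields automatically. The following is only a sketch: the function name convert is illustrative, and it assumes, as the script above does, that every record lists the same fields in the same order. It still processes one line at a time, so memory use stays flat.

```python
import csv
import io


def convert(lines, out, delimiter="|"):
    """Stream 'key: value' lines into delimited rows, one row per record."""
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    row, rows_written = [], 0
    for line in lines:
        if line.strip():
            # Keep everything after the first colon as the value.
            row.append(line.partition(":")[2].strip())
        elif row:
            # A blank line closes the current record.
            writer.writerow(row)
            rows_written += 1
            row = []
    if row:  # the last record may not be followed by a blank line
        writer.writerow(row)
        rows_written += 1
    return rows_written


# In-memory stand-ins for the real input and output files.
sample = io.StringIO(
    "name: hello_world\nrating: 5\ndescription: basic program\n"
    "\n"
    "name: python\nrating: 10\ndescription: a|b language\n"
)
out = io.StringIO()
count = convert(sample, out)
print(out.getvalue())
```

With real files, the same function can be given two open file objects (the output opened with newline=""). Since no header row is written, the result would be read back with something like pd.read_csv('out_file.csv', sep='|', header=None, names=['name', 'rating', 'description']).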

Answer 6

After getting the list proposed by @aaj-kaal with this code:

import pandas as pd
with open('data.txt', 'rt') as fin:
    lst = [line.rstrip('\n') for line in fin if line.strip()]

you can build the dataframe directly like this:

dict_df = pd.DataFrame()
dict_df['name'] = [ele.split(':', 1)[1] for ele in lst
                   if ele.startswith('name:')]
dict_df['rating'] = [ele.split(':', 1)[1] for ele in lst
                     if ele.startswith('rating:')]
dict_df['description'] = [ele.split(':', 1)[1] for ele in lst
                          if ele.startswith('description:')]
dict_df

Output:

           name rating            description
0   hello_world      5          basic program
1        python     10   programming language

Answer 7

A generic proposal:

import pandas as pd

def from_txt_transposed_to_pandas(file):
    """
    Take a txt file like this:

    "
    name: hello_world
    rating: 5
    description: basic program

    name: python
    rating: 10
    description: programming language
    "

    -of any length- and return a dataframe.
    """
    # header=None keeps the first line as data; blank lines are skipped by default.
    tabla = pd.read_table(file, header=None)
    elem = list(tabla.iloc[:, 0])
    # Collect the field names in order of first appearance.
    cols = list(dict.fromkeys(x.split(":", 1)[0] for x in elem))
    tabla_df = pd.DataFrame(columns=cols)
    for col in cols:
        # Match on "col:" so that one field name cannot shadow a longer one.
        tabla_df[col] = [x.split(":", 1)[1].strip() for x in elem
                         if x.startswith(col + ":")]
    return tabla_df

7 answers

Answer 1
1 2020-11-27 17:09:05

Answer 2
1 2020-11-27 17:25:04

Answer 3
1 2020-11-27 19:46:27

Answer 4
0 Accepted 2020-11-27 16:33:49

Answer 5
0 2020-11-27 20:21:52

Answer 6
0 2022-07-25 13:02:52

Answer 7
0 2022-07-25 14:12:30
