Reading a txt file into a pandas DataFrame using a unique delimiter and line ending

Question

I have a text file containing a table, with a unique delimiter and a unique set of characters marking the end of each row.

For example, a new column is marked by #%# and a new row is marked by ##@##

So the text file might read...

cat#%#dog#%#rat#%#cow##@##red#%#blue#%#green#%#yellow##@##north#%#south#%#east#%#west

This should be read as a table with 3 rows and 4 columns, and I would like to be able to add column names during loading.


cat	dog	rat	cow
red	blue	green	yellow
north	south	east	west

I've tried pd.read_csv(file_name.txt, delimiter="#*#", lineterminator = '##@##') with engine set to both python and c, but the c engine can't accept more than one character for the delimiter, and the python engine can't accept values for delimiter and lineterminator.

Is my only option to read the text file, change the delimiter and end-of-line value to a single character, save it, and read it again using read_csv?

Answer 1

According to the official documentation:

lineterminator : str (length 1), optional. Character to break file into lines. Only valid with C parser.

Therefore I think your best option would be to open the text file and replace the line terminator before using read_csv.
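A minimal sketch of that suggestion, assuming the file fits in memory and that no field contains a newline: the row marker is replaced with '\n' in memory, and the result is handed to read_csv through io.StringIO, so nothing has to be saved back to disk. The function name read_custom_table and the column names are made up for illustration. The multi-character separator is passed as an escaped regular expression, which relies on the Python engine.

```python
import io
import re

import pandas as pd


def read_custom_table(text, names, sep="#%#", row_end="##@##"):
    """Parse text that uses multi-character column and row markers."""
    # Turn the row marker into a plain newline so read_csv can split rows.
    normalized = text.strip().replace(row_end, "\n")
    # A separator longer than one character is treated as a regex,
    # so escape it and use the Python engine.
    return pd.read_csv(
        io.StringIO(normalized),
        sep=re.escape(sep),
        engine="python",
        header=None,
        names=names,
    )


sample = "cat#%#dog#%#rat#%#cow##@##red#%#blue#%#green#%#yellow##@##north#%#south#%#east#%#west"
df = read_custom_table(sample, names=["a", "b", "c", "d"])
print(df)
```

For a file on disk, the text could be obtained with something like Path("file_name.txt").read_text() and passed in the same way.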

Answer 2

As pointed out by matheubv, there is no option to solve this with pd.read_csv alone. However, it can easily be fixed with a few lines of code. Just open the file (sample.csv in the example) and normalize it with the string method .replace(). Afterwards you can turn the data, now held as a string in string_data, into a DataFrame with a very basic list comprehension.

I hope this workaround helps.

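import pandas as pd
from pathlib import Path

p = Path("Data/sample.csv")

with p.open() as f:
    # read() takes the whole file (readline() would stop at the first newline);
    # strip() drops a trailing newline that would otherwise create an empty row
    string_data = f.read().strip().replace('#%#', ';').replace('##@##', '\n')

# one list per row, split on the temporary ';' separator
df = pd.DataFrame([x.split(';') for x in string_data.split('\n')])
print(df)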

Output:

       0      1      2       3
0    cat    dog    rat     cow
1    red   blue  green  yellow
2  north  south   east    west
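The question also asked for column names at load time. The sketch below works under the same assumptions but skips the intermediate ';' and '\n' characters and splits on the original markers directly, so fields may contain semicolons without being broken apart. The helper name parse_rows and the column names are illustrative and do not come from the original answer.

```python
import pandas as pd


def parse_rows(text, sep="#%#", row_end="##@##"):
    """Split raw text into a list of rows, each a list of string fields."""
    # Skip empty chunks, e.g. when the text ends with a row marker.
    return [chunk.split(sep) for chunk in text.strip().split(row_end) if chunk]


raw = "cat#%#dog#%#rat#%#cow##@##red#%#blue#%#green#%#yellow##@##north#%#south#%#east#%#west"
rows = parse_rows(raw)
# Column names are supplied when the DataFrame is built.
df = pd.DataFrame(rows, columns=["first", "second", "third", "fourth"])
print(df)
```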

Answer 1
0 2021-07-22 15:29:34

Answer 2
0 accepted 2021-07-23 08:58:24
