
Read a txt file into a pandas DataFrame with a unique delimiter and end-of-line marker

I have a text file containing a table, with a unique delimiter and a unique set of characters marking the end of each line/row.

For example, a new column is marked by #%# and a new row by ##@##.

So the text file might read:

cat#%#dog#%#rat#%#cow##@##red#%#blue#%#green#%#yellow##@##north#%#south#%#east#%#west

It should be read as a table with 3 rows and 4 columns, and I'd like to be able to add column names while loading it:

cat dog rat cow
red blue green yellow
north south east west

I've tried pd.read_csv('file_name.txt', delimiter='#%#', lineterminator='##@##') with the engine set to both python and c, but the C engine can't accept more than one character for the delimiter, and the Python engine doesn't support the lineterminator option at all.

Is my only option to read the text file, replace the delimiter and end-of-line markers with single characters, save it, and read it again using read_csv?

According to the official documentation:

lineterminator : str (length 1), optional
    Character to break file into lines. Only valid with C parser.
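Within that constraint, lineterminator is still usable once both multi-character markers are mapped to single characters. A minimal sketch of doing this in memory with io.StringIO, so nothing has to be saved and re-read; it assumes the ASCII control characters \x1f (unit separator) and \x1e (record separator) never occur in the data, and the helper name read_with_c_engine and the column names col1..col4 are hypothetical.

```python
import io

import pandas as pd

# Assumption: these ASCII control characters never appear in the data,
# so they can stand in for the multi-character markers.
UNIT_SEP = "\x1f"    # replaces the column marker "#%#"
RECORD_SEP = "\x1e"  # replaces the row marker "##@##"


def read_with_c_engine(raw_text, names=None):
    """Hypothetical helper: parse the custom format with the C engine."""
    # The C engine only accepts one-character sep and lineterminator values,
    # so map each marker to a single character first.
    text = raw_text.replace("##@##", RECORD_SEP).replace("#%#", UNIT_SEP)
    return pd.read_csv(
        io.StringIO(text),  # parse in memory; no intermediate file is written
        sep=UNIT_SEP,
        lineterminator=RECORD_SEP,
        engine="c",
        names=names,  # supplying names means the first row is not a header
    )


sample = ("cat#%#dog#%#rat#%#cow##@##red#%#blue#%#green#%#yellow"
          "##@##north#%#south#%#east#%#west")
# Hypothetical column names, added at load time as the question asks.
df = read_with_c_engine(sample, names=["col1", "col2", "col3", "col4"])
print(df)
```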

Therefore I think your best option would be to open the text file and replace the line terminator before using read_csv.
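A minimal sketch of that suggestion, assuming the whole file fits in memory: only the row marker ##@## is turned into an ordinary newline, while the multi-character column marker stays as sep. pandas treats a sep longer than one character as a regular expression, which requires engine="python"; re.escape makes the marker match literally. The helper name read_custom_table and the column names are hypothetical, and in practice raw_text would come from something like Path("file_name.txt").read_text().

```python
import io
import re

import pandas as pd


def read_custom_table(raw_text, names=None):
    """Hypothetical helper: rows end with '##@##', fields are split by '#%#'."""
    # Replace only the row marker, so the default newline handling applies
    # and no lineterminator argument is needed.
    text = raw_text.replace("##@##", "\n")
    # A multi-character sep is interpreted as a regex by the Python engine;
    # re.escape guarantees "#%#" is matched literally.
    return pd.read_csv(io.StringIO(text), sep=re.escape("#%#"),
                       engine="python", names=names)


sample = ("cat#%#dog#%#rat#%#cow##@##red#%#blue#%#green#%#yellow"
          "##@##north#%#south#%#east#%#west")
# Hypothetical column names, supplied while loading.
df = read_custom_table(sample, names=["col1", "col2", "col3", "col4"])
print(df)
```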

As matheubv pointed out, I don't think there is an option to solve this directly with pd.read_csv. However, it can easily be fixed with a few lines of code. Just open the file (sample.csv in this example) and parse it with the string method .replace(). Afterwards you can turn the data, now held as a string in string_data, into a DataFrame with a very basic list comprehension.

Hope this workaround helps.

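import pandas as pd
from pathlib import Path

p = Path("Data/sample.csv")

with p.open() as f:
    # Read the whole file (readline() would stop at the first newline), strip a
    # trailing newline, then map the markers to ';' (columns) and '\n' (rows).
    string_data = f.read().strip().replace('#%#', ';').replace('##@##', '\n')

# One list per row, one string per field.
df = pd.DataFrame([x.split(';') for x in string_data.split('\n')])
print(df)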

Output:

       0      1      2       3
0    cat    dog    rat     cow
1    red   blue  green  yellow
2  north  south   east    west
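A variant of the same idea, sketched under the assumption that the file is small enough to read at once: split on the two markers directly instead of going through ';', so a ';' inside a field cannot shift the columns, and pass the column names the question asks for through the columns argument of pd.DataFrame. The function name parse_custom_table and the column names are hypothetical.

```python
import pandas as pd


def parse_custom_table(raw_text, columns=None):
    """Hypothetical helper: split rows on '##@##' and fields on '#%#'."""
    # strip() drops a trailing newline at the end of the file, which would
    # otherwise leave a stray newline in the last field.
    rows = [row.split("#%#") for row in raw_text.strip().split("##@##")]
    return pd.DataFrame(rows, columns=columns)


sample = ("cat#%#dog#%#rat#%#cow##@##red#%#blue#%#green#%#yellow"
          "##@##north#%#south#%#east#%#west\n")
# Hypothetical column names.
df = parse_custom_table(sample, columns=["col1", "col2", "col3", "col4"])
print(df)
```

With the answer's file, the call would be parse_custom_table(Path("Data/sample.csv").read_text(), columns=[...]). Unlike read_csv, the DataFrame constructor does no type inference here, so every field comes back as a string; numeric columns would still need pd.to_numeric.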

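Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.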

 