How to read a text file with a custom (multiline) line terminator?

I have a huge text file like this:

1,2,3,4,5$*$*$2,5,1,3,2$*$*$

where $*$*$ is the line terminator. In reality this is needed because the regular columns may contain any kind of text, including newline characters.

How can I parse the txt file efficiently and put it into a pandas DataFrame? pd.read_csv() only accepts length-1 line terminators, so it fails here.

The output I am looking for is:

1,2,3,4
2,5,1,3

Thanks!

Maybe you can parse it beforehand. I don't know pandas well, but I managed to make it work (I think):

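import pandas as pd

# Read the whole file and split it on the custom terminator.
with open("your_text_file.txt") as f:
    s = f.read()
parts = s.split("$*$*$")

df = pd.DataFrame(columns=['ONE', 'TWO', 'THREE', 'FOUR', 'FIVE'])
for i, line in enumerate(parts):
    if line:  # skip the empty piece after the final terminator
        df.loc[i] = line.split(",")

print(df)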
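
Assigning to `df.loc[i]` grows the frame one row at a time, which may get slow on a huge file. Below is a minimal sketch of the same split-based idea that collects all rows first and builds the DataFrame once. The helper name `parse_records` and the column names are illustrative assumptions, and the sketch assumes that no field contains a comma.

```python
import pandas as pd

TERMINATOR = "$*$*$"

def parse_records(text, columns):
    # Split into records, dropping the empty piece after the final terminator.
    rows = [record.split(",") for record in text.split(TERMINATOR) if record]
    # Build the DataFrame in one step instead of growing it row by row.
    return pd.DataFrame(rows, columns=columns)

df = parse_records(
    "1,2,3,4,5$*$*$2,5,1,3,2$*$*$",
    ["ONE", "TWO", "THREE", "FOUR", "FIVE"],
)
print(df)
```

All values come back as strings here; a call such as `df.astype(int)` would convert them if every column is numeric.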

How about replacing your line terminator with one that pandas can understand?

from io import StringIO

import pandas as pd

s = '1,2,3,4,5$*$*$2,5,1,3,2$*$*$'
pd.read_csv(StringIO(s.replace('$*$*$', '\n')), header=None)

will return

   0  1  2  3  4
0  1  2  3  4  5
1  2  5  1  3  2
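
Note that replacing the terminator with '\n' needs the whole file in memory and would split a record wherever a field itself contains a newline, which the question says can happen. A possible alternative, sketched here under assumptions, is to read the file in chunks and yield one record per terminator. The function name `iter_records` and the chunk size are hypothetical choices, not part of pandas.

```python
import io

def iter_records(stream, terminator="$*$*$", chunk_size=1 << 16):
    """Yield terminator-delimited records from a text stream, chunk by chunk."""
    buffer = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        # Everything before the last terminator is complete; the tail is kept
        # because it may hold a partial record or a partial terminator.
        *complete, buffer = buffer.split(terminator)
        for record in complete:
            yield record
    if buffer:
        # Trailing data that was not followed by a terminator.
        yield buffer

# A tiny chunk size forces the terminator to straddle chunk boundaries.
sample = io.StringIO("1,2,3,4,5$*$*$2,5,\n1,3,2$*$*$")
records = list(iter_records(sample, chunk_size=4))
print(records)
```

With a real file, `stream` would be the object returned by `open(...)`. Each record can then be split on commas and the resulting list of rows passed to `pd.DataFrame` in a single call.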
