Python readline with custom delimiter

Novice here. I am trying to read lines from a file, but one line in the .txt file has a \n somewhere in the middle, and when I read that line with .readline(), Python cuts it in two and outputs it as two lines.

  • When I copy and paste the line into this window, it shows up as two lines, so I uploaded the file here: https://ufile.io/npt3n

  • I also added a screenshot of the file as it appears in the .txt file.

  • This is a group chat history exported from WhatsApp, if you are wondering.

  • Please help me read one line completely, as shown in the .txt file.

# Print the first four lines of the exported chat, one readline() call each
with open("f.txt", mode="r", encoding="utf8") as f:
    for i in range(4):
        lineText = f.readline()
        print(lineText)

[Screenshot of the file]

Python 3 lets you define what counts as a newline for a particular file. This is seldom used, because the default universal newlines mode is very tolerant:

When reading input from the stream, if newline is None, universal newlines mode is enabled. Lines in the input can end in '\n', '\r', or '\r\n', and these are translated into '\n' before being returned to the caller.
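A minimal sketch of that default behaviour, assuming the export ends each message with '\r\n' and contains a bare '\n' inside one message (an assumption; the uploaded file is not reproduced here, and the sample bytes are made up). io.TextIOWrapper over io.BytesIO stands in for a real file so the snippet runs on its own:

```python
import io

# Hypothetical sample: two messages ending in CRLF; the second contains a bare '\n'
SAMPLE = "[1/2/18, 10:00] A: hi\r\n[1/2/18, 10:01] B: first part\nsecond part\r\n".encode("utf8")

def read_lines(data: bytes, newline=None) -> list[str]:
    """Read all lines from in-memory bytes as open(..., newline=newline) would."""
    stream = io.TextIOWrapper(io.BytesIO(data), encoding="utf8", newline=newline)
    return stream.readlines()

# Default (newline=None): universal newlines, so the bare '\n' also ends a line
for line in read_lines(SAMPLE):
    print(repr(line))
```

With the default setting the second message comes back as two separate lines, which is the symptom described in the question.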

So here you should make it explicit that only '\r\n' is an end of line:

# Only '\r\n' ends a line; a bare '\n' inside a message no longer splits it
with open("f.txt", mode="r", encoding="utf8", newline="\r\n") as f:
    # use enumerate to show that the second line is read as a whole
    for i, line in enumerate(f):
        print(i, line)
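The same in-memory sketch (sample bytes again hypothetical) shows the effect of newline='\r\n': only '\r\n' ends a line, and the bare '\n' stays inside the second message. As the open() documentation notes for non-None values of newline, the line ending is returned untranslated, so this sketch strips it explicitly:

```python
import io

# Hypothetical sample: the second message contains a bare '\n'
SAMPLE = "[1/2/18, 10:00] A: hi\r\n[1/2/18, 10:01] B: first part\nsecond part\r\n".encode("utf8")

def read_messages(data: bytes) -> list[str]:
    """Split only on CRLF, as open(..., newline='\\r\\n') would, and drop the terminator."""
    stream = io.TextIOWrapper(io.BytesIO(data), encoding="utf8", newline="\r\n")
    # Lines come back with the CRLF untranslated, so remove it before use
    return [line.removesuffix("\r\n") for line in stream]

for i, message in enumerate(read_messages(SAMPLE)):
    print(i, repr(message))
```

str.removesuffix needs Python 3.9 or later.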

Instead of using the readline function, you can read the whole content and split it into lines with a regex:

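import re

# read the whole export at once (file name and encoding as in the question)
with open("f.txt", "r", encoding="utf8") as f:
    content = f.read()

# remove end-of-line characters (universal newlines already turned '\r\n' into '\n')
content = content.replace("\n", "")
# split on the "[date, time" prefixes; the capturing group keeps them in the result
lines = re.split(r"(\[[0-9/, :\]]+)", content)
# clean "" elements
lines = [x for x in lines if x != ""]
# join by pairs: each prefix with the message text that follows it
lines = [i + j for i, j in zip(lines[::2], lines[1::2])]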

If every line starts with the same kind of prefix ([...]), you can split on it and then drop the empty "" elements. Finally, join the parts pairwise with the zip function (https://stackoverflow.com/a/5851033/1038301).
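A variant of the same idea, swapped in here rather than taken from the answer: instead of deleting every '\n' and re-pairing the pieces with zip, split the text only where a newline is immediately followed by '[' (a regex lookahead). This assumes, as the answer does, that every message starts with a bracketed timestamp at the beginning of a line; the sample text is hypothetical. Unlike the approach above, it keeps the line break inside a multi-line message instead of gluing the two halves together:

```python
import re

# Hypothetical sample, already decoded (universal newlines turned CRLF into '\n')
CONTENT = "[1/2/18, 10:00] A: hi\n[1/2/18, 10:01] B: first part\nsecond part\n"

def split_messages(content: str) -> list[str]:
    """Split at newlines that are directly followed by '[' (the start of a new message)."""
    # (?=\[) is a lookahead: it requires '[' next but does not consume it
    parts = re.split(r"\n(?=\[)", content.rstrip("\n"))
    # drop empty pieces, e.g. from an empty input
    return [part for part in parts if part]

for message in split_messages(CONTENT):
    print(repr(message))
```

A continuation line that itself begins with '[' would still be treated as a new message, so this is only as reliable as the timestamp assumption.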

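Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

© 2020-2024 STACKOOM.COM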