Wrong readline() in Python

I have a problem with the readline() method: it sometimes returns two lines instead of one, and I don't know why. Can someone help me?

Here is part of the text file I am reading, as shown in Notepad:

at+gpsinit=2
OK

+GPSEVINIT: 1
at+gpsnmea=3
OK
at+gpsstart=0
OK

And in Notepad++, with the end-of-line characters shown:

at+gpsinit=2CR
CR LF
OKCR LF
CR LF
+GPSEVINIT: 1CR LF
at+gpsnmea=3CR
CR LF
OKCR LF
at+gpsstart=0CR
CR LF
OKCR LF

Here is what I get in the Python shell:

16 : at+gpsinit=2

17 : 

18 : OK

19 : 

20 : +GPSEVINIT: 1

21 : at+gpsnmea=3

And here is my code:

# Open a file
file = open("testtxt.txt", 'r')
line = 0

for current_line in file:
    line += 1    
    print(str(line)+" : "+current_line)

# Close the opened file
file.close()

The problem you are running into is most likely caused by end-of-line markers.

  • Windows/DOS typically uses CRLF (\r\n, or 0d 0a in bytes).
  • Unix typically uses LF (\n, or 0a in bytes).
  • Classic Mac OS (before OS X) typically uses CR (\r, or 0d in bytes).

Here are some examples with an ASCII-encoded file:

$ hexdump -C test_dos.txt
00000000  68 65 6c 6c 6f 0d 0a 77  6f 72 6c 64 0d 0a        |hello..world..|
0000000e

$ hexdump -C test_nix.txt
00000000  68 65 6c 6c 6f 0a 77 6f  72 6c 64 0a              |hello.world.|
0000000c

$ hexdump -C test_mac.txt
00000000  68 65 6c 6c 6f 0d 77 6f  72 6c 64 0d              |hello.world.|
0000000c

As you can see, the word hello (68 65 6c 6c 6f) is followed by different bytes: 0d 0a, 0a, or 0d respectively. When you edit a file in MS Notepad, you will most likely insert CRLF. Notepad++ may be inserting LF instead, since LF is the most common convention in software development.
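If hexdump is not available (for example on Windows), the three sample files can be recreated and inspected from Python. This is only a minimal sketch and not part of the original answer: the helper names write_samples and hex_bytes are made up for illustration, and the files go into a temporary directory rather than the current one.

```python
import os
import tempfile

# Same content as the three hexdumps above, one entry per line-ending convention.
SAMPLES = {
    "test_dos.txt": b"hello\r\nworld\r\n",
    "test_nix.txt": b"hello\nworld\n",
    "test_mac.txt": b"hello\rworld\r",
}


def write_samples(directory):
    """Write the samples in binary mode so no newline translation takes place."""
    paths = []
    for name, payload in SAMPLES.items():
        path = os.path.join(directory, name)
        with open(path, "wb") as fptr:
            fptr.write(payload)
        paths.append(path)
    return paths


def hex_bytes(path):
    """Return the file content as space-separated hex bytes."""
    with open(path, "rb") as fptr:
        return " ".join("{:02x}".format(byte) for byte in fptr.read())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in write_samples(tmp):
            print(os.path.basename(path), hex_bytes(path))
```

Opening with "wb" and "rb" matters here: in text mode Python may translate the line endings on the way in or out, which would hide exactly the bytes you are trying to look at.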

Now, to your code: given the three files above, code similar to yours yields the following result.

Code:

files = ('test_dos.txt', 'test_nix.txt', 'test_mac.txt')

for fname in files:
    print("Reading {}".format(fname))
    with open(fname) as fptr:
        for line in fptr:
            print("--> {!r}".format(line))
    print(80*"-")

Output:

Reading test_dos.txt
--> 'hello\r\n'
--> 'world\r\n'
--------------------------------------------------------------------------------
Reading test_nix.txt
--> 'hello\n'
--> 'world\n'
--------------------------------------------------------------------------------
Reading test_mac.txt
--> 'hello\rworld\r'
--------------------------------------------------------------------------------

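As you can clearly see, Python splits on the \n character here, but does not remove it from the output. This is why the "mac" example only has one line. Note that this is the behaviour without universal-newline translation (for example, Python 2 on Unix); in Python 3, text-mode open() recognises \r, \n, and \r\n as line endings by default and translates each of them to \n.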

If you have to deal with files coming from heterogeneous sources, consider activating "universal newlines" support with the U flag to open. (In Python 3, universal newlines mode is already the default for text files; the U flag is deprecated there and was removed in Python 3.11.)

Here's an example. Note that the only thing that changed is the U parameter to open:

files = ('test_dos.txt', 'test_nix.txt', 'test_mac.txt')

for fname in files:
    print("Reading {}".format(fname))
    with open(fname, 'U') as fptr:
        for line in fptr:
            print("--> {!r}".format(line))
    print(80*"-")

Output:

Reading test_dos.txt
--> 'hello\n'
--> 'world\n'
--------------------------------------------------------------------------------
Reading test_nix.txt
--> 'hello\n'
--> 'world\n'
--------------------------------------------------------------------------------
Reading test_mac.txt
--> 'hello\n'
--> 'world\n'
--------------------------------------------------------------------------------

As you can see, now all files behave identically. This might prompt you to pepper in U everywhere you read text files. However, I am certain that there is a good reason why it's not the default! :)
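For Python 3, where the U flag is gone, the same two behaviours can be reproduced with the newline parameter. The following is a hedged sketch rather than the answer's own code: read_lines is a hypothetical helper, and it wraps in-memory bytes in io.TextIOWrapper (the type that text-mode open() returns) so that no files are needed.

```python
import io

# Byte content matching the three sample files from the hexdumps.
SAMPLES = {
    "test_dos.txt": b"hello\r\nworld\r\n",
    "test_nix.txt": b"hello\nworld\n",
    "test_mac.txt": b"hello\rworld\r",
}


def read_lines(raw, newline):
    """Split raw bytes into lines using the given newline setting."""
    wrapper = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline=newline)
    with wrapper as stream:
        return list(stream)


if __name__ == "__main__":
    for name, raw in SAMPLES.items():
        print("Reading {}".format(name))
        # newline="\n": only LF ends a line, nothing is translated.
        print("  newline='\\n'  --> {!r}".format(read_lines(raw, "\n")))
        # newline=None: universal newlines, every ending becomes "\n".
        print("  newline=None  --> {!r}".format(read_lines(raw, None)))
```

With newline="\n" the result matches the first output above (the "mac" sample stays one line); with newline=None it matches the second, universal-newlines output.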

There surely is no bug in the readline() routine; too many people use it too regularly, and unless you have a very strange implementation that is not standard Python, you will be using a decent version as well.

The information you have provided so far is not enough to be sure what the cause of your issue is, but there are some analysis methods I would suggest to find out what you are dealing with.

You should have a closer look at what is in your lines and which bytes terminate them ('\n', '\r\n', or whatever), and take an especially close look at the line at+gpsinit=2 and its end.

On a Unix system you can use od (or xxd) for this. With the option -c, the characters are printed. Use -t x1 -tc to also get hex output for each byte.
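Where od and xxd are not at hand, a rough equivalent is to read the file in binary mode and count the terminators. The sketch below is an illustration under assumptions: count_terminators is a hypothetical helper, the whole file is assumed to fit in memory, and the sample bytes are reconstructed from the Notepad++ view in the question rather than taken from the real file.

```python
import re
from collections import Counter

# Assumed content: first lines of the file as Notepad++ displays them.
SAMPLE = b"at+gpsinit=2\r\r\nOK\r\n\r\n+GPSEVINIT: 1\r\n"


def count_terminators(data):
    """Count CRLF, lone CR and lone LF; CRLF is tried first so it is not split."""
    return Counter(re.findall(rb"\r\n|\r|\n", data))


if __name__ == "__main__":
    # For a real file: data = open("testtxt.txt", "rb").read()
    for terminator, count in count_terminators(SAMPLE).items():
        print("{!r}: {}".format(terminator, count))
```

A non-zero count for a lone b'\r' in a file that otherwise uses b'\r\n' points to a stray carriage return, which is what Notepad++ shows at the end of the at+gpsinit=2 line.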

OK, so I solved my problem, and it seems that Np gave me the wrong text file. Anyway, I used this command:

file = open("testtxt.txt", 'r', newline="\r\n")

And it gave me the correct lines.
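A likely explanation for why this works, sketched under assumptions: with the default newline=None, Python 3 treats a lone \r as a line ending too, so the stray CR after at+gpsinit=2 produces an extra empty line, whereas newline="\r\n" ends a line only at CRLF. The bytes below are reconstructed from the Notepad++ view in the question, and numbered_lines is a hypothetical helper that mimics the loop from the question on in-memory data.

```python
import io

# Assumed content: "at+gpsinit=2" is followed by CR, then CR LF.
RAW = b"at+gpsinit=2\r\r\nOK\r\n\r\n+GPSEVINIT: 1\r\n"


def numbered_lines(raw, newline):
    """Number the lines as in the question, stripping trailing CR/LF characters."""
    wrapper = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline=newline)
    with wrapper as stream:
        return [
            "{} : {}".format(number, line.rstrip("\r\n"))
            for number, line in enumerate(stream, start=1)
        ]


if __name__ == "__main__":
    print("newline=None (default):")
    for entry in numbered_lines(RAW, None):
        print("  " + entry)
    print("newline='\\r\\n':")
    for entry in numbered_lines(RAW, "\r\n"):
        print("  " + entry)
```

Note that with newline="\r\n" the line endings are not translated, so each line still ends in \r\n (and the first one in \r\r\n); stripping them with rstrip("\r\n") before printing avoids the extra blank lines seen in the shell output.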
