Wrong readline() in Python

I have a problem with the readline() method: it sometimes returns two lines instead of one, and I don't know why. Can someone help me?

Here is part of the text file I am reading (viewed in Notepad):

at+gpsinit=2
OK

+GPSEVINIT: 1
at+gpsnmea=3
OK
at+gpsstart=0
OK

And in Notepad++:

at+gpsinit=2CR
CR LF
OKCR LF
CR LF
+GPSEVINIT: 1CR LF
at+gpsnmea=3CR
CR LF
OKCR LF
at+gpsstart=0CR
CR LF
OKCR LF

Here is what I get in the Python shell:

16 : at+gpsinit=2

17 : 

18 : OK

19 : 

20 : +GPSEVINIT: 1

21 : at+gpsnmea=3

And here is my code:

# Open a file
file = open("testtxt.txt", 'r')
line = 0

for current_line in file:
    line += 1    
    print(str(line)+" : "+current_line)

# Close the opened file
file.close()

The problem you are running into is most likely caused by end-of-line markers.

  • Windows/DOS typically uses CRLF (\r\n, or 0d 0a in bytes).
  • Unix, including modern macOS, typically uses LF (\n, or 0a in bytes).
  • Classic Mac OS (before OS X) typically used CR (\r, or 0d in bytes).

Here are some examples with an ASCII encoded file:

$ hexdump -C test_dos.txt
00000000  68 65 6c 6c 6f 0d 0a 77  6f 72 6c 64 0d 0a        |hello..world..|
0000000e

$ hexdump -C test_nix.txt
00000000  68 65 6c 6c 6f 0a 77 6f  72 6c 64 0a              |hello.world.|
0000000c

$ hexdump -C test_mac.txt
00000000  68 65 6c 6c 6f 0d 77 6f  72 6c 64 0d              |hello.world.|
0000000c

As you can see, the word hello ( 68 65 6c 6c 6f ) is followed by different bytes: 0d 0a , 0a or 0d respectively. When you edit a file in MS Notepad, you will most likely insert CRLF . As LF is the most common in software development, Notepad++ is most likely adding those.

Now, to your code: given the three files above, code similar to yours yields the following result:

Code:

files = ('test_dos.txt', 'test_nix.txt', 'test_mac.txt')

for fname in files:
    print("Reading {}".format(fname))
    with open(fname) as fptr:
        for line in fptr:
            print("--> {!r}".format(line))
    print(80*"-")

Output:

Reading test_dos.txt
--> 'hello\r\n'
--> 'world\r\n'
--------------------------------------------------------------------------------
Reading test_nix.txt
--> 'hello\n'
--> 'world\n'
--------------------------------------------------------------------------------
Reading test_mac.txt
--> 'hello\rworld\r'
--------------------------------------------------------------------------------

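As you can see, when universal newlines are not active (as in this run), Python splits on the \n character but does not remove it from the output. This is why the "mac" example only has one line.

If you have to deal with files coming from heterogeneous sources, consider activating "universal newlines" support with the U flag to open . (This applies to Python 2 and older Python 3 releases; the U mode was removed in Python 3.11, where universal newlines are already the default for text-mode files.)

Here's an example. Note that the only thing that changed is the U parameter to open :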

files = ('test_dos.txt', 'test_nix.txt', 'test_mac.txt')

for fname in files:
    print("Reading {}".format(fname))
    with open(fname, 'U') as fptr:
        for line in fptr:
            print("--> {!r}".format(line))
    print(80*"-")

Output:

Reading test_dos.txt
--> 'hello\n'
--> 'world\n'
--------------------------------------------------------------------------------
Reading test_nix.txt
--> 'hello\n'
--> 'world\n'
--------------------------------------------------------------------------------
Reading test_mac.txt
--> 'hello\n'
--> 'world\n'
--------------------------------------------------------------------------------

As you can see, now all files behave identically. This might prompt you to pepper in U everywhere you read text files. However, I am certain that there is a good reason why it's not the default! :)
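For readers on Python 3, the same comparison can be sketched with the newline parameter of open instead of the U flag. This is only an illustration under a few assumptions: the three sample files are recreated in a temporary directory, and the helper names read_lines and compare are made up for this example. newline=None is the Python 3 default (universal newlines, with every line ending translated to \n), while newline='' still recognises all three endings but returns them untranslated.

```python
import os
import tempfile

# Same three sample files as above, written as raw bytes so that
# no newline translation happens while creating them.
SAMPLES = {
    "test_dos.txt": b"hello\r\nworld\r\n",
    "test_nix.txt": b"hello\nworld\n",
    "test_mac.txt": b"hello\rworld\r",
}


def read_lines(path, newline):
    """Return the list of lines that iterating over the file yields."""
    with open(path, "r", encoding="ascii", newline=newline) as fptr:
        return list(fptr)


def compare(newline):
    """Read every sample file with the given newline setting."""
    results = {}
    with tempfile.TemporaryDirectory() as tmpdir:
        for fname, payload in SAMPLES.items():
            path = os.path.join(tmpdir, fname)
            with open(path, "wb") as out:
                out.write(payload)
            results[fname] = read_lines(path, newline)
    return results


if __name__ == "__main__":
    for setting in (None, ""):
        print("newline={!r}".format(setting))
        for fname, lines in compare(setting).items():
            print("  {} --> {!r}".format(fname, lines))
```

With newline=None all three files should come back as 'hello\n' and 'world\n'; with newline='' each file keeps its own terminator, which is handy when you want to see what is really in the file.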

There surely is no bug in the readline() routine; too many people use it too regularly, and unless you have a very strange implementation which is not the standard Python, you will be using a decent version as well.

The information you have provided so far is not enough to be sure what is causing your issue, but there are some analysis methods I would suggest to find out what you are dealing with.

You should take a closer look at what is in your lines and which bytes terminate them ( '\n' , '\r\n' , or something else), and look especially closely at the line at+gpsinit=2 and its ending.

On a Unix system you can use od (or xxd ) for this. With option -c the characters are printed. Use -t x1 -tc to also get hex output for each byte.
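If od or xxd is not at hand (on Windows, for instance), a rough equivalent can be written in Python by reading the file in binary mode and counting the terminators. The sketch below is not part of any library: count_line_endings is a hypothetical helper, and the sample bytes are an assumption reconstructed from the Notepad++ view in the question (a command line ending in CR, followed by a CR LF pair).

```python
def count_line_endings(data):
    """Count CRLF pairs, lone CRs and lone LFs in a bytes object."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "CR": data.count(b"\r") - crlf,  # CRs that are not part of a CRLF
        "LF": data.count(b"\n") - crlf,  # LFs that are not part of a CRLF
    }


if __name__ == "__main__":
    # Assumed content; for a real file use: open("testtxt.txt", "rb").read()
    sample = b"at+gpsinit=2\r\r\nOK\r\n"
    print(repr(sample))
    print(count_line_endings(sample))
```

A non-zero count in more than one category means the file mixes line-ending styles, which is exactly the situation where line iteration can produce unexpected extra lines.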

OK, so I solved my problem; it seems that Np gives me the wrong text file. Anyway, I used this call:

file = open("testtxt.txt", 'r', newline="\r\n")

And it gave me the correct lines.
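A small sketch of why this probably helps, assuming Python 3 and assuming the file really contains the bytes suggested by the Notepad++ view (the modem echo ends in a bare CR, then a CR LF pair follows). With the default newline=None, the bare CR and the CR LF each count as a line ending, so an extra empty line appears; with newline="\r\n", only the CR LF pair ends a line. The split_lines helper and the SAMPLE bytes are made up for this illustration.

```python
import io

# Assumed byte content, reconstructed from the Notepad++ view in the
# question: "at+gpsinit=2" + CR, then CR LF, then "OK" + CR LF, and so on.
SAMPLE = b"at+gpsinit=2\r\r\nOK\r\n\r\n+GPSEVINIT: 1\r\n"


def split_lines(data, newline):
    """Decode data as text and return the lines seen by iteration."""
    wrapper = io.TextIOWrapper(io.BytesIO(data), encoding="ascii", newline=newline)
    with wrapper:
        return list(wrapper)


if __name__ == "__main__":
    for setting in (None, "\r\n"):
        print("newline={!r}".format(setting))
        for number, line in enumerate(split_lines(SAMPLE, setting), start=1):
            print("  {} : {!r}".format(number, line))
```

Note that with newline="\r\n" the terminators are not translated, so the stray CR stays inside the first line ('at+gpsinit=2\r\r\n'); calling line.rstrip("\r\n") before further processing removes it.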
