Encode ser.readline() as UTF-8

Question

I have a Neo 6M GPS module that I am trying to print coordinates from. It is currently printing NMEA sentences in byte form with \r\n stuck to the end. Here is an example:

b'$GPGGA,161812.371,4042.759,N,07400.317,W,1,12,1.0,0.0,M,0.0,M,,*7B\r\n'

To parse the string into coordinates, I need to get rid of the \r , \n and b' '

To do this, I am trying .strip("b'rn\\"). It turns out you can only strip strings, not bytes. To overcome the incompatibility between bytes and strip, I tried to decode the bytes as a string like this: (ser.readline().decode("utf-8")).strip("b'rn\\")

This doesn't run and I get this error:

Traceback (most recent call last):
  File "gps2.py", line 10, in <module>
    newdata = (ser.readline().decode("utf-8")).strip("b'rn\\")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfe in position 0: invalid start byte

Below is my code. Is anyone able to help me either decode and strip it, or get rid of the \r , \n and b' ' another way?

import serial
import time
import string
import pynmea2

while True:
    port = "/dev/ttyAMA0"
    ser = serial.Serial(port,baudrate=9600,timeout=0.5)
    dataout = pynmea2.NMEAStreamReader()
    newdata = (ser.readline().decode("utf-8")).strip("b'rn\\")

    if newdata[0:6] == "$GPRMC":
        newmsg = pynmea2.parse(newdata)
        lat = newmsg.latitude
        lng = newmsg.longitude
        gps = ("Latitude = " + str(lat) + " and Longitude = " +str(lng))
        print(gps)
    elif newdata[0:6] == "$GPGLL":
        print("Found GPGLL record: " + newdata)
    else:
        print(newdata)

Answer 1

Note: I changed my original comment to an answer when it grew longer than a comment in response to OP's amplification of the original question.

You can't get rid of the b' ' because it isn't in the data. It is how Python displays a bytes object, to show you that your data is a bytestring and not a regular string. A call to decode() will turn the bytestring into a string. The \r\n, on the other hand, is in the data. It shows that your device is terminating each line with a carriage-return/linefeed pair. Both of those count as whitespace, so a plain strip() with no arguments removes them. The byte 0xfe at the beginning is the first part of a byte order mark pair \xfe\xff and can be discarded. So all you should need is ser.readline()[2:].decode("utf-8").strip().
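
As a minimal sketch of the point above, here is the decode-and-strip step applied to the sample line from the question. The helper name clean_line and its skip parameter are hypothetical, and the two-byte prefix in the second case is an assumption about what the device sends, not something confirmed from the hardware.

```python
def clean_line(raw: bytes, skip: int = 0) -> str:
    # Drop `skip` leading bytes, decode as UTF-8, then strip surrounding
    # whitespace, which includes the trailing carriage return and linefeed.
    return raw[skip:].decode("utf-8").strip()


sample = b'$GPGGA,161812.371,4042.759,N,07400.317,W,1,12,1.0,0.0,M,0.0,M,,*7B\r\n'

print(repr(sample))        # the b'...' wrapper only appears in the repr
print(clean_line(sample))  # a plain str without the line terminator

# Assumed case: the line arrives with the two bytes \xfe\xff in front.
prefixed = b'\xfe\xff' + sample
print(clean_line(prefixed, skip=2))
```

Note that strip() is called with no arguments. Passing "b'rn\\" as in the question would strip the literal characters b, ', r, n and backslash from both ends of the text, not the control characters.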

As for the uninterpretable data you did not mention in your question, but only in a subsequent comment:

With neither the device nor its documentation I can do little more than speculate on the apparently binary data you are getting prefixed to the data you want. It certainly isn't character data of any sort I can identify: it's not UTF-8 and it's not valid UTF-16, and my hunch is that it isn't an East Asian MBCS either. And it is unlikely to be floats or ints because there isn't a single zero byte, and binary numeric data (and UTF-32) tends to have a lot of those.

But if the data you want starts with a known constant like $GPGGA, then it should not be very difficult to pick what you want out of the stream you get. For example, suppose you get

b'i\x9a\xcab\x82\xbab\x8a\xb2b\x92\xc2b\x92\xca\x9ab\x8a\xa2R\xba\xc2jR":A\x1dMY\xb1\xcd\xb1\xc9\xb1\xc5\xc1\xb1\xc5\xe1\xb1\xd1\xd9\xb1\xc5\xd5\xdd\xb1\xc9\xc1\xb1\xc9\xd5\xb1\xc9\xd5\xb1\xc5\xc5\xd9\xb1\xc5\xd1\xb1\xc9\xd9\xb1\xd9\xc5\xb1\xc9\xe5\xc9\xb1\xc5\xd1\xb1\xc9\xdd\xb1\xc1\xc9\xb1\xc9\xd1\xdd\xb1\xc1\xd9\xa9\xdd\x195)\x91\x1dA\x1dMY\xb1\xcd\xb1\xcd\xb1\xc5\xc1\xb1\xc9\xe5\xb1\xd5\xd9\xb1\xc1\xd9\xcd\xb1\xc9\xd1\xb1\xcd\xc5\xb1\xd1\xe5\xb1\xc9\xc1\xe5\xb1\xc5\xd5\xa9\xdd\xcd5)\x91\x1dA\x1d11\xb1\xd5\xc5\xc9\xd5\xb9\xe5\xe5\xc1\xc5\xe1\xb19\xb1\xc1\xc1\xc1\xc9\xd5\xb9\xd5\xe1\xd1\xc1\xcd\xb1]\xb1\xc9\xc1\xc1\xdd\xcd\xd9\xb9\xc1\xc1\xb1\x05\xb1\x05\xa9\xdd\r5)\xff\xfe\xff$GPGGA,161812.371,4042.759,N,07400.317,W,1,12,1.0,0.0,M,0.0,M,,*7B\r\n'

(most of which is copied from your Pastebin data) and you store this in dataout. Then dataout.partition(b'$GPGGA,')[-1].decode().strip() will give you the comma-separated fields that follow $GPGGA, whether or not there is uninterpretable binary data to the left of it.
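
A short sketch of the partition() approach, using a shortened version of the example bytes (only the last few junk bytes are kept here for readability). The variable names are illustrative only.

```python
data = (b'5)\xff\xfe\xff'
        b'$GPGGA,161812.371,4042.759,N,07400.317,W,1,12,1.0,0.0,M,0.0,M,,*7B\r\n')

# partition() returns (before, separator, after); [-1] is everything
# that follows the marker, so the junk prefix is never decoded.
fields = data.partition(b'$GPGGA,')[-1].decode().strip()
print(fields)

# To keep the sentence header as well (for example to hand the line to a
# parser), rejoin the separator and the tail.
_, sep, tail = data.partition(b'$GPGGA,')
sentence = (sep + tail).decode().strip()
print(sentence)

# If the marker is absent, separator and tail are both empty, so the
# result is an empty bytes object rather than an error.
missing = b'\xff\xfe\xff'.partition(b'$GPGGA,')[-1]
print(missing)
```

Because the split happens on bytes before decode() is called, the undecodable prefix cannot raise UnicodeDecodeError.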

In your shoes I would still want to know what that binary data is. I think it is more likely to be caused by the intricacies of serial data transmission than any defect in the device. My guess is that it is real data, but maybe with unexpected data bits (which pySerial calls bytesize ), stop bits, or parity. Your call to serial.Serial() takes the default values of 8 data bits, no parity, one stop bit. I don't know how clever the serial module is, but it may be that it can recover from incorrect initial values after seeing some of the data. Modems could do that 25 years ago by looking at the (admittedly, prespecified) first 2 bytes of the data.
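
To experiment with the framing settings, they can be spelled out instead of left to the defaults. The sketch below is only a starting point: the baud rate and port come from the question, the other values are the pySerial defaults mentioned above, and the correct values for your module have to come from its documentation. The names SERIAL_SETTINGS and open_gps are hypothetical.

```python
# Explicit framing parameters. The plain values used here are the same
# ones that pySerial exposes as named constants.
SERIAL_SETTINGS = {
    "baudrate": 9600,
    "bytesize": 8,   # same value as serial.EIGHTBITS
    "parity": "N",   # same value as serial.PARITY_NONE
    "stopbits": 1,   # same value as serial.STOPBITS_ONE
    "timeout": 0.5,
}


def open_gps(port: str = "/dev/ttyAMA0"):
    # pySerial is imported here so the settings above can be inspected
    # on a machine without the library or the hardware.
    import serial
    return serial.Serial(port, **SERIAL_SETTINGS)
```

It may also be worth calling a function like this once, before the while loop, rather than creating a new serial.Serial object on every iteration as the code in the question does.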
