Encoding ser.readline() as UTF-8

Question

I have a Neo 6M GPS module that I am trying to print coordinates from. It is currently printing NMEA sentences in byte form with \r\n stuck to the end. Here is an example:

b'$GPGGA,161812.371,4042.759,N,07400.317,W,1,12,1.0,0.0,M,0.0,M,,*7B\r\n'

To parse the string into coordinates, I need to get rid of the \r, \n and b' '.

To do this, I am trying .strip("b'rn\\"). Turns out you can only strip strings, not bytes. To overcome the incompatibility of the bytes and strip, I tried to decode the bytes as a string like this: (ser.readline().decode("utf-8")).strip("b'rn\\")

This doesn't run and I get this error:

Traceback (most recent call last):
  File "gps2.py", line 10, in <module>
    newdata = (ser.readline().decode("utf-8")).strip("b'rn\\")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfe in position 0: invalid start byte

Below is my code. Is anyone able to help me either decode and strip it, or get rid of the \r, \n and b' ' another way?

import serial
import time
import string
import pynmea2

while True:
    port = "/dev/ttyAMA0"
    ser = serial.Serial(port,baudrate=9600,timeout=0.5)
    dataout = pynmea2.NMEAStreamReader()
    newdata = (ser.readline().decode("utf-8")).strip("b'rn\\")

    if newdata[0:6] == "$GPRMC":
        newmsg = pynmea2.parse(newdata)
        lat = newmsg.latitude
        lng = newmsg.longitude
        gps = ("Latitude = " + str(lat) + " and Longitude = " +str(lng))
        print(gps)
    elif newdata[0:6] == "$GPGLL":
        print("Found GPGLL record: " + newdata)
    else:
        print(newdata)

Answer 1

Note: I changed my original comment to an answer when it grew longer than a comment in response to OP's amplification of the original question.

You can't get rid of the b' '. It isn't in the data. It is a Python convention that shows you your data is a bytestring and not a regular string. A call to decode() will turn the bytestring into a string. The \r\n, on the other hand, is in the data. It shows that your device is terminating the string with a carriage-return/linefeed pair. Both of those count as whitespace, so a plain strip() with no arguments removes them. The byte 0xfe at the beginning looks like the first part of a byte order mark pair \xfe\xff and can be discarded. So all you should need is ser.readline()[2:].decode("utf-8").strip().
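To make the point about b' ' and \r\n concrete, here is a minimal sketch that works on a hard-coded copy of the sentence from the question instead of a live serial port. The helper name clean_line and the errors="ignore" argument are my own assumptions, not part of the original suggestion; errors="ignore" simply drops bytes that are not valid UTF-8, which sidesteps the UnicodeDecodeError without having to guess how many stray bytes to slice off.

```python
# Sample line as it would come back from ser.readline() (copied from the question).
raw = b'$GPGGA,161812.371,4042.759,N,07400.317,W,1,12,1.0,0.0,M,0.0,M,,*7B\r\n'


def clean_line(line: bytes) -> str:
    """Hypothetical helper: turn one raw serial line into a clean str."""
    # decode() converts bytes to str; the b'...' wrapper only exists in the repr.
    # errors="ignore" (an assumption) silently drops bytes that are not valid UTF-8.
    text = line.decode("utf-8", errors="ignore")
    # strip() with no arguments removes leading/trailing whitespace, including \r\n.
    return text.strip()


print(clean_line(raw))
# A stray 0xfe in front of the sentence is dropped rather than raising.
print(clean_line(b'\xfe' + raw))
```

Note that silently ignoring bytes can hide a real wiring or baud-rate problem, so it is worth logging the raw bytes at least once while debugging.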

As for the uninterpretable data you did not mention in your question, but only in a subsequent comment:

With neither the device nor its documentation I can do little more than speculate on the apparently binary data you are getting prefixed to the data you want. It certainly isn't character data of any sort I can identify: it's not UTF-8 and it's not valid UTF-16, and my hunch is that it isn't an East Asian MBCS either. And it is unlikely to be floats or ints because there isn't a single zero byte, and binary numeric data (and UTF-32) tends to have a lot of those.

But if the data you want starts with a known constant like $GPGGA, then it should not be very difficult to pick what you want out of the stream you get. For example, suppose you get

b'i\x9a\xcab\x82\xbab\x8a\xb2b\x92\xc2b\x92\xca\x9ab\x8a\xa2R\xba\xc2jR":A\x1dMY\xb1\xcd\xb1\xc9\xb1\xc5\xc1\xb1\xc5\xe1\xb1\xd1\xd9\xb1\xc5\xd5\xdd\xb1\xc9\xc1\xb1\xc9\xd5\xb1\xc9\xd5\xb1\xc5\xc5\xd9\xb1\xc5\xd1\xb1\xc9\xd9\xb1\xd9\xc5\xb1\xc9\xe5\xc9\xb1\xc5\xd1\xb1\xc9\xdd\xb1\xc1\xc9\xb1\xc9\xd1\xdd\xb1\xc1\xd9\xa9\xdd\x195)\x91\x1dA\x1dMY\xb1\xcd\xb1\xcd\xb1\xc5\xc1\xb1\xc9\xe5\xb1\xd5\xd9\xb1\xc1\xd9\xcd\xb1\xc9\xd1\xb1\xcd\xc5\xb1\xd1\xe5\xb1\xc9\xc1\xe5\xb1\xc5\xd5\xa9\xdd\xcd5)\x91\x1dA\x1d11\xb1\xd5\xc5\xc9\xd5\xb9\xe5\xe5\xc1\xc5\xe1\xb19\xb1\xc1\xc1\xc1\xc9\xd5\xb9\xd5\xe1\xd1\xc1\xcd\xb1]\xb1\xc9\xc1\xc1\xdd\xcd\xd9\xb9\xc1\xc1\xb1\x05\xb1\x05\xa9\xdd\r5)\xff\xfe\xff$GPGGA,161812.371,4042.759,N,07400.317,W,1,12,1.0,0.0,M,0.0,M,,*7B\r\n'

(most of which is copied from your Pastebin stuff) and you store this in dataout. Then dataout.partition(b'$GPGGA,')[-1].decode().strip() will give you the numbers you expect, whether there is uninterpretable binary data to the left of $GPGGA, or not.
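The partition approach can be sketched as follows. The sample bytes are a shortened stand-in for the Pastebin data (a few junk bytes followed by the sentence from the question), and the function names are hypothetical. The second helper is an assumption on my part: it keeps the $GPGGA, header attached, which is probably what you want if the result is going to be handed to a parser such as pynmea2.

```python
# Shortened stand-in for the stream: junk bytes, then the NMEA sentence.
dataout = (b'i\x9a\xcab\x82\xba\xff\xfe\xff'
           b'$GPGGA,161812.371,4042.759,N,07400.317,W,1,12,1.0,0.0,M,0.0,M,,*7B\r\n')

MARKER = b'$GPGGA,'


def fields_after_marker(data: bytes) -> str:
    """Everything after the marker, as suggested in the answer."""
    # partition() returns (before, separator, after); [-1] is the part after the marker.
    # If the marker is absent, the last element is b'' and the result is ''.
    return data.partition(MARKER)[-1].decode().strip()


def sentence_with_header(data: bytes) -> str:
    """Hypothetical variant: keep the marker so the full sentence survives."""
    _, sep, tail = data.partition(MARKER)
    return (sep + tail).decode().strip()


print(fields_after_marker(dataout))
print(sentence_with_header(dataout))
print(repr(fields_after_marker(b'no marker here\r\n')))
```

An empty string from either helper means the marker was not found in that line, so the caller can skip it and read the next one.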

In your shoes I would still want to know what that binary data is. I think it is more likely to be caused by the intricacies of serial data transmission than any defect in the device. My guess is that it is real data, but maybe with unexpected data bits (which pySerial calls bytesize), stop bits, or parity. Your call to serial.Serial() takes the default values of 8 data bits, no parity, one stop bit. I don't know how clever the serial module is, but it may be that it can recover from incorrect initial values after seeing some of the data. Modems could do that 25 years ago by looking at the (admittedly, prespecified) first 2 bytes of the data.
