Invalid start byte error while reading PCD files

I'm trying to extract data from a face dataset I found online that provides PNG pictures and their corresponding PCD files. However, whenever I try to extract data from the PCD files I get the error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 202: invalid start byte

I understand that this is because I'm trying to read a non-ASCII character; however, I haven't seen anyone else run into this problem when opening .pcd files from an outside source. Is there an error on the dataset's end, or is there a workaround that will let me read this file? I eventually want to work towards a depth image for machine learning applications (I'm fairly new to machine learning in general).

If this is a problem with the dataset, I'd love to hear about other RGB-D face datasets, as I haven't been able to find any others that provide depth information.

If this is my problem, I'd like to know what I can do to fix it, because I have tried a number of different techniques and libraries to read the files and have only gotten this error.

Thanks!

import os
import math
import numpy as np
from PIL import Image


filePath = "001_01_cloud.pcd"

with open(filePath, "r") as pcd_file:
    lines = [line.strip().split(" ") for line in pcd_file.readlines()]

The specification for the PCD format says that the actual point cloud data can be stored in binary form, and I'm assuming that's what's going on here:

DATA - specifies the data type that the point cloud data is stored in. As of version 0.7, two data types are supported: ascii and binary.

Since you're opening the file with mode "r", Python assumes it's text and attempts to decode everything using the default text encoding (UTF-8 in your case, as the error message shows; you can pass encoding="..." to choose another).
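To see the failure in isolation, here is a minimal sketch that writes a tiny synthetic file (an ASCII header followed by a few raw bytes) and then reads it in text mode. The file contents are made up for illustration and are not taken from the dataset in the question; the only point is that a byte such as 0xC0 can never start a UTF-8 sequence.

```python
import os
import tempfile

# Synthetic stand-in for a binary PCD file: ASCII header lines, then raw bytes.
# The last four bytes are the little-endian float 1.5, which contains 0xC0.
sample = b"VERSION .7\nDATA binary\n" + b"\x00\x00\xc0\x3f"

with tempfile.NamedTemporaryFile(delete=False, suffix=".pcd") as tmp:
    tmp.write(sample)
    path = tmp.name

error = None
try:
    # Text mode: Python decodes the whole file, binary payload included.
    with open(path, "r", encoding="utf-8") as f:
        f.read()
except UnicodeDecodeError as exc:
    error = exc
finally:
    os.remove(path)

print(type(error).__name__, "-", error.reason)
```

The header decodes fine; it is the payload after the DATA line that trips the decoder, which matches an error position a couple of hundred bytes into the file.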

However, since the format has a text header followed by either text or binary data, you will need to open it in binary mode, "rb". (This means reads from the file will yield bytes objects, not strings.) You can then .decode() those bytes objects into strings wherever you need to handle them as text.
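As a rough sketch of that approach, the helper below reads header lines from a binary stream, decodes only those lines, and stops at the DATA line so the payload stays as bytes. The function name read_pcd_header and the in-memory demo file are hypothetical; with a real file you would pass the object returned by open(filePath, "rb") instead. It assumes the header is plain ASCII.

```python
import io

def read_pcd_header(stream):
    """Read header lines from a binary stream up to and including DATA."""
    header = {}
    while True:
        raw = stream.readline()  # bytes, because the stream is binary
        if not raw:
            raise ValueError("reached end of file before a DATA line")
        line = raw.decode("ascii").strip()  # decode just this header line
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition(" ")
        header[key] = value
        if key == "DATA":
            return header

# Hypothetical in-memory file standing in for open("001_01_cloud.pcd", "rb").
demo = io.BytesIO(
    b"# .PCD v.7\nVERSION .7\nFIELDS x y z\nSIZE 4 4 4\nTYPE F F F\n"
    b"COUNT 1 1 1\nWIDTH 1\nHEIGHT 1\nPOINTS 1\nDATA binary\n"
    b"\x00\x00\x80\x3f\x00\x00\x00\x40\x00\x00\xc0\x3f"
)
header = read_pcd_header(demo)
payload = demo.read()  # the rest of the file, still bytes, never decoded as text
print(header["DATA"], len(payload))
```

Reading line by line is safe here only because it stops at the DATA line; everything after that point is read as one block of bytes.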

You also shouldn't use .readlines() with a file like this; the binary data that follows the textual header can contain \n bytes, and that data would be "broken" if split into lines.
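One way to avoid splitting the payload on newlines is to treat it as fixed-size records whose layout comes from the header. The sketch below does this with the standard struct module. The function name read_binary_points and the demo data are hypothetical, it only handles DATA binary, and it assumes little-endian values with no padding between fields, which you should check against your files.

```python
import io
import struct

# Map PCD (TYPE, SIZE) pairs to struct format characters.
STRUCT_CODES = {
    ("F", 4): "f", ("F", 8): "d",
    ("I", 1): "b", ("I", 2): "h", ("I", 4): "i",
    ("U", 1): "B", ("U", 2): "H", ("U", 4): "I",
}

def read_binary_points(stream):
    """Parse the text header, then unpack the fixed-size binary records."""
    header = {}
    while "DATA" not in header:
        raw = stream.readline()
        if not raw:
            raise ValueError("no DATA line found")
        line = raw.decode("ascii").strip()
        if line and not line.startswith("#"):
            key, _, value = line.partition(" ")
            header[key] = value.split()
    if header["DATA"] != ["binary"]:
        raise ValueError("this sketch only handles DATA binary")

    sizes = [int(s) for s in header["SIZE"]]
    counts = [int(c) for c in header.get("COUNT", ["1"] * len(sizes))]
    # One record per point, e.g. "<fffB"; "<" means little-endian, no padding
    # (an assumption about how the file was written).
    fmt = "<" + "".join(
        STRUCT_CODES[(t, s)] * c
        for t, s, c in zip(header["TYPE"], sizes, counts)
    )
    record = struct.Struct(fmt)
    n_points = int(header["POINTS"][0])
    payload = stream.read(record.size * n_points)
    return header["FIELDS"], list(record.iter_unpack(payload))

# Made-up demo data: the label value 10 is the byte 0x0A, i.e. "\n".
body = struct.pack("<fffB", 1.0, 2.0, 1.5, 10) + struct.pack("<fffB", 0.5, 0.25, 4.0, 7)
demo = io.BytesIO(
    b"VERSION .7\nFIELDS x y z label\nSIZE 4 4 4 1\nTYPE F F F U\n"
    b"COUNT 1 1 1 1\nWIDTH 2\nHEIGHT 1\nPOINTS 2\nDATA binary\n" + body
)
fields, points = read_binary_points(demo)
print(fields, points)
```

The demo payload deliberately contains a newline byte inside a record, so .readlines() would have cut that record in two, while reading by record size leaves it intact.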

Anyway, you may be reinventing the wheel here; there seems to be a Python library for PCD files.


