Invalid start byte error when reading PCD files

Question

I'm trying to extract data from a face dataset I found online, which provides PNG pictures and their corresponding PCD files. However, whenever I try to extract data from the PCD files I get the error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 202: invalid start byte

I understand that this is because I'm trying to read a non-ASCII character. However, I haven't seen anyone else run into this problem when opening .pcd files from an outside source. Is there an error on the dataset's end, or is there a workaround that will let me read this file? I eventually want to work towards a depth image for machine learning applications (I'm fairly new to machine learning in general).

If this is a problem with the dataset, I'd love to hear about other RGB-D face datasets, as I haven't been able to find any others that provide depth information.

If this is my problem, I'd like to know what I can do to fix it, because I have tried a number of different techniques and libraries to read the files and have only gotten this error.

Thanks!

import os
import math
import numpy as np
from PIL import Image


filePath = "001_01_cloud.pcd"

with open(filePath, "r") as pcd_file:
    lines = [line.strip().split(" ") for line in pcd_file.readlines()]

Answer 1

Googling for the specification of the PCD format says that the actual point cloud data can be stored in binary form, and I'm assuming that's what's going on here:

DATA - specifies the data type that the point cloud data is stored in. As of version 0.7, two data types are supported: ascii and binary.

Since you're opening the file with mode "r", Python will assume it's text and will attempt to decode everything as UTF-8 (the default here; you can pass encoding="..." to change it).
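To make the failure concrete, here is a minimal sketch that reproduces the same kind of error without the dataset. The byte string is made up for illustration: a text line followed by the little-endian float32 encoding of 1.5, which happens to contain the byte 0xC0.

```python
# Hypothetical stand-in for a PCD file: a text header line followed by the
# 4 raw bytes of the float32 value 1.5 (little-endian: 00 00 C0 3F).
raw = b"DATA binary\n\x00\x00\xc0\x3f"

try:
    raw.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    # 0xC0 can never begin a valid UTF-8 sequence, so decoding stops here.
    error = exc

print(error)
```

The header decodes fine; the error only appears once the decoder reaches the binary payload, which matches an error at a position a couple of hundred bytes into the file.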

However, since the format has a text header followed by text or binary data, you will need to open it in binary mode, "rb". (This means reads from the file will yield bytes objects, not strings.) You can then .decode() bytes objects into strings if you need to handle them as text.
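As a rough sketch of that approach, the header can be read line by line from a binary stream and decoded on its own, stopping at the DATA line. The function name read_pcd_header and the in-memory sample are hypothetical, and the sketch assumes the header is plain ASCII and always ends with a DATA line.

```python
import io

def read_pcd_header(stream):
    """Collect header fields from a binary stream, stopping after the DATA line."""
    header = {}
    while True:
        line = stream.readline()  # fine for the header, which is plain text
        if not line:
            raise ValueError("no DATA line found before end of file")
        text = line.decode("ascii").strip()
        if not text or text.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = text.partition(" ")
        header[key] = value.split()
        if key == "DATA":
            return header

# In-memory stand-in for a real file; with a file on disk you would pass
# the object returned by open(path, "rb") instead.
sample = io.BytesIO(
    b"# .PCD v0.7 - Point Cloud Data file format\n"
    b"FIELDS x y z\nSIZE 4 4 4\nTYPE F F F\nPOINTS 1\nDATA binary\n"
    b"\x00\x00\xc0\x3f\x00\x00\x00\x00\x00\x00\x00\x00"
)
header = read_pcd_header(sample)
payload_offset = sample.tell()  # the point data starts at this byte offset
print(header["DATA"], payload_offset)
```

Only the header bytes are ever decoded as text; everything after payload_offset is left as raw bytes for whatever parses the point data.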

You also shouldn't use .readlines() with a file like this; the binary data that follows the textual headers can contain \n characters, and that data would be "broken" if split into lines.
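One way to avoid splitting the payload into lines is to read it as a single block of bytes and let NumPy interpret it. The sketch below is illustrative only: read_binary_points is a hypothetical helper, and it assumes the stream is already positioned just past the DATA line, the data is uncompressed binary and little-endian, and every field has a COUNT of 1. The field names, sizes, and types would come from the FIELDS, SIZE, and TYPE header entries.

```python
import io
import numpy as np

# PCD TYPE letters mapped to NumPy kind codes (float, signed int, unsigned int).
TYPE_CODES = {"F": "f", "I": "i", "U": "u"}

def read_binary_points(stream, fields, sizes, types, n_points):
    """Read n_points fixed-size records from a binary stream into a structured array."""
    dtype = np.dtype(
        [(name, "<" + TYPE_CODES[t] + str(size))
         for name, size, t in zip(fields, sizes, types)]
    )
    n_bytes = dtype.itemsize * n_points
    payload = stream.read(n_bytes)  # one read, no line splitting
    if len(payload) != n_bytes:
        raise ValueError("file ended before all points were read")
    return np.frombuffer(payload, dtype=dtype, count=n_points)

# Made-up payload of two x/y/z points, standing in for the bytes after the header.
expected = np.array(
    [(1.0, 2.0, 3.0), (4.0, 5.0, 10.0)],
    dtype=[("x", "<f4"), ("y", "<f4"), ("z", "<f4")],
)
stream = io.BytesIO(expected.tobytes())
points = read_binary_points(stream, ["x", "y", "z"], [4, 4, 4], ["F", "F", "F"], 2)
print(points["z"])
```

Individual coordinates are then available by field name, for example points["z"] for the depth values.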

Anyway, you may be reinventing the wheel here; there seems to be a Python library for PCD files.

Solution 1
0 2020-07-29 19:31:15
