
Verifying that an STL file is ASCII or binary

After reading the specs on the STL file format, I want to write a few tests to ensure that a file is, in fact, a valid binary or ASCII file.

An ASCII STL file can be identified by the text "solid" at byte 0, followed by a space (hex value \x20), then an optional text string, followed by a newline.
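As a rough illustration of that header rule, the check can be sketched in standard C++17 without Qt. The function name asciiSolidName is hypothetical; it assumes the caller has already read the first line of the file (without its trailing "\n") and it only tests the "solid " prefix, nothing more.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: if `firstLine` starts with "solid " (keyword plus one
// space), return the optional name that follows; otherwise return nullopt.
std::optional<std::string> asciiSolidName(std::string_view firstLine)
{
    constexpr std::string_view prefix = "solid ";
    if (firstLine.substr(0, prefix.size()) != prefix)
        return std::nullopt;

    std::string_view name = firstLine.substr(prefix.size());
    // Drop a carriage return left over from CRLF line endings.
    if (!name.empty() && name.back() == '\r')
        name.remove_suffix(1);
    return std::string(name);
}
```

A match here only means the file might be ASCII; it does not rule out a binary file whose header happens to begin with the same bytes.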

A binary STL file has a reserved 80-byte header, followed by a 4-byte unsigned integer (NumberOfTriangles), and then 50 bytes of data for each of the NumberOfTriangles facets specified.

Each triangle facet is 50 bytes long: 12 single-precision (4-byte) floats followed by one 2-byte unsigned integer (an unsigned short).

If a binary file is exactly 84 + NumberOfTriangles*50 bytes long, it can typically be considered a valid binary file.
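The size rule can be written down as a small sketch in standard C++. Both function names are hypothetical. The count is decoded byte by byte so the result does not depend on the host's endianness, and the arithmetic is done in 64 bits so that a large triangle count cannot overflow.

```cpp
#include <cstdint>

// Decode a 4-byte little-endian unsigned integer from `bytes`,
// independent of the byte order of the machine running the code.
std::uint32_t readLittleEndian32(const unsigned char *bytes)
{
    return  static_cast<std::uint32_t>(bytes[0])
         | (static_cast<std::uint32_t>(bytes[1]) << 8)
         | (static_cast<std::uint32_t>(bytes[2]) << 16)
         | (static_cast<std::uint32_t>(bytes[3]) << 24);
}

// Expected size of a binary STL file holding `numberOfTriangles` facets:
// 80-byte header + 4-byte count + 50 bytes per facet.
std::uint64_t expectedBinaryStlSize(std::uint32_t numberOfTriangles)
{
    constexpr std::uint64_t headerSize = 80;
    constexpr std::uint64_t countSize  = 4;
    constexpr std::uint64_t facetSize  = 12 * 4 + 2;  // 12 floats + 1 short = 50
    return headerSize + countSize + std::uint64_t{numberOfTriangles} * facetSize;
}
```

For example, a file whose bytes 80 to 83 are 02 00 00 00 declares two triangles and would be expected to be 184 bytes long.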

Unfortunately, the 80-byte header of a binary file can itself contain the text "solid" starting at byte 0. Therefore, a test for that keyword alone cannot conclusively determine whether a file is ASCII or binary.

This is what I have so far:

STL_STATUS getStlFileFormat(const QString &path)
{
    // Each facet contains:
    //  - Normal: 3 floats (4 bytes each, 12 bytes total)
    //  - Vertices: 3 x 3 floats (4 bytes each, 36 bytes total)
    //  - AttributeCount: 1 short (2 bytes)
    // Total: 50 bytes per facet
    const size_t facetSize = 3*sizeof(float_t) + 3*3*sizeof(float_t) + sizeof(uint16_t);

    QFile file(path);
    if (!file.open(QIODevice::ReadOnly))
    {
        qDebug("\n\tUnable to open \"%s\"", qPrintable(path));
        return STL_INVALID;
    }

    QFileInfo fileInfo(path);
    size_t fileSize = fileInfo.size();

    if (fileSize < 84)
    {
        // 80-byte header + 4-byte "number of triangles" marker
        qDebug("\n\tThe STL file is not long enough (%u bytes).", uint(fileSize));
        return STL_INVALID;
    }

    // Look for text "solid" in first 5 bytes, indicating the possibility that this is an ASCII STL format.
    QByteArray fiveBytes = file.read(5);

    // Header is from bytes 0-79; numTriangleBytes starts at byte offset 80.
    if (!file.seek(80))
    {
        qDebug("\n\tCannot seek to the 80th byte (after the header)");
        return STL_INVALID;
    }

    // Read the number of triangles, uint32_t (4 bytes), little-endian
    QByteArray nTrianglesBytes = file.read(4);
    file.close();

    uint32_t nTriangles = *((uint32_t*)nTrianglesBytes.data());

    // Verify that file size equals the sum of header + nTriangles value + all triangles
    size_t targetSize = 84 + nTriangles * facetSize;
    if (fileSize == targetSize)
    {
        return STL_BINARY;
    }
    else if (fiveBytes.contains("solid"))
    {
        return STL_ASCII;
    }
    else
    {
        return STL_INVALID;
    }
}

So far, this has worked for me, but I'm worried that the four bytes starting at offset 80 of a plain ASCII file could contain ASCII characters that, when interpreted as a uint32_t, give a triangle count that happens to match the length of the file (very unlikely, but not impossible).
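To get a feel for how unlikely that collision is, the arithmetic can be worked through in a short C++17 sketch. The function name is hypothetical, and the reasoning assumes that an ASCII STL file contains only tabs, line breaks, and printable characters, so that every byte is at least 0x09.

```cpp
#include <cstdint>

// File size that the binary rule (84 + 50 * count) would demand if the four
// bytes at offsets 80..83 were the given characters, read as little-endian.
constexpr std::uint64_t sizeImpliedByCountBytes(unsigned char b0, unsigned char b1,
                                                unsigned char b2, unsigned char b3)
{
    const std::uint64_t count = std::uint64_t{b0}
                              | (std::uint64_t{b1} << 8)
                              | (std::uint64_t{b2} << 16)
                              | (std::uint64_t{b3} << 24);
    return 84 + 50 * count;
}

// Four spaces (0x20202020 = 538,976,288 triangles) imply about 26.9 GB.
static_assert(sizeImpliedByCountBytes(' ', ' ', ' ', ' ') == 26948814484ULL);

// Four tabs (0x09090909 = 151,587,081 triangles) give the smallest possible
// value under the assumption above: about 7.58 GB.
static_assert(sizeImpliedByCountBytes('\t', '\t', '\t', '\t') == 7579354134ULL);
```

Under that assumption, an ASCII file smaller than 7,579,354,134 bytes cannot pass the size test by accident. Larger files, or files containing other control bytes, are not covered by this argument.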

Are there additional steps that would prove useful in validating whether I can be "absolutely sure" that a file is either ASCII or binary?

UPDATE:

Following the advice of @Powerswitch and @RemyLebeau, I'm doing further tests for keywords. This is what I've got now:

STL_STATUS getStlFileFormat(const QString &path)
{
    // Each facet contains:
    //  - Normal: 3 floats (4 bytes each, 12 bytes total)
    //  - Vertices: 3 x 3 floats (4 bytes each, 36 bytes total)
    //  - AttributeCount: 1 short (2 bytes)
    // Total: 50 bytes per facet
    const size_t facetSize = 3*sizeof(float_t) + 3*3*sizeof(float_t) + sizeof(uint16_t);

    QFile file(path);
    bool canFileBeOpened = file.open(QIODevice::ReadOnly);
    if (!canFileBeOpened)
    {
        qDebug("\n\tUnable to open \"%s\"", qPrintable(path));
        return STL_INVALID;
    }

    QFileInfo fileInfo(path);
    size_t fileSize = fileInfo.size();

    // The minimum size of an empty ASCII file is 15 bytes.
    if (fileSize < 15)
    {
        // "solid " and "endsolid " markers for an ASCII file
        qDebug("\n\tThe STL file is not long enough (%u bytes).", uint(fileSize));
        file.close();
        return STL_INVALID;
    }

    // Binary files should never start with "solid ", but just in case, check for ASCII, and if not valid
    // then check for binary...

    // Look for text "solid " in first 6 bytes, indicating the possibility that this is an ASCII STL format.
    QByteArray sixBytes = file.read(6);
    if (sixBytes.startsWith("solid "))
    {
        QString line;
        QTextStream in(&file);
        while (!in.atEnd())
        {
            line = in.readLine();
            if (line.contains("endsolid"))
            {
                file.close();
                return STL_ASCII;
            }
        }
    }

    // Wasn't an ASCII file. Reset and check for binary.
    if (!file.reset())
    {
        qDebug("\n\tCannot seek to the 0th byte (before the header)");
        file.close();
        return STL_INVALID;
    }

    // 80-byte header + 4-byte "number of triangles" for a binary file
    if (fileSize < 84)
    {
        qDebug("\n\tThe STL file is not long enough (%u bytes).", uint(fileSize));
        file.close();
        return STL_INVALID;
    }

    // Header is from bytes 0-79; numTriangleBytes starts at byte offset 80.
    if (!file.seek(80))
    {
        qDebug("\n\tCannot seek to the 80th byte (after the header)");
        file.close();
        return STL_INVALID;
    }

    // Read the number of triangles, uint32_t (4 bytes), little-endian
    QByteArray nTrianglesBytes = file.read(4);
    if (nTrianglesBytes.size() != 4)
    {
        qDebug("\n\tCannot read the number of triangles (after the header)");
        file.close();
        return STL_INVALID;
    }

    uint32_t nTriangles = *((uint32_t*)nTrianglesBytes.data());

    // Verify that file size equals the sum of header + nTriangles value + all triangles
    if (fileSize == (84 + (nTriangles * facetSize)))
    {
        file.close();
        return STL_BINARY;
    }

    return STL_INVALID;
}

It appears to handle more edge cases, and I've attempted to write it so that it handles extremely large (multi-gigabyte) STL files gracefully, without requiring the entire file to be loaded into memory at once while it scans for the "endsolid" text.

Feel free to provide any feedback and suggestions (especially for people in the future looking for solutions).

If the file does not begin with "solid ", and if the file size is exactly 84 + (numTriangles * 50) bytes, where numTriangles is read from offset 80, then the file is binary.

If the file size is at least 15 bytes (the absolute minimum for an ASCII file with no triangles) and the file begins with "solid ", read the name that follows it until a line break is reached. Check whether the next line either begins with "facet " or is "endsolid [name]" (no other value is allowed). If it begins with "facet ", seek to the end of the file and make sure it ends with a line that says "endsolid [name]". If all of these are true, the file is ASCII.

Treat any other combination as invalid.

So, something like this:

STL_STATUS getStlFileFormat(const QString &path)
{
    QFile file(path);
    if (!file.open(QIODevice::ReadOnly))
    {
        qDebug("\n\tUnable to open \"%s\"", qPrintable(path));
        return STL_INVALID;
    }

    QFileInfo fileInfo(path);
    size_t fileSize = fileInfo.size();

    if (fileSize < 15)
    {
        // "solid " and "endsolid " markers for an ASCII file
        qDebug("\n\tThe STL file is not long enough (%u bytes).", uint(fileSize));
        return STL_INVALID;
    }

    // binary files should never start with "solid ", but
    // just in case, check for ASCII, and if not valid then
    // check for binary...

    // Look for text "solid " in first 6 bytes, indicating the possibility that this is an ASCII STL format.
    QByteArray sixBytes = file.read(6);
    if (sixBytes.startsWith("solid "))
    {
        // The rest of the first line is the (optional) solid name;
        // trimmed() drops the trailing line break.
        QByteArray name = file.readLine().trimmed();
        QByteArray endLine("endsolid");
        if (!name.isEmpty())
            endLine.append(' ').append(name);

        // ASCII STL lines are often indented, so trim before comparing.
        QByteArray nextLine = file.readLine().trimmed();
        if (nextLine.startsWith("facet "))
        {
            // TODO: seek to the end of the file, read the last line,
            // and make sure it is "endsolid [name]"...
            /*
            lastLine = ...;
            if (!lastLine.startsWith(endLine))
                return STL_INVALID;
            */
            return STL_ASCII;
        }
        if (nextLine.startsWith(endLine))
            return STL_ASCII;

        // reset and check for binary...
        if (!file.reset())
        {
            qDebug("\n\tCannot seek to the 0th byte (before the header)");
            return STL_INVALID;
        }
    }

    if (fileSize < 84)
    {
        // 80-byte header + 4-byte "number of triangles" for a binary file
        qDebug("\n\tThe STL file is not long enough (%u bytes).", uint(fileSize));
        return STL_INVALID;
    }

    // Header is from bytes 0-79; numTriangleBytes starts at byte offset 80.
    if (!file.seek(80))
    {
        qDebug("\n\tCannot seek to the 80th byte (after the header)");
        return STL_INVALID;
    }

    // Read the number of triangles, uint32_t (4 bytes), little-endian
    QByteArray nTrianglesBytes = file.read(4);
    if (nTrianglesBytes.size() != 4)
    {
        qDebug("\n\tCannot read the number of triangles (after the header)");
        return STL_INVALID;
    }            

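    // Note: this cast assumes a little-endian host, matching the file's byte order.
    uint32_t nTriangles = *((uint32_t*)nTrianglesBytes.data());

    // Verify that file size equals the sum of header + nTriangles value + all triangles.
    // Widen to 64 bits first so that a large triangle count cannot overflow.
    if (fileSize == (84 + (quint64(nTriangles) * 50)))
        return STL_BINARY;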

    return STL_INVALID;
}

Are there additional steps that would prove useful in validating whether I can be "absolutely sure" that a file is either ASCII or binary?

Since there is no format tag in the STL spec, you can't be absolutely sure about the file format.

Checking for "solid" at the beginning of the file should be enough in most cases. Additionally, you could check for further keywords such as "facet" or "vertex" to be sure it's ASCII. These words should only occur in the ASCII format (or in the otherwise unused binary header), but there is a small chance that the binary floats coincidentally form these words. So you could also check whether the keywords appear in the right order.
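A keyword-order check along those lines might look like the sketch below, in standard C++. The function name is hypothetical, and the expected sequence (facet, outer loop, three vertex lines, endloop, endfacet) is the usual ASCII STL layout. Only the first word of each line is inspected, so this is a plausibility test rather than a parser.

```cpp
#include <array>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Check that the first word of each non-blank line follows the ASCII STL
// pattern: solid, then zero or more facets, then endsolid.
bool asciiKeywordsInOrder(std::istream &in)
{
    static const std::array<const char *, 7> facetSequence = {
        "facet", "outer", "vertex", "vertex", "vertex", "endloop", "endfacet"};

    std::string line, word;
    if (!std::getline(in, line))
        return false;
    std::istringstream header(line);
    if (!(header >> word) || word != "solid")
        return false;

    std::size_t step = 0;  // position inside the current facet
    while (std::getline(in, line))
    {
        std::istringstream tokens(line);
        if (!(tokens >> word))
            continue;                       // skip blank lines
        if (step == 0 && word == "endsolid")
            return true;                    // closed between facets
        if (word != facetSequence[step])
            return false;                   // keyword out of order
        step = (step + 1) % facetSequence.size();
    }
    return false;                           // never saw "endsolid"
}
```

As written, this walks the whole file line by line; for very large files it may be enough to stop after the first few facets.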

And of course check if the length in the binary header matches the file length.

But: your code would run faster if you read the file linearly and simply hoped that nobody puts the word "solid" in the binary header. Maybe you should prefer ASCII parsing if the file starts with "solid", and use the binary parser as a fallback if the ASCII parsing fails.
