Faster way to correctly extract file name from Content-Disposition header in C++

I want to extract every valid form of file name from the "filename" attribute of the Content-Disposition HTTP header, as in the following examples:

Content-Disposition: attachment; filename="filename.jpg"
Content-Disposition: attachment; filename="file-2020-April.txt.vbs"

Moreover, the file name sometimes contains non-ASCII characters. In that case the correct file name comes from the "filename*" attribute, as in the following example (this is just an example, not actual data):

Content-Disposition: attachment; filename="??.txt"; filename*=UTF-8''日本.txt
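In a real header, RFC 8187 (which obsoletes RFC 5987) requires the value part of filename* to be percent-encoded, so the example above would actually be sent as filename*=UTF-8''%E6%97%A5%E6%9C%AC.txt. The following is a minimal sketch of decoding such a charset'language'value string, assuming only the UTF-8 charset needs to be supported (the one RFC 8187 expects senders to use). The name decodeExtValue is hypothetical, not part of any library:

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper: decode an RFC 8187 ext-value such as
// UTF-8''%E6%97%A5%E6%9C%AC.txt  (charset ' [language] ' percent-encoded value).
// Returns std::nullopt if the value is malformed or the charset is not UTF-8.
std::optional<std::string> decodeExtValue(const std::string& ext)
{
    // Locate the two single quotes that delimit charset and language.
    const auto q1 = ext.find('\'');
    if (q1 == std::string::npos) return std::nullopt;
    const auto q2 = ext.find('\'', q1 + 1);
    if (q2 == std::string::npos) return std::nullopt;

    // Charset names are case-insensitive ("UTF-8", "utf-8", ...).
    std::string charset = ext.substr(0, q1);
    for (auto& c : charset)
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    if (charset != "UTF-8") return std::nullopt;

    // Map one hex digit to its value, or -1 if it is not a hex digit.
    auto hexValue = [](char c) -> int {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    };

    // Percent-decode the value part; the resulting bytes are UTF-8.
    std::string out;
    for (std::size_t i = q2 + 1; i < ext.size(); ++i)
    {
        if (ext[i] != '%') { out += ext[i]; continue; }
        if (i + 2 >= ext.size()) return std::nullopt;  // truncated %XX
        const int hi = hexValue(ext[i + 1]);
        const int lo = hexValue(ext[i + 2]);
        if (hi < 0 || lo < 0) return std::nullopt;
        out += static_cast<char>(hi * 16 + lo);
        i += 2;  // skip the two hex digits
    }
    return out;
}
```

For example, decodeExtValue("UTF-8''%E6%97%A5%E6%9C%AC.txt") yields the UTF-8 bytes of 日本.txt, while an unsupported charset or a truncated escape yields std::nullopt.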

I used the following string functions to extract only the value of filename="...":

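std::string ContentDispositionHeader;  // e.g. attachment; filename="filename.jpg"
// Take everything between the first and the last double quote.
std::string::size_type startPos = ContentDispositionHeader.find('"');
std::string::size_type endPos = ContentDispositionHeader.find_last_of('"');
std::string filename;
if (startPos != std::string::npos && endPos > startPos)
{
    ++startPos;  // skip the opening quote
    filename = ContentDispositionHeader.substr(startPos, endPos - startPos);
}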

However, I need code that handles both cases (the plain filename and the UTF-8 filename*). Is there a faster and easier way to extract the file name?

If raw speed is what you are looking for, I believe you cannot do better than O(n), where n is the length of the header: the file name can sit anywhere in the header, so in the worst case every character has to be examined once. That is what you are already doing.

The following example extracts the file names from the headers in a similar fashion. It assumes that the filename value is always quoted (RFC 6266 also permits an unquoted token; see it for details) and that the UTF-8 filename* comes last, after the ASCII filename if that is present, because the code takes everything after UTF-8'' up to the end of the header. There may be more cases you need to take care of while parsing the header.

Here's the example (live):

#include <iostream>
#include <string>
#include <vector>
#include <utility>

// Filenames: <ASCII, UTF-8>
using Filenames = std::pair<std::string, std::string>;

// Returns the value of filename="..." (ASCII) and of filename*=UTF-8''... (UTF-8).
// Assumes the ASCII value is quoted and that filename* is the last parameter.
Filenames getFilename( const std::string& header )
{
    std::string ascii;

    // Find the opening filename=" and take everything up to the next quote.
    const std::string q1 { R"(filename=")" };
    if ( const auto pos = header.find(q1); pos != std::string::npos )
    {
        const auto start = pos + q1.size();

        const std::string q2 { R"(")" };
        if ( const auto end = header.find(q2, start); end != std::string::npos )
        {
            ascii = header.substr(start, end - start);
        }
    }

    std::string utf8;

    // Take everything after UTF-8'' up to the end of the header.
    const std::string u { R"(UTF-8'')" };
    if ( const auto pos = header.find(u); pos != std::string::npos )
    {
        utf8 = header.substr(pos + u.size());
    }

    return { ascii, utf8 };
}

int main()
{
    const std::vector<std::string> headers
    {
        R"(Content-Disposition: attachment; filename="??.txt"; filename*=UTF-8''日本.txt)",
        R"(Content-Disposition: attachment; filename*=UTF-8''日本.txt)",
        R"(Content-Disposition: attachment; filename="filename.jpg")",
        R"(Content-Disposition: attachment; filename="file-2020-April.txt.vbs")"
    };

    for ( const auto& header : headers )
    {
        const auto& [ascii, utf8] = getFilename( header );
        std::cout << header
                  << "\n\tASCII: " << ascii
                  << "\n\tUTF-8: " << utf8 << '\n';
    }

    return 0;
}

Output:

Content-Disposition: attachment; filename="??.txt"; filename*=UTF-8''日本.txt
    ASCII: ??.txt
    UTF-8: 日本.txt
Content-Disposition: attachment; filename*=UTF-8''日本.txt
    ASCII: 
    UTF-8: 日本.txt
Content-Disposition: attachment; filename="filename.jpg"
    ASCII: filename.jpg
    UTF-8: 
Content-Disposition: attachment; filename="file-2020-April.txt.vbs"
    ASCII: file-2020-April.txt.vbs
    UTF-8: 
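A real header can include cases that the two assumptions above do not cover: an unquoted token value (RFC 6266 allows a token or a quoted-string), parameter names in any letter case, backslash-escaped quotes inside a quoted-string, or filename* appearing before filename. Below is a minimal sketch of a more general single-pass parser, assuming the header has the simple "type; name=value; ..." shape. The names parseParams and pickFilename are hypothetical, and the filename* value is returned still encoded (charset'language'value), so it would still need percent-decoding:

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical sketch: split a Content-Disposition value into its parameters
// in one left-to-right pass. Names are lower-cased (they are case-insensitive);
// values may be tokens or quoted-strings, with backslash escapes removed.
std::map<std::string, std::string> parseParams(const std::string& header)
{
    auto isWs = [](char c) { return c == ' ' || c == '\t'; };
    std::map<std::string, std::string> params;
    const std::size_t n = header.size();

    // Everything before the first ';' is the disposition type (e.g. attachment).
    std::size_t i = header.find(';');
    while (i != std::string::npos && i < n)
    {
        ++i;  // skip ';'
        while (i < n && isWs(header[i])) ++i;

        // Parameter name: everything up to '=' (or ';' if the parameter is bare).
        std::string name;
        while (i < n && header[i] != '=' && header[i] != ';')
            name += static_cast<char>(std::tolower(static_cast<unsigned char>(header[i++])));
        while (!name.empty() && isWs(name.back())) name.pop_back();
        if (i >= n || header[i] != '=')
        {
            i = header.find(';', i);  // no value: move on to the next parameter
            continue;
        }
        ++i;  // skip '='
        while (i < n && isWs(header[i])) ++i;

        std::string value;
        if (i < n && header[i] == '"')
        {
            // quoted-string: read to the closing quote, unescaping \x as x.
            for (++i; i < n && header[i] != '"'; ++i)
            {
                if (header[i] == '\\' && i + 1 < n) ++i;
                value += header[i];
            }
        }
        else
        {
            // token: read until ';' or whitespace.
            while (i < n && header[i] != ';' && !isWs(header[i])) value += header[i++];
        }
        params[name] = value;
        i = header.find(';', i);  // skip to the next parameter, if any
    }
    return params;
}

// RFC 6266: if both are present, recipients should pick filename* over filename.
// The filename* value is returned still encoded (charset'language'value).
std::string pickFilename(const std::map<std::string, std::string>& params)
{
    if (const auto it = params.find("filename*"); it != params.end()) return it->second;
    if (const auto it = params.find("filename"); it != params.end()) return it->second;
    return {};
}
```

For the four headers in the answer, parseParams yields the same values as getFilename, except that filename* keeps its UTF-8'' prefix. Because quoted-strings are consumed before searching for the next ';', a semicolon inside a quoted file name does not split the parameter.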
