How to read a binary file into a vector of unsigned chars

Question

Recently I was asked to write a function that reads a binary file into a std::vector<BYTE>, where BYTE is an unsigned char. Quite quickly I came up with something like this:

#include <fstream>
#include <vector>
typedef unsigned char BYTE;

std::vector<BYTE> readFile(const char* filename)
{
    // open the file:
    std::streampos fileSize;
    std::ifstream file(filename, std::ios::binary);

    // get its size:
    file.seekg(0, std::ios::end);
    fileSize = file.tellg();
    file.seekg(0, std::ios::beg);

    // read the data:
    std::vector<BYTE> fileData(fileSize);
    file.read((char*) &fileData[0], fileSize);
    return fileData;
}

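This seems unnecessarily complicated, and the explicit cast to char* that I was forced to use when calling file.read doesn't make me feel any better about it.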

Another option is to use std::istreambuf_iterator:

std::vector<BYTE> readFile(const char* filename)
{
    // open the file:
    std::ifstream file(filename, std::ios::binary);

    // read the data:
    return std::vector<BYTE>((std::istreambuf_iterator<char>(file)),
                              std::istreambuf_iterator<char>());
}

This is very simple and short, but I still have to use std::istreambuf_iterator<char> even though I'm reading into a std::vector<unsigned char>.

The last option, which seems very simple, is to use std::basic_ifstream<BYTE>, which sort of says explicitly "I want an input file stream and I want to use it to read BYTEs":

std::vector<BYTE> readFile(const char* filename)
{
    // open the file:
    std::basic_ifstream<BYTE> file(filename, std::ios::binary);

    // read the data:
    return std::vector<BYTE>((std::istreambuf_iterator<BYTE>(file)),
                              std::istreambuf_iterator<BYTE>());
}

But I'm not sure whether basic_ifstream is the right choice in this case.

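What is the best way to read a binary file into a vector? I'd also like to know what is happening "behind the scenes" and what problems I might run into (apart from the stream not being opened properly, which can be avoided with a simple is_open check).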

Is there any good reason to prefer std::istreambuf_iterator here?
(The only advantage I can see is simplicity.)

Answer 1

When testing for performance, I would include a test case for:

std::vector<BYTE> readFile(const char* filename)
{
    // open the file:
    std::ifstream file(filename, std::ios::binary);

    // Stop skipping whitespace bytes (spaces, tabs, newlines) in binary mode!!!
    file.unsetf(std::ios::skipws);

    // get its size:
    std::streampos fileSize;

    file.seekg(0, std::ios::end);
    fileSize = file.tellg();
    file.seekg(0, std::ios::beg);

    // reserve capacity
    std::vector<BYTE> vec;
    vec.reserve(fileSize);

    // read the data:
    vec.insert(vec.begin(),
               std::istream_iterator<BYTE>(file),
               std::istream_iterator<BYTE>());

    return vec;
}

My thinking is that the constructor in method 1 touches every element of the vector, and then read touches each element again.

Methods 2 and 3 look the most promising, but could suffer one or more resizes. Hence the reason to reserve before reading or inserting.

I would also test std::copy:

...
std::vector<BYTE> vec;
vec.reserve(fileSize);

std::copy(std::istream_iterator<BYTE>(file),
          std::istream_iterator<BYTE>(),
          std::back_inserter(vec));

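In the end, I think the best solution will avoid operator>> from istream_iterator (and all the overhead and goodness of operator>> trying to interpret binary data). But I don't know what to use that would let you copy the data directly into the vector.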
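One candidate for copying the bytes straight into the vector without going through operator>> is std::istreambuf_iterator<char>, which pulls characters from the stream buffer unformatted, so skipws never applies. A minimal sketch, assuming the file is seekable so tellg() can be used to reserve capacity up front; the function name read_file_unformatted is hypothetical:

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <vector>

typedef unsigned char BYTE;

// Hypothetical helper: reserve once, then copy raw bytes via the stream buffer.
std::vector<BYTE> read_file_unformatted(const char* filename)
{
    std::ifstream file(filename, std::ios::binary);
    if (!file.is_open())
        throw std::runtime_error("cannot open file");

    // Use the file size (if available) so insert() does not have to reallocate.
    std::vector<BYTE> vec;
    file.seekg(0, std::ios::end);
    const std::streamoff size = file.tellg();  // -1 if the size is unknown
    file.seekg(0, std::ios::beg);
    if (size > 0)
        vec.reserve(static_cast<std::size_t>(size));

    // istreambuf_iterator bypasses operator>>, so no byte is skipped or parsed.
    vec.insert(vec.end(),
               std::istreambuf_iterator<char>(file),
               std::istreambuf_iterator<char>());
    return vec;
}
```

Each char is converted to unsigned char as it is inserted, so a byte such as 0xFF arrives as 255.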

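Finally, my tests with binary data showed that ios::binary alone does not stop istream_iterator from skipping whitespace bytes: binary mode only turns off newline translation, while skipws is a separate formatting flag used by operator>>. Hence the need for noskipws (or, equivalently, the unsetf(std::ios::skipws) call above).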
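The skipws effect described above can be reproduced without a file. A sketch, assuming an in-memory std::istringstream stands in for the file stream; the helper name read_formatted is hypothetical. With default flags, istream_iterator<unsigned char> calls operator>>, which drops whitespace bytes; clearing skipws keeps them:

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: read every byte of `in` through operator>>.
std::vector<unsigned char> read_formatted(std::istream& in, bool skip_whitespace)
{
    if (!skip_whitespace)
        in.unsetf(std::ios::skipws);  // same effect as in >> std::noskipws
    return std::vector<unsigned char>(std::istream_iterator<unsigned char>(in),
                                      std::istream_iterator<unsigned char>());
}
```

For the input "a b\nc", the first call yields 3 bytes and the second 5. std::istreambuf_iterator never goes through operator>>, so it does not need this flag at all.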

Answer 2

#include <cstdint>
#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>

int main()
{
    std::ifstream stream("mona-lisa.raw", std::ios::in | std::ios::binary);
    std::vector<std::uint8_t> contents((std::istreambuf_iterator<char>(stream)), std::istreambuf_iterator<char>());

    for(auto i: contents) {
        int value = i;
        std::cout << "data: " << value << std::endl;
    }

    std::cout << "file size: " << contents.size() << std::endl;
}

Answer 3

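Since you are loading the entire file into memory, the most efficient version is to map the file into memory. This is because the kernel loads the file into the kernel page cache anyway, and by mapping the file you merely expose those cached pages to your process. This is also known as zero-copy.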
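A minimal POSIX-only sketch of the mapping approach, assuming a Unix-like system that provides open, fstat, mmap and munmap (this is not portable standard C++; Windows would need CreateFileMapping/MapViewOfFile instead). The class name MappedFile is hypothetical. The file's pages are exposed read-only and nothing is copied into a std::vector:

```cpp
#include <cstddef>
#include <stdexcept>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical RAII wrapper: maps a whole file read-only into memory.
class MappedFile {
public:
    explicit MappedFile(const char* filename)
    {
        int fd = ::open(filename, O_RDONLY);
        if (fd == -1)
            throw std::runtime_error("open failed");
        struct stat st;
        if (::fstat(fd, &st) == -1) {
            ::close(fd);
            throw std::runtime_error("fstat failed");
        }
        size_ = static_cast<std::size_t>(st.st_size);
        if (size_ > 0) {  // mmap rejects a length of zero
            void* p = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
            if (p == MAP_FAILED) {
                ::close(fd);
                throw std::runtime_error("mmap failed");
            }
            data_ = static_cast<const unsigned char*>(p);
        }
        ::close(fd);  // the mapping stays valid after the descriptor is closed
    }

    ~MappedFile()
    {
        if (data_)
            ::munmap(const_cast<unsigned char*>(data_), size_);
    }

    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;

    const unsigned char* data() const { return data_; }
    std::size_t size() const { return size_; }

private:
    const unsigned char* data_ = nullptr;
    std::size_t size_ = 0;
};
```

If a vector is really required afterwards, std::vector<unsigned char>(m.data(), m.data() + m.size()) builds one, but that reintroduces exactly the copy this approach avoids.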

When you use std::vector<>, the data is copied from the kernel page cache into the std::vector<>, which is unnecessary when you only want to read the file.

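Also, when two input iterators are passed to std::vector<>, it grows its buffer while reading because it does not know the file size. If you instead resize the std::vector<> to the file size first, it needlessly zeroes its contents, even though they are about to be overwritten with file data. Both methods are suboptimal in terms of space and time.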
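If a std::vector is still wanted, the zeroing cost can be avoided by giving it an allocator whose no-argument construct() default-initializes instead of value-initializing, so resize(n) leaves the bytes unset before read() overwrites them. A sketch of this known allocator-adaptor technique, assuming a seekable file; the names default_init_allocator, byte_buffer and read_file_no_zeroing are hypothetical:

```cpp
#include <cstddef>
#include <fstream>
#include <memory>
#include <new>
#include <stdexcept>
#include <type_traits>
#include <utility>
#include <vector>

// Allocator adaptor: value-initialization becomes default-initialization.
template <typename T, typename A = std::allocator<T>>
class default_init_allocator : public A {
    using traits = std::allocator_traits<A>;

public:
    template <typename U>
    struct rebind {
        using other =
            default_init_allocator<U, typename traits::template rebind_alloc<U>>;
    };

    using A::A;

    // Used by resize(n) / vector(n): for unsigned char this performs no zero fill.
    template <typename U>
    void construct(U* ptr) noexcept(std::is_nothrow_default_constructible<U>::value)
    {
        ::new (static_cast<void*>(ptr)) U;
    }

    // Every other construction is forwarded to the underlying allocator.
    template <typename U, typename... Args>
    void construct(U* ptr, Args&&... args)
    {
        traits::construct(static_cast<A&>(*this), ptr, std::forward<Args>(args)...);
    }
};

using byte_buffer = std::vector<unsigned char, default_init_allocator<unsigned char>>;

byte_buffer read_file_no_zeroing(const char* filename)
{
    std::ifstream file(filename, std::ios::binary | std::ios::ate);  // open at end
    if (!file.is_open())
        throw std::runtime_error("cannot open file");
    const std::streamoff size = file.tellg();
    if (size < 0)
        throw std::runtime_error("cannot determine file size");
    file.seekg(0, std::ios::beg);

    byte_buffer data(static_cast<std::size_t>(size));  // one allocation, no zeroing
    if (size > 0 && !file.read(reinterpret_cast<char*>(data.data()), size))
        throw std::runtime_error("read failed");
    return data;
}
```

The result is a distinct vector type, so passing it to code that expects std::vector<unsigned char> still requires a copy.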

Answer 4

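I would have thought that the first method, using the size and stream::read(), would be the most efficient. The "cost" of casting to char* is most likely zero: this kind of cast simply tells the compiler "Hey, I know you think this is a different type, but I really want this type here...", and it does not add any extra instructions. If you want to confirm this, try reading the file into a char array and compare the actual assembly code. Apart from a little extra work to compute the address of the buffer inside the vector, there shouldn't be any difference.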
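To follow that suggestion, here is a sketch of two otherwise identical functions: one reads into std::vector<unsigned char> through reinterpret_cast<char*>, the other reads into std::vector<char> with no cast. Compiling with optimizations (for example -O2 -S) and diffing the output is one way to check the claim for a particular compiler. The function names are hypothetical, the file is assumed seekable, and error handling is omitted for brevity:

```cpp
#include <cstddef>
#include <fstream>
#include <vector>

// Shared helper: size of a seekable stream, leaving it positioned at the start.
// A stream that failed to open reports size 0 here.
static std::size_t stream_size(std::ifstream& file)
{
    file.seekg(0, std::ios::end);
    const std::streamoff size = file.tellg();
    file.seekg(0, std::ios::beg);
    return size > 0 ? static_cast<std::size_t>(size) : 0;
}

// Reads into unsigned char storage; the cast only changes the pointer's type.
std::vector<unsigned char> read_as_bytes(const char* filename)
{
    std::ifstream file(filename, std::ios::binary);
    std::vector<unsigned char> data(stream_size(file));
    if (!data.empty())  // nothing to read for an empty file
        file.read(reinterpret_cast<char*>(data.data()),
                  static_cast<std::streamsize>(data.size()));
    return data;
}

// Same logic reading into char storage, so no cast is needed.
std::vector<char> read_as_chars(const char* filename)
{
    std::ifstream file(filename, std::ios::binary);
    std::vector<char> data(stream_size(file));
    if (!data.empty())
        file.read(data.data(), static_cast<std::streamsize>(data.size()));
    return data;
}
```

Using data() rather than &fileData[0] also sidesteps indexing an empty vector.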

As always, the only way to be sure what is most efficient in your case is to measure it. "Asking on the internet" is not proof.

Answer 5

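The following class extends vector with loading from and saving to binary files. I have come back to this question several times, so this is the code for my next visit, and for everyone else who comes looking for a way to save binary files. :)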

#include <cinttypes>
#include <fstream>
#include <iterator>
#include <vector>

class FileVector : public std::vector<uint8_t>
{
    public:

        using std::vector<uint8_t>::vector;

        void loadFromFile(const char *filename)
        {
            std::ifstream file(filename, std::ios::in | std::ios::binary);
            // istreambuf_iterator copies raw bytes; istream_iterator would skip whitespace bytes
            insert(begin(),
                std::istreambuf_iterator<char>(file),
                std::istreambuf_iterator<char>());
        }

        void saveTofile(const char *filename) const
        {
            std::ofstream file(filename, std::ios::out | std::ios::binary);
            file.write((const char *) data(), size());
            file.close();
        }
};

Note: to optimize loading, consider determining the file size and preallocating the required space, as described in other comments here.
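A sketch of that optimization as a free function, assuming C++17 <filesystem> is available: std::filesystem::file_size supplies the size, the vector is resized once, and the bytes arrive in a single read() instead of element by element. The name load_file_preallocated is hypothetical; since FileVector derives from std::vector<uint8_t>, loadFromFile could pass *this to it:

```cpp
#include <cstddef>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <stdexcept>
#include <vector>

// Hypothetical helper: replaces the contents of `out` with the file's bytes.
void load_file_preallocated(std::vector<std::uint8_t>& out,
                            const std::filesystem::path& path)
{
    // Throws std::filesystem::filesystem_error if the size cannot be determined.
    const std::uintmax_t size = std::filesystem::file_size(path);

    std::ifstream file(path, std::ios::in | std::ios::binary);
    if (!file.is_open())
        throw std::runtime_error("cannot open file");

    out.resize(static_cast<std::size_t>(size));  // single allocation up front
    if (size > 0 &&
        !file.read(reinterpret_cast<char*>(out.data()),
                   static_cast<std::streamsize>(size)))
        throw std::runtime_error("read failed");
}
```

Note that resize() still zero-fills any newly added elements before read() overwrites them.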



5 solutions

Solution 1: score 30, accepted, 2014-02-15 20:06:39

Solution 2: score 13, 2016-04-16 08:11:03

Solution 3: score 6, 2013-02-28 15:05:33

Solution 4: score 3, 2013-02-28 15:06:55

Solution 5: score 0, 2022-01-05 10:52:40
