What is the best way to read a struct from a binary file containing IP header fragments?

Question

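During a computer networks lab, I had to read many binary files containing packets in IPv4 format. The files follow the IPv4 header format.

The following struct encapsulates all the essential parts of the IP header.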

struct ip_header {
    uint8_t version;
    uint8_t header_length;
    uint8_t service_type;
    uint16_t total_length;
    uint16_t identification;
    uint8_t flags;
    uint16_t fragment_offset;
    uint8_t ttl;
    uint8_t protocol;
    uint16_t checksum;
    uint32_t src;
    uint32_t dest;
    /* other fields for options if needed */
};

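One way to get the data from the binary file into a structured form is to read the file byte by byte and then cast each group of bytes into the corresponding field of the struct above. Reading the file itself is not the problem.

I would like to know whether this is the only way, or whether there is some other nice, magical way to achieve the same thing. Also, I recently learned that endianness causes problems when reading this kind of file, because the fields have different sizes.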

Answer 1

The usual approach is to use something like fread:

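#include <cstdio>
#include <string>

// Reads one raw header from the start of the file into buffer (struct ip is shown below).
bool readIpHeader(ip& buffer, const std::string& filename)
{
    auto pFile = fopen(filename.c_str(), "rb");
    if (!pFile) {
        return false;
    }
    // fread returns the number of complete items read; exactly 1 means success
    auto ok = fread(&buffer, sizeof(buffer), 1, pFile) == 1;
    fclose(pFile);
    return ok;
}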

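This reads sizeof(buffer) bytes into the memory at &buffer, filling the buffer with the contents of the file. fread returns 1 on success, because exactly one item of that size was requested.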
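To make the return-value check concrete, here is a minimal sketch in C that reads fixed-size records in a loop and reports a truncated file. The function name count_records and the 20-byte RECORD_SIZE are assumptions made for illustration; they do not come from the answer.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record size: 20 bytes, an IPv4 header without options. */
#define RECORD_SIZE 20

/* Count the complete RECORD_SIZE-byte records in the stream.
 * Returns -1 on a read error or if the file ends with a partial record. */
long count_records(FILE *fp)
{
    unsigned char buf[RECORD_SIZE];
    long count = 0;
    size_t got;

    /* Ask for RECORD_SIZE items of 1 byte so that a short read
     * shows up in the return value instead of being hidden. */
    while ((got = fread(buf, 1, sizeof buf, fp)) == sizeof buf) {
        count++;
    }
    if (ferror(fp) || got != 0) {
        return -1;  /* I/O error, or leftover bytes that are not a full record */
    }
    return count;
}
```

Calling count_records on a stream opened with fopen(path, "rb") tells you how many headers a file holds before you start decoding them.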

As Ted pointed out, your struct is flawed. You can use https://unix.superglobalmegacorp.com/Net2/newsrc/netinet/ip.h.html as a source (most likely, if you are on Linux, you can simply include that file):

struct ip {
#if BYTE_ORDER == LITTLE_ENDIAN 
    u_char    ip_hl:4,           /* header length */
              ip_v:4;            /* version */
#endif
#if BYTE_ORDER == BIG_ENDIAN 
    u_char    ip_v:4,            /* version */
              ip_hl:4;           /* header length */
#endif
    u_char    ip_tos;            /* type of service */
    short     ip_len;            /* total length */
    u_short   ip_id;             /* identification */
    short     ip_off;            /* fragment offset field */
#define    IP_DF 0x4000          /* dont fragment flag */
#define    IP_MF 0x2000          /* more fragments flag */
    u_char    ip_ttl;            /* time to live */
    u_char    ip_p;              /* protocol */
    u_short   ip_sum;            /* checksum */
    struct    in_addr ip_src,ip_dst;    /* source and dest address */
};
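The bit-fields in this struct need the BYTE_ORDER switch because the compiler decides how bit-fields are laid out. A sketch of an alternative is to pull the same values out of the raw header bytes with shifts and masks, which does not depend on the host. The function names below are hypothetical; IP_DF and IP_MF are repeated from the header above.

```c
#include <stdint.h>

#define IP_DF 0x4000   /* dont fragment flag */
#define IP_MF 0x2000   /* more fragments flag */

/* The version is the high nibble of the first header byte. */
unsigned ip_version(const uint8_t *hdr)
{
    return hdr[0] >> 4;
}

/* The header length is the low nibble, counted in 32-bit words. */
unsigned ip_header_bytes(const uint8_t *hdr)
{
    return (hdr[0] & 0x0Fu) * 4u;
}

/* Bytes 6 and 7 hold the flags and the 13-bit fragment offset, big-endian. */
uint16_t ip_off_field(const uint8_t *hdr)
{
    return (uint16_t)((hdr[6] << 8) | hdr[7]);
}

int ip_dont_fragment(const uint8_t *hdr)
{
    return (ip_off_field(hdr) & IP_DF) != 0;
}

int ip_more_fragments(const uint8_t *hdr)
{
    return (ip_off_field(hdr) & IP_MF) != 0;
}

/* The fragment offset is stored in units of 8 bytes. */
unsigned ip_fragment_offset(const uint8_t *hdr)
{
    return (ip_off_field(hdr) & 0x1FFFu) * 8u;
}
```

For a typical header whose first byte is 0x45, ip_version gives 4 and ip_header_bytes gives 20.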

Answer 2

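If you care about portability, especially to implementations that enforce natural alignment of 16-bit and 32-bit variables, or to big-endian architectures, you cannot just write the in-memory layout of your struct to disk. The next version of the compiler might pack the data differently and break compatibility with all your data files. More than one big company has discovered that it accidentally created two data formats, big-endian and little-endian, by compiling on another CPU without normalizing the byte order. There is usually no easy way to tell which format an old file was saved in. Remember, data outlives code!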
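To see why dumping a struct creates a host-dependent format, this small sketch looks at the bytes a raw write of a 16-bit value would produce. The function name host_is_little_endian is made up for this example.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 otherwise. */
int host_is_little_endian(void)
{
    const uint16_t probe = 0x0102;
    unsigned char bytes[sizeof probe];

    /* These are the same bytes a raw fwrite(&probe, ...) would emit. */
    memcpy(bytes, &probe, sizeof probe);
    return bytes[0] == 0x02;
}
```

Two builds of the same program that disagree on this result write different files for the same data, which is the accident described above.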

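Assuming you are going to use the ip_header struct inside your program, it should be padded for efficient access, and its purpose is not merely to describe the file layout.

When fields of different sizes are interleaved, there is no way to align all of them at once. You cannot assume that the implementation can use an arbitrary, unaligned address as a pointer. In this case I also do not assume that the file has the same endianness as your CPU. I define the byte order to be big-endian. (If you expect this code to run on little-endian CPUs such as x86, you could define the order as little-endian instead, but you should still code defensively and use the little-endian conversion functions from glib or your OS.)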
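One way to avoid both the alignment and the endianness assumptions is to assemble each value from individual bytes. This is a sketch of that idea, not the method used in the answer's code, and the helper names load_be16 and load_be32 are hypothetical.

```c
#include <stdint.h>

/* Build a 16-bit value from two bytes stored most significant byte first. */
uint16_t load_be16(const unsigned char *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | (unsigned)p[1]);
}

/* Build a 32-bit value from four bytes stored most significant byte first. */
uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

Because only single bytes are read, p may point anywhere in a buffer, including odd offsets, and the result is the same on big-endian and little-endian hosts.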

You can portably convert from the on-disk layout to the in-memory struct like this:

#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct ip_header {
    uint8_t version;
    uint8_t header_length;
    uint8_t service_type;
    uint16_t total_length;
    uint16_t identification;
    uint8_t flags;
    uint16_t fragment_offset;
    uint8_t ttl;
    uint8_t protocol;
    uint16_t checksum;
    uint32_t src;
    uint32_t dest;
    /* other fields for options if needed */
} ip_header;

#define IP_HEADER_DISK_LEN 22U

bool read_ip_header( FILE* const input, ip_header* const d )
{
  char buffer[IP_HEADER_DISK_LEN];

  if ( IP_HEADER_DISK_LEN !=
       fread( buffer, 1, IP_HEADER_DISK_LEN, input ) ) {
    return false;
  }

  memset( d, 0, sizeof(*d) );

  memcpy( &d->version,         &buffer[0],  sizeof(d->version) );
  memcpy( &d->header_length,   &buffer[1],  sizeof(d->header_length) );
  memcpy( &d->service_type,    &buffer[2],  sizeof(d->service_type) );
  memcpy( &d->total_length,    &buffer[3],  sizeof(d->total_length) );
  d->total_length = ntohs(d->total_length);
  memcpy( &d->identification,  &buffer[5],  sizeof(d->identification) );
  d->identification = ntohs(d->identification);
  memcpy( &d->flags,           &buffer[7],  sizeof(d->flags) );
  memcpy( &d->fragment_offset, &buffer[8],  sizeof(d->fragment_offset) );
  d->fragment_offset = ntohs(d->fragment_offset);
  memcpy( &d->ttl,             &buffer[10], sizeof(d->ttl) );
  memcpy( &d->protocol,        &buffer[11], sizeof(d->protocol) );
  memcpy( &d->checksum,        &buffer[12], sizeof(d->checksum) );
  d->checksum = ntohs(d->checksum);
  memcpy( &d->src,             &buffer[14], sizeof(d->src) );
  d->src = ntohl(d->src);
  memcpy( &d->dest,            &buffer[18], sizeof(d->dest) );
  d->dest = ntohl(d->dest);

  return true;
}

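This reads the whole header at once, but you could instead make separate I/O calls, or even map the file into memory. Most modern compilers are smart enough to combine consecutive memcpy() calls to consecutive locations, to compile unneeded byte swaps to no-ops, and to memset() only the bytes that are not immediately overwritten, so in cases where you could have simply copied the bytes, this approach should be just as efficient. (For your purposes, you might even be able to skip zeroing the padding bytes and converting the byte order.)

Remember that the read itself takes far longer than any bit-twiddling to fix up alignment, endianness, or padding. Trying to optimize that work away is not a good use of your time, especially if the result is a program that breaks when built with another compiler!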

Answer 3

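If your IPv4 headers are stored in the same format as they arrive in (which is the usual way to store them), with the source and destination addresses as the last fields of the header, this should do: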

#include <fstream>
#include <iostream>

#include <netinet/ip.h> // a common place to find a "iphdr" definition

// add a streaming operator for reading an iphdr
std::istream& operator>>(std::istream& is, iphdr& ip) {
    return is.read(reinterpret_cast<char*>(&ip), sizeof(iphdr));
}

// add a streaming operator for writing an iphdr
std::ostream& operator<<(std::ostream& os, const iphdr& ip) {
    return os.write(reinterpret_cast<const char*>(&ip), sizeof(iphdr));
}

int main() {
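    std::ifstream ips("ipheaders", std::ios::binary);
    if(ips) {
        iphdr h;
        while(ips >> h) {
            // unary + makes the uint8_t fields print as numbers, not characters;
            // multi-byte fields are stored in network byte order
            std::cout << h.version << "\n"
                      << h.ihl << "\n"
                      << +h.tos << "\n"
                      << ntohs(h.tot_len) << "\n"
                      << ntohs(h.id) << "\n"
                      << ntohs(h.frag_off) << "\n"
                      << +h.ttl << "\n"
                      << +h.protocol << "\n"
                      << ntohs(h.check) << "\n"
                      << ntohl(h.saddr) << "\n"
                      << ntohl(h.daddr) << "\n";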
        }
    }
}

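The first 4 bits of the physical header are always the version, but as @Mirco showed, the byte order of the machine you compile the program on makes a difference once you start fiddling with bit-fields. The first 4 bits that arrive over the network and are stored in your file are still the version, and if you use the added operator<< to write an iphdr to disk, they will be the version there too. If you want to be portable, read and write the IP header exactly as it has looked since IPv4 was invented.

Fortunately, the layout of the IP header matches the alignment requirements of the fundamental data types on most systems. A system on which you cannot create an IP header struct that matches the raw data will most likely not have netinet/ip.h at all, but if you are still worried, you can add compile-time checks: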

    static_assert(alignof(uint8_t) == 1);
    static_assert(alignof(uint16_t) == 2);
    static_assert(alignof(uint32_t) == 4);
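The same idea can be extended from alignment to the layout itself. The following C11 sketch checks the size and a few field offsets of a struct at compile time. The struct wire_ip_header is a hypothetical stand-in written for this example; it is not the system's iphdr.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Hypothetical struct mirroring the 20-byte IPv4 header, without bit-fields. */
struct wire_ip_header {
    uint8_t  version_ihl;     /* version (high nibble), header length (low nibble) */
    uint8_t  tos;
    uint16_t total_length;
    uint16_t identification;
    uint16_t flags_fragment;  /* 3 flag bits and 13-bit fragment offset */
    uint8_t  ttl;
    uint8_t  protocol;
    uint16_t checksum;
    uint32_t src;
    uint32_t dest;
};

/* Compilation fails if the compiler inserted padding or moved a field. */
static_assert(sizeof(struct wire_ip_header) == 20, "unexpected padding");
static_assert(offsetof(struct wire_ip_header, total_length) == 2, "total_length misplaced");
static_assert(offsetof(struct wire_ip_header, ttl) == 8, "ttl misplaced");
static_assert(offsetof(struct wire_ip_header, src) == 12, "src misplaced");
static_assert(offsetof(struct wire_ip_header, dest) == 16, "dest misplaced");
```

These checks only confirm where the fields sit. The multi-byte fields still hold network-order values after a raw read.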

Answer 4

I think this:

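    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <arpa/inet.h>
    #include <netinet/ip.h>
    ....
    #define IPSIZ  20
    /* ntohs/ntohl take integer values, not pointers, so copy the bytes out first */
    static uint16_t get16(const uint8_t *p) {
        uint16_t v;
        memcpy(&v, p, sizeof v);
        return ntohs(v);
    }
    static uint32_t get32(const uint8_t *p) {
        uint32_t v;
        memcpy(&v, p, sizeof v);
        return ntohl(v);
    }
    static void ntoip(const uint8_t *buf, struct ip *i) {
        i->ip_v   = buf[0] >> 4;       /* version: high nibble */
        i->ip_hl  = buf[0] & 0x0f;     /* header length: low nibble */
        i->ip_tos = buf[1];
        i->ip_len = get16(buf+2);
        i->ip_id  = get16(buf+4);
        i->ip_off = get16(buf+6);
        i->ip_ttl = buf[8];
        i->ip_p   = buf[9];
        i->ip_sum = get16(buf+10);
        i->ip_src.s_addr = get32(buf+12);   /* host order, like the other fields */
        i->ip_dst.s_addr = get32(buf+16);
    }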
    int fget_ip(FILE *fp, struct ip *i) {
        uint8_t buf[IPSIZ];
        if (fread(buf, sizeof buf, 1, fp) == 1) {
            ntoip(buf, i);
            return 1;
        }
        return 0;
    }
...
    void iptoh(struct ip *i, uint8_t *buf) {
...
    }
    int fput_ip(struct ip *i, FILE *fp) {
....
    }

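is your best bet. It is simple, clear, easy to understand, and portable. You guarantee that the data is always read and stored in network order, so the code works the same way whether the bytes come from a file or from a real device.

If this somehow becomes a performance problem, it is encapsulated, and you can replace it in one place with the usual pile of dirty tricks.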
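For the writing direction, storing in network order means emitting the most significant byte first. The sketch below shows one way to do that. The helper names store_be16 and store_be32 are hypothetical, and this is not the body of the elided iptoh.

```c
#include <stdint.h>

/* Write a 16-bit value into p, most significant byte first. */
void store_be16(unsigned char *p, uint16_t v)
{
    p[0] = (unsigned char)(v >> 8);
    p[1] = (unsigned char)(v & 0xFFu);
}

/* Write a 32-bit value into p, most significant byte first. */
void store_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)((v >> 16) & 0xFFu);
    p[2] = (unsigned char)((v >> 8) & 0xFFu);
    p[3] = (unsigned char)(v & 0xFFu);
}
```

A buffer filled this way has the same bytes on every host, so files written on one machine can be read on another.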

Answer 1
Score 2, accepted, 2019-09-18 19:01:40

Answer 2
Score 2, 2019-09-19 01:16:17

Answer 3
Score 2, 2019-09-19 17:25:33

Answer 4
Score 0, 2019-09-18 19:40:49
