
Reading a Binary File in C

I am currently trying to write a C program that reads a .bin file. As my code shows, I am clearly missing something; I have read a lot on the subject but am still completely stuck. Unsurprisingly, my output is not what I intended. An example of the expected output is: YV2840 KCLT KDAB Thu Jan 16 12:44:00 2014

The .bin file contains data about airline flights. Here is why I think my code could be wrong:

I am supposed to define a struct called "Human-readable date string". That is of course not possible as written, since it would generate a compiler error. Perhaps I am not supposed to take it literally; for now I have it defined as "Time Stamp".

The order and size of my struct members may not match the format in which the file was written.

Here is the .bin file, if anyone is interested: http://www.filedropper.com/acars Here is my code:

#include <stdio.h>
#include <stdlib.h>

typedef struct MyStruct_struct {
    int FlightNum[7];
    char OriginAirportCode[5]; 
    char DestAirportCode[5];
    int TimeStamp;
} MyStruct;

int main() {
    FILE * bin;
    MyStruct myStruct;
    bin = fopen("acars.bin", "rb");

    while(1) {
        fread(&myStruct,sizeof(MyStruct),1,bin);
        if(feof(bin)!=0)
            break;
        printf("%d",myStruct.FlightNum);
        printf("%s" ,myStruct.OriginAirportCode);
        printf("%s" ,myStruct.DestAirportCode);
        printf("%d", myStruct.TimeStamp);
    }

    fclose(bin);
    return 0;
}

If you are going to read binary data into your program, you first need to look at what you are attempting to read. hexdump and od are great tools for looking at data:

$ hexdump -C -n 512 dat/acars.bin
00000000  59 56 32 38 32 37 00 4b  43 4c 54 00 4b 53 52 51  |YV2827.KCLT.KSRQ|
00000010  00 00 00 00 2c 83 d0 52  59 56 32 37 38 32 00 4b  |....,..RYV2782.K|
00000020  43 4c 54 00 4b 53 52 51  00 00 00 00 cc 3e ed 52  |CLT.KSRQ.....>.R|
00000030  59 56 32 37 33 32 00 4b  43 4c 54 00 4b 53 52 51  |YV2732.KCLT.KSRQ|
00000040  00 00 00 00 88 f4 d5 52  59 56 32 36 37 35 00 4b  |.......RYV2675.K|
00000050  43 4c 54 00 4b 53 52 51  00 00 00 00 20 57 9f 52  |CLT.KSRQ.... W.R|
00000060  59 34 39 38 34 31 00 4b  4d 43 4f 00 4d 4d 4d 58  |Y49841.KMCO.MMMX|

According to your description, you have the flight number, the departure airport, the destination airport and a timestamp. Looking at the data, you find a flight number YV2827 (which is null terminated), then KCLT, the ICAO identifier for Charlotte Douglas International Airport, then KSRQ (the ICAO identifier for the airport in Sarasota, Florida), three bytes of padding and, finally, a 4-byte number representing the timestamp. So the data file makes sense.

Now, how to read it? If your description holds, then a struct holding the elements should provide a way to read the data. You may have to experiment with different member types and attributes to get the padding to work out, but something close to the following should work:

typedef struct {
    char flight[7];
    char dept[5];
    char dest[5];
    unsigned tstamp;
} flight;
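Since the whole approach depends on the struct having the same layout as the records in the file, it may be worth checking that before reading anything. The following is only a sketch: the function name flight_layout_matches_file is made up for illustration, and the expected values (24-byte records, timestamp at offset 20) are read off the hexdump above and assume a 4-byte unsigned with 4-byte alignment.

```c
#include <stddef.h>   /* offsetof */

/* Same struct as above. */
typedef struct {
    char flight[7];
    char dept[5];
    char dest[5];
    unsigned tstamp;
} flight;

/* Hypothetical helper: returns 1 when the compiler laid the struct out
 * like the 24-byte records seen in the hexdump, 0 otherwise. */
int flight_layout_matches_file (void)
{
    return sizeof (flight) == 24              /* 17 bytes text + 3 pad + 4 */
        && offsetof (flight, dept) == 7
        && offsetof (flight, dest) == 12
        && offsetof (flight, tstamp) == 20;   /* after 3 bytes of padding */
}
```

If this returns 0 on your platform, reading whole structs with fread will not line up with the file, and decoding field by field is the safer route.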

Next, how to read the file and store the values in memory in your code. If you don't need to store the values, then a simple read and print of the data will be all you need. Assuming you need to store the data to make some actual use of it, then without knowing how many flights acars.bin contains, you will need a scheme to read the data and allocate memory to hold it.
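For the simple read-and-print case, a minimal sketch could look like the following. It assumes the struct layout matches the file, and print_flights is a name invented for this example. The loop is driven by the return value of fread rather than by feof, so a short or failed read ends it cleanly.

```c
#include <stdio.h>

typedef struct {
    char flight[7];
    char dept[5];
    char dest[5];
    unsigned tstamp;
} flight;

/* Hypothetical helper: reads records from 'in' until fread can no longer
 * fill a whole struct, printing one line per record to 'out'.
 * Returns the number of records printed. */
size_t print_flights (FILE *in, FILE *out)
{
    flight f;
    size_t n = 0;

    /* fread returns the number of complete items read: 1 on success */
    while (fread (&f, sizeof f, 1, in) == 1) {
        /* force termination in case a field in the file is not terminated */
        f.flight[sizeof f.flight - 1] = '\0';
        f.dept[sizeof f.dept - 1] = '\0';
        f.dest[sizeof f.dest - 1] = '\0';

        fprintf (out, "%s %s %s %u\n", f.flight, f.dept, f.dest, f.tstamp);
        n++;
    }
    return n;
}
```

You would call it with a stream opened in binary mode, for example fopen ("acars.bin", "rb"), and stdout as the output stream.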

A flexible approach is to read each flight into a single fixed buffer, then use malloc/calloc to allocate an array of pointers to flight, and realloc as necessary to hold the flight data. Something like:

    flight buf = {{0}, {0}, {0}, 0};
    flight **flts = NULL;
    size_t idx = 0;
    size_t nbytes = 0;
    ...
    /* allocate MAXS pointers to flight */
    flts = xcalloc (MAXS, sizeof *flts);

    /* read into buf until no data read, allocate/copy to flts[i] */
    while ((nbytes = fread (&buf, sizeof buf, 1, fp))) {
        flts[idx] = calloc (1, sizeof **flts);
        memcpy (flts[idx++], &buf, sizeof **flts);

        if (idx == maxs)  /* if pointer limit reached, realloc */
            flts = (flight **)xrealloc_dp((void *)flts, &maxs);
    }

Above, the code allocates an initial number of pointers to flight in flts and uses the struct buf as a buffer to read data from the acars.bin file. Whenever fread reports that a record was read (its return value, stored in nbytes, is the number of items read rather than bytes, and is non-zero on success), memory is allocated for flts[idx] and memcpy is used to copy the data from buf to flts[idx]. (You should add validation that what is read is actually what you expect.)
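What counts as valid depends on your data, so treat the following as one possible sketch of that validation rather than a complete check. The name flight_is_plausible is hypothetical; it only verifies that each text field is null terminated inside its array and that the flight number is not empty.

```c
#include <string.h>

typedef struct {
    char flight[7];
    char dept[5];
    char dest[5];
    unsigned tstamp;
} flight;

/* Hypothetical check: returns 1 if every text field contains a NUL
 * terminator within its own array and the flight number is non-empty,
 * 0 otherwise. Printing a field with %s is only safe when this holds. */
int flight_is_plausible (const flight *f)
{
    return memchr (f->flight, '\0', sizeof f->flight) != NULL
        && memchr (f->dept,   '\0', sizeof f->dept)   != NULL
        && memchr (f->dest,   '\0', sizeof f->dest)   != NULL
        && f->flight[0] != '\0';
}
```

A record that fails the check could be skipped or reported before it is copied into flts.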

A standard reallocation scheme is used: maxs pointers to struct are allocated first, and when that number is reached, the number of pointers is doubled via xrealloc_dp (a simple reallocation macro for a double pointer; you can use a simple function as well). The intent here is just to keep the body of the code clean so the logic isn't obscured by all the realloc validation code.
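If you would rather use a plain function for the doubling step, something along these lines should do. This is a sketch, not the macro from this answer: grow_array is a made-up name, and unlike the macro it returns NULL on failure instead of exiting, leaving the decision to the caller.

```c
#include <stdint.h>   /* SIZE_MAX */
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: doubles an array holding *nmemb elements of 'size'
 * bytes each and zero-fills the new half. On success it updates *nmemb and
 * returns the new block. On failure it returns NULL and leaves both the
 * original block and *nmemb untouched. */
void *grow_array (void *ptr, size_t *nmemb, size_t size)
{
    /* reject empty arrays and sizes whose doubling would overflow */
    if (size == 0 || *nmemb == 0 || *nmemb > SIZE_MAX / 2 / size)
        return NULL;

    void *tmp = realloc (ptr, 2 * *nmemb * size);
    if (!tmp)
        return NULL;

    /* zero the newly added elements */
    memset ((char *)tmp + *nmemb * size, 0, *nmemb * size);
    *nmemb *= 2;
    return tmp;
}
```

In the read loop this would be used as flight **tmp = grow_array (flts, &maxs, sizeof *flts); followed by a check for NULL before assigning tmp back to flts.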

After the complete read of acars.bin, you have all the values stored in flts (note that the timestamp is stored as an unsigned int value, so conversion to a calendar time type and formatting are left for your output routine). A simple reformatting for output could be:

    for (i = 0; i < 10; i++) {
        time_t fdate = (time_t)flts[i]->tstamp;
        printf (" flight[%4zu]  %-8s  %-5s  %-5s  %s", i, flts[i]->flight,
                flts[i]->dept, flts[i]->dest, ctime (&fdate));
    }

where flts[i]->tstamp is cast to time_t and then passed to ctime to provide a formatted date for output along with the rest of the flight data.
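ctime formats in the local time zone and appends a newline. If you want output that does not depend on where the program runs, one option is gmtime with strftime. The sketch below assumes the timestamp is seconds since the Unix epoch (which the ctime results suggest); format_tstamp_utc is a name chosen for this example.

```c
#include <time.h>

/* Hypothetical helper: formats a Unix timestamp as UTC in a ctime-like
 * layout, without the trailing newline. Returns the number of characters
 * written, or 0 if the conversion fails or the buffer is too small. */
size_t format_tstamp_utc (unsigned tstamp, char *out, size_t outsz)
{
    time_t t = (time_t)tstamp;
    struct tm *tm = gmtime (&t);   /* broken-down time in UTC */

    if (!tm)
        return 0;

    /* %e pads the day of the month with a space, as ctime does */
    return strftime (out, outsz, "%a %b %e %H:%M:%S %Y", tm);
}
```

For the first record in the hexdump (timestamp bytes 2c 83 d0 52, i.e. 1389396780), this gives "Fri Jan 10 23:33:00 2014". The ctime output for the same record differs in the hour because it is in local time.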

Putting all the pieces together, and understanding that xcalloc and xrealloc_dp are just simple error-checking macros for calloc and realloc, you could use something like the following. (The macros use GNU statement expressions, ({ ... }), so they need GCC or a compatible compiler such as Clang.) There are 2778 flights contained in acars.bin, and the code below simply prints the data for the first 10 and last 10 flights:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* calloc with error check - exits on any allocation error */
#define xcalloc(nmemb, size)       \
({  void *memptr = calloc((size_t)nmemb, (size_t)size);    \
    if (!memptr) {          \
        fprintf(stderr, "error: virtual memory exhausted.\n");  \
        exit(EXIT_FAILURE); \
    }       \
    memptr; \
})

/* realloc with error check - exits on any allocation error */
#define xrealloc_dp(ptr,nmemb)   \
({ \
    void **p = ptr; \
    size_t *n = nmemb;  \
    void *tmp = realloc (p, 2 * *n * sizeof tmp);       \
    if (!tmp) { \
        fprintf (stderr, "%s() error: virtual memory exhausted.\n", __func__);  \
        exit (EXIT_FAILURE);    \
    }   \
    p = tmp;    \
    memset (p + *n, 0, *n * sizeof tmp); /* set new pointers NULL */    \
    *n *= 2;    \
    p;  \
})

#define MAXS 256

typedef struct {
    char flight[7];
    char dept[5];
    char dest[5];
    unsigned tstamp;
} flight;

int main (int argc, char **argv) {

    flight buf = {{0}, {0}, {0}, 0};
    flight **flts = NULL;
    size_t idx = 0;
    size_t nbytes = 0;
    size_t maxs = MAXS;
    size_t i, index;
    FILE *fp = argc > 1 ? fopen (argv[1], "rb") : stdin;  /* "rb": binary mode */

    if (!fp) {
        fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
        return 1;
    }

    /* allocate MAXS pointers to flight */
    flts = xcalloc (MAXS, sizeof *flts);

    /* read into buf until no data read, allocate/copy to flts[i] */
    while ((nbytes = fread (&buf, sizeof buf, 1, fp))) {
        flts[idx] = xcalloc (1, sizeof **flts);  /* exits on allocation failure */
        memcpy (flts[idx++], &buf, sizeof **flts);

        if (idx == maxs)  /* if pointer limit reached, realloc */
            flts = (flight **)xrealloc_dp((void *)flts, &maxs);
    }
    if (fp != stdin) fclose (fp);

    printf ("\n There are '%zu' flights in acars data.\n", idx);

    printf ("\n The first 10 flights are:\n\n");
    for (i = 0; i < 10 && i < idx; i++) {  /* guard against fewer than 10 records */
        time_t fdate = (time_t)flts[i]->tstamp;
        printf (" flight[%4zu]  %-8s  %-5s  %-5s  %s", i, flts[i]->flight,
                flts[i]->dept, flts[i]->dest, ctime (&fdate));
    }

    printf ("\n The last 10 flights are:\n\n");
    index = idx > 10 ? idx - 10 : 0;  /* avoid size_t wraparound on short files */
    for (i = index; i < idx; i++) {
        time_t fdate = (time_t)flts[i]->tstamp;
        printf (" flight[%4zu]  %-8s  %-5s  %-5s  %s", i, flts[i]->flight,
                flts[i]->dept, flts[i]->dest, ctime (&fdate));
    }

    /* free memory */
    for (i = 0; i < idx; i++)
        free (flts[i]);
    free (flts);

    return 0;
}

Output

$ ./bin/readacars dat/acars.bin

 There are '2778' flights in acars data.

 The first 10 flights are:

 flight[   0]  YV2827    KCLT   KSRQ   Fri Jan 10 17:33:00 2014
 flight[   1]  YV2782    KCLT   KSRQ   Sat Feb  1 12:37:00 2014
 flight[   2]  YV2732    KCLT   KSRQ   Tue Jan 14 20:38:00 2014
 flight[   3]  YV2675    KCLT   KSRQ   Wed Dec  4 10:24:00 2013
 flight[   4]  Y49841    KMCO   MMMX   Tue Jul 23 13:25:00 2013
 flight[   5]  Y45981    KMCO   MMMX   Wed Feb 26 13:31:00 2014
 flight[   6]  Y45980    MMMX   KMCO   Tue Mar 25 13:49:00 2014
 flight[   7]  Y40981    KMCO   MMMX   Wed Mar  5 13:23:00 2014
 flight[   8]  Y40980    MMMX   KMCO   Sat Mar 29 11:38:00 2014
 flight[   9]  XX0671    KJFK   MSLP   Tue Mar 25 05:46:00 2014

 The last 10 flights are:

 flight[2768]  4O2993    KJFK   MMMX   Wed Feb 12 09:25:00 2014
 flight[2769]  1L9221    KSAT   KSFB   Thu Jan  9 15:41:00 2014
 flight[2770]  1L1761    KCID   KSFB   Tue Jan 14 13:11:00 2014
 flight[2771]  1L1625    KABE   KSFB   Thu Jan 16 10:22:00 2014
 flight[2772]  1L0751    KMFE   KSFB   Thu Jan 16 19:52:00 2014
 flight[2773]  1L0697    KTYS   KSFB   Wed Jan 15 10:21:00 2014
 flight[2774]  1L0696    KSFB   KTYS   Wed Jan 15 07:00:00 2014
 flight[2775]  1L0655    KIAG   KSFB   Fri Jan 17 21:11:00 2014
 flight[2776]  1L0654    KSFB   KIAG   Fri Jan 17 15:49:00 2014
 flight[2777]  1L0641    KGFK   KSFB   Fri Jan 17 14:21:00 2014

Memory Error/Leak Check

In any code you write that dynamically allocates memory, it is imperative that you use a memory error checking program to ensure you haven't written beyond your allocated memory and to confirm that you have freed all the memory you allocated. On Linux, valgrind is the normal choice. There are so many subtle ways to misuse a block of memory that can cause real problems that there is no excuse not to do it. There are similar memory checkers for every platform. They are simple to use: just run your program through one.

$ valgrind ./bin/readacars dat/acars.bin
==12304== Memcheck, a memory error detector
==12304== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==12304== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==12304== Command: ./bin/readacars dat/acars.bin
==12304==

 There are '2778' flights in acars data.

 The first 10 flights are:

 flight[   0]  YV2827    KCLT   KSRQ   Fri Jan 10 17:33:00 2014
 flight[   1]  YV2782    KCLT   KSRQ   Sat Feb  1 12:37:00 2014
 flight[   2]  YV2732    KCLT   KSRQ   Tue Jan 14 20:38:00 2014
<snip>
 flight[2776]  1L0654    KSFB   KIAG   Fri Jan 17 15:49:00 2014
 flight[2777]  1L0641    KGFK   KSFB   Fri Jan 17 14:21:00 2014
==12304==
==12304== HEAP SUMMARY:
==12304==     in use at exit: 0 bytes in 0 blocks
==12304==   total heap usage: 2,812 allocs, 2,812 frees, 134,011 bytes allocated
==12304==
==12304== All heap blocks were freed -- no leaks are possible
==12304==
==12304== For counts of detected and suppressed errors, rerun with: -v
==12304== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)

The lines "total heap usage: 2,812 allocs, 2,812 frees, 134,011 bytes allocated" and "All heap blocks were freed -- no leaks are possible" confirm you are freeing all the memory you allocate. "ERROR SUMMARY: 0 errors from 0 contexts" confirms there were no inadvertent writes outside the blocks of memory allocated.

Look over the code, let me know if you have any questions and I'll be happy to help further.

Reading binary files is not a simple operation, because they are compiler dependent: their structure, whether for writing or reading, depends on the layout of the struct that generated the data or that is used to read it.

In your binary file, each record appears to be structured this way:

0x59563238323700 (flight number 7 bytes)
0x4B434C5400 (origin airport 5 bytes)
0x4B53525100 (dest airport 5 bytes)
0x000000 (3 bytes padding)
0x2C83D052 (4 bytes timestamp)

As you can see, the first three fields take 7+5+5 = 17 bytes, but the int used for the timestamp requires 4-byte alignment in the program that generated the binary data, so the fields are padded with zeros to 20 bytes before the 4-byte timestamp, for a total of 24 bytes per record.

This means that you must make sure the layout of your struct is exactly the same as the one that generated the binary data, or, after reverse-engineering the original data format, read it field by field while taking the padding into account.
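The field-by-field route could be sketched as follows. The offsets come from the record breakdown above; the little-endian byte order of the timestamp is an assumption based on the bytes 2C 83 D0 52 in the dump, and the names flight_rec, decode_record and RECORD_SIZE are invented for this example.

```c
#include <stdint.h>
#include <string.h>

#define RECORD_SIZE 24   /* 7 + 5 + 5 + 3 padding + 4, per the dump */

typedef struct {
    char flight[7];
    char origin[5];
    char dest[5];
    uint32_t tstamp;
} flight_rec;

/* Hypothetical decoder: fills 'rec' from one raw 24-byte record, without
 * relying on how the compiler pads flight_rec. */
void decode_record (const unsigned char raw[RECORD_SIZE], flight_rec *rec)
{
    memcpy (rec->flight, raw,      7);   /* bytes 0..6   */
    memcpy (rec->origin, raw + 7,  5);   /* bytes 7..11  */
    memcpy (rec->dest,   raw + 12, 5);   /* bytes 12..16 */
    /* bytes 17..19 are padding and are skipped */

    /* assemble the timestamp from bytes 20..23, least significant first
     * (assumed little-endian file format) */
    rec->tstamp = (uint32_t)raw[20]
                | (uint32_t)raw[21] << 8
                | (uint32_t)raw[22] << 16
                | (uint32_t)raw[23] << 24;

    /* force termination in case the file holds unterminated fields */
    rec->flight[6] = '\0';
    rec->origin[4] = '\0';
    rec->dest[4]   = '\0';
}
```

You would fread RECORD_SIZE bytes at a time into an unsigned char buffer and pass it to decode_record. The result then does not depend on the struct padding or byte order of the machine doing the reading.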
