
Reading a Binary file in C

I am currently attempting to write a program in C to read a .bin file. As you can see from my code, I am clearly missing something; I have read a lot on the subject but am still completely stuck. Predictably, my output is not what I intended. An example of the expected output is: YV2840 KCLT KDAB Thu Jan 16 12:44:00 2014

I am trying to read a .bin file of airline flight data. Here are the reasons I think my code could be wrong:

I am supposed to define a struct member called "Human-readable date string". Taken literally, that is not possible: an identifier cannot contain spaces or hyphens, so it would generate a compiler error. Perhaps I am not supposed to take it literally; for now I have defined it as "TimeStamp".

The order and size of my struct members do not match the format in which the file was written.

Here is the bin file, if anyone is interested: http://www.filedropper.com/acars Here is my code:

#include <stdio.h>
#include <stdlib.h>

typedef struct MyStruct_struct {
    int FlightNum[7];
    char OriginAirportCode[5]; 
    char DestAirportCode[5];
    int TimeStamp;
} MyStruct;

int main() {
    FILE * bin;
    MyStruct myStruct;
    bin = fopen("acars.bin", "rb");

    while(1) {
        fread(&myStruct,sizeof(MyStruct),1,bin);
        if(feof(bin)!=0)
            break;
        printf("%d",myStruct.FlightNum);
        printf("%s" ,myStruct.OriginAirportCode);
        printf("%s" ,myStruct.DestAirportCode);
        printf("%d", myStruct.TimeStamp);
    }

    fclose(bin);
    return 0;
}

If you are going to read binary data into your program, you first need to look at what you are attempting to read. hexdump and od are great tools for inspecting data:

$ hexdump -C -n 512 dat/acars.bin
00000000  59 56 32 38 32 37 00 4b  43 4c 54 00 4b 53 52 51  |YV2827.KCLT.KSRQ|
00000010  00 00 00 00 2c 83 d0 52  59 56 32 37 38 32 00 4b  |....,..RYV2782.K|
00000020  43 4c 54 00 4b 53 52 51  00 00 00 00 cc 3e ed 52  |CLT.KSRQ.....>.R|
00000030  59 56 32 37 33 32 00 4b  43 4c 54 00 4b 53 52 51  |YV2732.KCLT.KSRQ|
00000040  00 00 00 00 88 f4 d5 52  59 56 32 36 37 35 00 4b  |.......RYV2675.K|
00000050  43 4c 54 00 4b 53 52 51  00 00 00 00 20 57 9f 52  |CLT.KSRQ.... W.R|
00000060  59 34 39 38 34 31 00 4b  4d 43 4f 00 4d 4d 4d 58  |Y49841.KMCO.MMMX|

According to your description, each record holds the flight number, the departure airport, the destination airport and a timestamp. Looking at the data, you find the flight number YV2827 (null-terminated), then KCLT, the ICAO identifier for Charlotte Douglas International Airport, then KSRQ (the ICAO identifier for the Sarasota, Florida airport), followed by three bytes of padding and, finally, a 4-byte number representing the timestamp. The next flight, YV2782, begins at offset 0x18, so each record is 24 bytes long. The data file makes sense.
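To make that layout concrete, here is a minimal sketch that decodes the first record directly from the bytes in the hexdump. The field offsets (0, 7, 12, 20), the 24-byte record size and the little-endian byte order of the timestamp are inferred from the dump above, not taken from any format specification; `rec0`, `le32` and the `rec_*` accessors are hypothetical names used only for illustration.

```c
#include <stdint.h>
#include <string.h>

#define REC_SIZE 24   /* 7 + 5 + 5 text bytes, 3 padding bytes, 4-byte timestamp */

/* Hypothetical helper: build a 32-bit value from 4 bytes stored
 * least-significant byte first (little-endian), independent of host order. */
static uint32_t le32 (const unsigned char *b)
{
    return (uint32_t)b[0]       | (uint32_t)b[1] << 8 |
           (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}

/* The first record, copied byte for byte from offsets 0x00-0x17 of the dump. */
static const unsigned char rec0[REC_SIZE] = {
    0x59, 0x56, 0x32, 0x38, 0x32, 0x37, 0x00,  /* "YV2827"  at offset  0 */
    0x4b, 0x43, 0x4c, 0x54, 0x00,              /* "KCLT"    at offset  7 */
    0x4b, 0x53, 0x52, 0x51, 0x00,              /* "KSRQ"    at offset 12 */
    0x00, 0x00, 0x00,                          /* padding   at offset 17 */
    0x2c, 0x83, 0xd0, 0x52                     /* timestamp at offset 20 */
};

/* Text fields are NUL-terminated in place; the timestamp is decoded explicitly. */
static const char *rec_flight (const unsigned char *r) { return (const char *)r; }
static const char *rec_dept   (const unsigned char *r) { return (const char *)r + 7; }
static const char *rec_dest   (const unsigned char *r) { return (const char *)r + 12; }
static uint32_t    rec_tstamp (const unsigned char *r) { return le32 (r + 20); }
```

With these bytes, `rec_tstamp (rec0)` yields 0x52d0832c, i.e. 1389396780, which is a plausible 2014 Unix timestamp and supports the little-endian reading.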

Now, how do you read it? If your description holds, a structure holding these elements should let you read the data directly. You may have to adjust the members (or use compiler-specific attributes) to get the padding to work out, but something close to the following should work:

typedef struct {
    char flight[7];
    char dept[5];
    char dest[5];
    unsigned tstamp;
} flight;
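Because "something close to" the right struct is only correct if your compiler lays it out exactly as the file does, one option (assuming a C11 compiler) is to make the build verify the layout. The expected offsets below are the ones inferred from the hexdump; these checks say nothing about byte order, which must also match.

```c
#include <stddef.h>   /* offsetof */

typedef struct {
    char flight[7];
    char dept[5];
    char dest[5];
    unsigned tstamp;
} flight;

/* C11 compile-time checks: the build fails if this compiler's layout does
 * not match the 24-byte records seen in acars.bin. Byte order is NOT checked
 * (the file appears to be little-endian). */
_Static_assert (sizeof (unsigned) == 4, "tstamp must be 4 bytes");
_Static_assert (offsetof (flight, dept) == 7, "dept must start at byte 7");
_Static_assert (offsetof (flight, dest) == 12, "dest must start at byte 12");
_Static_assert (offsetof (flight, tstamp) == 20, "tstamp must start at byte 20");
_Static_assert (sizeof (flight) == 24, "each record must be 24 bytes");
```

On common x86 and x86-64 compilers these pass without any packing attributes, because the 17 bytes of text are padded to 20 so that `tstamp` is 4-byte aligned.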

Next comes reading the file and storing the values in memory. If you don't need to store the values, a simple read-and-print loop is all you need. Assuming you do need to store them to make actual use of the data, then, since you don't know in advance how many flights acars.bin contains, you will need a scheme to allocate memory as you read.
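If the input is known to be a regular, seekable file (not stdin from a pipe), one way to sidestep the unknown count is to compute it from the file size before reading. This is a different technique from the one this answer uses, and `count_records` is a hypothetical helper; note that the C standard does not require `fseek` with `SEEK_END` to work on binary streams, although it does on common POSIX and Windows systems.

```c
#include <stdio.h>

/* Hypothetical helper: number of whole rec_size-byte records in a seekable
 * binary file, or -1 if the size can't be determined (e.g. a pipe).
 * On success the file position is reset to the start of the file.
 * A trailing partial record is not counted. */
static long count_records (FILE *fp, size_t rec_size)
{
    long size;

    if (rec_size == 0 || fseek (fp, 0L, SEEK_END) != 0)
        return -1;
    if ((size = ftell (fp)) < 0)
        return -1;
    rewind (fp);
    return size / (long)rec_size;
}
```

With the count in hand you could `malloc` a single array of that many `flight` structs and read them all with one `fread` call, provided `sizeof (flight)` really is 24. The realloc scheme is still needed when reading from stdin.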

A flexible approach is to read each flight into a fixed buffer, allocate an initial array of pointers to flight with malloc / calloc, and realloc that array as necessary to hold the flight data. Something like:

    flight buf = {{0}, {0}, {0}, 0};
    flight **flts = NULL;
    size_t idx = 0;
    size_t nbytes = 0;
    size_t maxs = MAXS;     /* current capacity of flts */
    ...
    /* allocate MAXS pointers to flight */
    flts = xcalloc (MAXS, sizeof *flts);

    /* read into buf until no data read, allocate/copy to flts[idx] */
    while ((nbytes = fread (&buf, sizeof buf, 1, fp))) {
        flts[idx] = xcalloc (1, sizeof **flts);   /* checked allocation */
        memcpy (flts[idx++], &buf, sizeof **flts);

        if (idx == maxs)  /* if pointer limit reached, realloc */
            flts = (flight **)xrealloc_dp((void *)flts, &maxs);
    }

Above, the code allocates an initial number of pointers to flight in flts and uses the struct buf as a buffer to read each record from acars.bin. fread returns the number of complete records read (here 1 or 0), so whenever it returns non-zero, memory is allocated for flts[idx] and memcpy copies the data from buf into it. (You should add validation that what is read is actually what you expect.)
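As one example of such validation, here is a sketch, using the hypothetical helpers `field_ok` and `flight_valid`, that rejects a record whose text fields are not NUL-terminated within their arrays. Such a record would otherwise make the `%s` conversions in the output loop read past the end of the field.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    char flight[7];
    char dept[5];
    char dest[5];
    unsigned tstamp;
} flight;

/* Hypothetical check: true if a NUL terminator occurs within the field,
 * so printing it with "%s" cannot read past the end of the array. */
static bool field_ok (const char *s, size_t len)
{
    return memchr (s, '\0', len) != NULL;
}

/* Hypothetical check: every text field of the record is terminated. */
static bool flight_valid (const flight *f)
{
    return field_ok (f->flight, sizeof f->flight) &&
           field_ok (f->dept, sizeof f->dept) &&
           field_ok (f->dest, sizeof f->dest);
}
```

In the read loop you could test `if (!flight_valid (&buf))` before allocating, then report the bad record and `continue`. Further checks (for example a plausible timestamp range) depend on what you know about the data.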

A standard reallocation scheme is used: maxs pointers are allocated first, and when that number is reached, the pointer array is reallocated to twice its current size via xrealloc_dp. xrealloc_dp is a simple realloc-with-error-check macro for a double pointer (a simple function would work as well). The intent is to keep the body of the code clean so the logic isn't obscured by realloc validation code.
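Since a plain function works as well as the macro, here is a sketch of one in standard C, without the GNU statement expressions the macros use. `xrealloc_double` is a hypothetical name; unlike `xrealloc_dp` it takes the element size, so it works for arrays of any type, and it checks the doubled size for overflow before calling `realloc`.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical function alternative to xrealloc_dp: doubles an array of
 * *nmemb elements of `size` bytes, zero-fills the new half, updates *nmemb
 * and returns the (possibly moved) array. Exits on any error, like the macro. */
static void *xrealloc_double (void *ptr, size_t *nmemb, size_t size)
{
    size_t n = *nmemb;
    void *tmp;

    if (n == 0 || size == 0 || n > SIZE_MAX / 2 / size) {
        fprintf (stderr, "%s() error: invalid or overflowing size.\n", __func__);
        exit (EXIT_FAILURE);
    }
    if (!(tmp = realloc (ptr, 2 * n * size))) {
        fprintf (stderr, "%s() error: virtual memory exhausted.\n", __func__);
        exit (EXIT_FAILURE);
    }
    /* zero the new half; for pointer arrays this yields NULL on mainstream
       platforms, the same assumption the macro's memset makes */
    memset ((char *)tmp + n * size, 0, n * size);
    *nmemb = 2 * n;
    return tmp;
}
```

In the read loop the call would become `flts = xrealloc_double (flts, &maxs, sizeof *flts);`, with no cast needed because `void *` converts implicitly to `flight **` in C.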

After acars.bin has been read completely, all the values are stored in flts. Note that the timestamp is stored as an unsigned int value, so converting it to a calendar time type and formatting it are left to your output routine. A simple reformatting for output could be:

    for (i = 0; i < 10; i++) {
        time_t fdate = (time_t)flts[i]->tstamp;
        printf (" flight[%4zu]  %-8s  %-5s  %-5s  %s", i, flts[i]->flight,
                flts[i]->dept, flts[i]->dest, ctime (&fdate));
    }

where flts[i]->tstamp is cast to time_t and then used with ctime to provide a formatted date for output along with the rest of the flight data.
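Keep in mind that ctime formats in the machine's local time zone: the first record's value, 1389396780, is 23:33 UTC but appears as 17:33 in the output below. If you want output that does not depend on where the program runs, a sketch using gmtime and strftime follows. It assumes the timestamps are seconds since the Unix epoch (which the dates in the output suggest), and `fmt_utc` is a hypothetical helper.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: format tstamp (assumed to be seconds since the Unix
 * epoch) in UTC using ctime()'s layout, minus its trailing newline.
 * Returns buf, or NULL if the time can't be converted or buf is too small. */
static char *fmt_utc (uint32_t tstamp, char *buf, size_t len)
{
    time_t t = (time_t)tstamp;
    struct tm *tm = gmtime (&t);   /* UTC; ctime() uses local time */

    if (!tm || strftime (buf, len, "%a %b %e %H:%M:%S %Y", tm) == 0)
        return NULL;
    return buf;
}
```

To use it in the print loop, pass `fmt_utc (flts[i]->tstamp, tbuf, sizeof tbuf)` instead of `ctime (&fdate)` and add a `\n` to the format string, since ctime's result ends with a newline and this one does not.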

Putting all the pieces together, and keeping in mind that xcalloc and xrealloc_dp are just simple error-checking macros for calloc and realloc, you could use something like the following. There are 2778 flights in acars.bin, and the code below simply prints the data for the first 10 and last 10 flights:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* calloc with error check - exits on any allocation error.
   ({ ... }) is a GNU statement expression (a gcc/clang extension). */
#define xcalloc(nmemb, size)       \
({  void *memptr = calloc((size_t)(nmemb), (size_t)(size));    \
    if (!memptr) {          \
        fprintf(stderr, "error: virtual memory exhausted.\n");  \
        exit(EXIT_FAILURE); \
    }       \
    memptr; \
})

/* realloc with error check - exits on any allocation error */
#define xrealloc_dp(ptr,nmemb)   \
({ \
    void **p = ptr; \
    size_t *n = nmemb;  \
    void *tmp = realloc (p, 2 * *n * sizeof tmp);       \
    if (!tmp) { \
        fprintf (stderr, "%s() error: virtual memory exhausted.\n", __func__);  \
        exit (EXIT_FAILURE);    \
    }   \
    p = tmp;    \
    memset (p + *n, 0, *n * sizeof tmp); /* set new pointers NULL */    \
    *n *= 2;    \
    p;  \
})

#define MAXS 256

typedef struct {
    char flight[7];
    char dept[5];
    char dest[5];
    unsigned tstamp;
} flight;

int main (int argc, char **argv) {

    flight buf = {{0}, {0}, {0}, 0};
    flight **flts = NULL;
    size_t idx = 0;
    size_t nbytes = 0;
    size_t maxs = MAXS;
    size_t i, index;
    /* open in binary mode ("rb"); text mode "r" may translate bytes on Windows */
    FILE *fp = argc > 1 ? fopen (argv[1], "rb") : stdin;

    if (!fp) {
        fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
        return 1;
    }

    /* allocate MAXS pointers to flight */
    flts = xcalloc (MAXS, sizeof *flts);

    /* read one record at a time into buf until fread returns 0 (EOF or a
       short read), then allocate a flight and copy buf into flts[idx] */
    while ((nbytes = fread (&buf, sizeof buf, 1, fp))) {
        flts[idx] = xcalloc (1, sizeof **flts);   /* checked allocation */
        memcpy (flts[idx++], &buf, sizeof **flts);

        if (idx == maxs)  /* if pointer limit reached, realloc */
            flts = (flight **)xrealloc_dp((void *)flts, &maxs);
    }
    if (fp != stdin) fclose (fp);

    printf ("\n There are '%zu' flights in acars data.\n", idx);

    /* show at most 10 flights from each end, so a file with fewer than
       10 records can't cause flts to be indexed past idx */
    size_t nshow = idx < 10 ? idx : 10;

    printf ("\n The first %zu flights are:\n\n", nshow);
    for (i = 0; i < nshow; i++) {
        time_t fdate = (time_t)flts[i]->tstamp;
        printf (" flight[%4zu]  %-8s  %-5s  %-5s  %s", i, flts[i]->flight,
                flts[i]->dept, flts[i]->dest, ctime (&fdate));
    }

    printf ("\n The last %zu flights are:\n\n", nshow);
    index = idx - nshow;
    for (i = index; i < idx; i++) {
        time_t fdate = (time_t)flts[i]->tstamp;
        printf (" flight[%4zu]  %-8s  %-5s  %-5s  %s", i, flts[i]->flight,
                flts[i]->dept, flts[i]->dest, ctime (&fdate));
    }

    /* free memory */
    for (i = 0; i < idx; i++)
        free (flts[i]);
    free (flts);

    return 0;
}

Output

$ ./bin/readacars dat/acars.bin

 There are '2778' flights in acars data.

 The first 10 flights are:

 flight[   0]  YV2827    KCLT   KSRQ   Fri Jan 10 17:33:00 2014
 flight[   1]  YV2782    KCLT   KSRQ   Sat Feb  1 12:37:00 2014
 flight[   2]  YV2732    KCLT   KSRQ   Tue Jan 14 20:38:00 2014
 flight[   3]  YV2675    KCLT   KSRQ   Wed Dec  4 10:24:00 2013
 flight[   4]  Y49841    KMCO   MMMX   Tue Jul 23 13:25:00 2013
 flight[   5]  Y45981    KMCO   MMMX   Wed Feb 26 13:31:00 2014
 flight[   6]  Y45980    MMMX   KMCO   Tue Mar 25 13:49:00 2014
 flight[   7]  Y40981    KMCO   MMMX   Wed Mar  5 13:23:00 2014
 flight[   8]  Y40980    MMMX   KMCO   Sat Mar 29 11:38:00 2014
 flight[   9]  XX0671    KJFK   MSLP   Tue Mar 25 05:46:00 2014

 The last 10 flights are:

 flight[2768]  4O2993    KJFK   MMMX   Wed Feb 12 09:25:00 2014
 flight[2769]  1L9221    KSAT   KSFB   Thu Jan  9 15:41:00 2014
 flight[2770]  1L1761    KCID   KSFB   Tue Jan 14 13:11:00 2014
 flight[2771]  1L1625    KABE   KSFB   Thu Jan 16 10:22:00 2014
 flight[2772]  1L0751    KMFE   KSFB   Thu Jan 16 19:52:00 2014
 flight[2773]  1L0697    KTYS   KSFB   Wed Jan 15 10:21:00 2014
 flight[2774]  1L0696    KSFB   KTYS   Wed Jan 15 07:00:00 2014
 flight[2775]  1L0655    KIAG   KSFB   Fri Jan 17 21:11:00 2014
 flight[2776]  1L0654    KSFB   KIAG   Fri Jan 17 15:49:00 2014
 flight[2777]  1L0641    KGFK   KSFB   Fri Jan 17 14:21:00 2014

Memory Error/Leak Check

In any code you write that dynamically allocates memory, it is imperative that you use a memory error checking program to ensure you haven't written beyond your allocated memory and to confirm that you have freed all the memory you allocated. For Linux, valgrind is the normal choice. There are so many subtle ways to misuse a block of memory and cause real problems that there is no excuse not to do it. There are similar memory checkers for every platform. They are simple to use: just run your program through one.

$ valgrind ./bin/readacars dat/acars.bin
==12304== Memcheck, a memory error detector
==12304== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==12304== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==12304== Command: ./bin/readacars dat/acars.bin
==12304==

 There are '2778' flights in acars data.

 The first 10 flights are:

 flight[   0]  YV2827    KCLT   KSRQ   Fri Jan 10 17:33:00 2014
 flight[   1]  YV2782    KCLT   KSRQ   Sat Feb  1 12:37:00 2014
 flight[   2]  YV2732    KCLT   KSRQ   Tue Jan 14 20:38:00 2014
<snip>
 flight[2776]  1L0654    KSFB   KIAG   Fri Jan 17 15:49:00 2014
 flight[2777]  1L0641    KGFK   KSFB   Fri Jan 17 14:21:00 2014
==12304==
==12304== HEAP SUMMARY:
==12304==     in use at exit: 0 bytes in 0 blocks
==12304==   total heap usage: 2,812 allocs, 2,812 frees, 134,011 bytes allocated
==12304==
==12304== All heap blocks were freed -- no leaks are possible
==12304==
==12304== For counts of detected and suppressed errors, rerun with: -v
==12304== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)

"2,812 allocs, 2,812 frees" together with "All heap blocks were freed -- no leaks are possible" confirms that you are freeing all the memory you allocate. "ERROR SUMMARY: 0 errors from 0 contexts" confirms that valgrind detected no inadvertent reads or writes outside the allocated blocks of memory.

Look over the code, let me know if you have any questions and I'll be happy to help further.

Reading binary files is not a simple operation, because they are compiler-dependent: their structure, for both writing and reading, depends on the layout of the struct that generated the data or that is used to read it.

In your binary file, the records appear to be structured like this:

0x59563238323700 (flight number 7 bytes)
0x4B434C5400 (origin airport 5 bytes)
0x4B53525100 (dest airport 5 bytes)
0x000000 (3 bytes padding)
0x2C83D052 (4 bytes timestamp)

As you can see, the first three fields take 7 + 5 + 5 = 17 bytes, but the int timestamp required 4-byte alignment in the program that generated the binary data, so each record was padded (here with 0s) to 20 bytes before the timestamp, for 24 bytes per record in total.

This means that you must make sure that the layout of your struct is exactly the same as the one that generated the binary data, or else read the data field by field, accounting for the padding, once you have reverse-engineered the original format.
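A minimal sketch of the field-by-field approach, assuming the layout listed above (7 + 5 + 5 text bytes, 3 padding bytes, a 4-byte little-endian timestamp). `flight_rec` and `read_record` are hypothetical names; because each field is read separately, the in-memory struct's own padding no longer has to match the file.

```c
#include <stdint.h>
#include <stdio.h>

/* In-memory record; its padding is irrelevant to the file format. */
typedef struct {
    char flight[7];
    char dept[5];
    char dest[5];
    uint32_t tstamp;
} flight_rec;

/* Hypothetical reader: reads one record field by field.
 * Returns 1 on success, 0 on EOF or a truncated record. */
static int read_record (FILE *fp, flight_rec *r)
{
    unsigned char pad[3], ts[4];

    if (fread (r->flight, 1, sizeof r->flight, fp) != sizeof r->flight ||
        fread (r->dept,   1, sizeof r->dept,   fp) != sizeof r->dept   ||
        fread (r->dest,   1, sizeof r->dest,   fp) != sizeof r->dest   ||
        fread (pad,       1, sizeof pad,       fp) != sizeof pad       ||
        fread (ts,        1, sizeof ts,        fp) != sizeof ts)
        return 0;

    /* assemble the timestamp from little-endian bytes (host-order independent) */
    r->tstamp = (uint32_t)ts[0]       | (uint32_t)ts[1] << 8 |
                (uint32_t)ts[2] << 16 | (uint32_t)ts[3] << 24;

    /* force termination in case a corrupt record lacks its NULs */
    r->flight[sizeof r->flight - 1] = '\0';
    r->dept[sizeof r->dept - 1] = '\0';
    r->dest[sizeof r->dest - 1] = '\0';
    return 1;
}
```

Used as `while (read_record (fp, &rec)) { ... }` on a stream opened with "rb", this reads the file correctly regardless of the compiler's struct padding or the host's byte order.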
