简体   繁体   中英

UNIX C parse file

I have a binary file format with header and body. The header fields and number of bytes for every of it is

  1. Magic: 1
  2. Header_size: 2
  3. Version: 1
  4. No_of_sections: 1
  5. Section_headers: No_of_sections * sizeof(section_header) Section Header: 17+4+4+4 Sect_name: 17; Sect_type: 4; Sect_offset: 4; Sect_size: 4

I need to verify some restrictions and print this

version=<version_number>
nr_sections=<no_of_sections>
section1: <NAME_1> <TYPE_1> <SIZE_1>

Restrictions:

  1. The value of the magic field must be 'Q'
  2. The values of the file version must be between 41 and 67, including that values.
  3. The number of sections must be between 3 and 10, including that values.
  4. The existing sections' type must be only 44 68 35

I done the magic and version part, but i cant do the section part, it always prints some characters.

I know my code is really messy. Sorry for that :(

void parse(const char *path)
{
    int fd;
    unsigned char c[17];
    char name[17];
    int type;
    off_t size;

    fd = open(path, O_RDONLY);
    size=  lseek(fd,0,SEEK_END);
    lseek(fd,0,SEEK_SET);
    for(int i = 0 ; i <=size; i++)
    {
        lseek(fd,0,SEEK_CUR);
        read(fd,&c,1);
        if(i==0)
            if(c[0]=='Q')
                printf("SUCCESS\n");
        if(i<5 && c[0]>=')' && c[0]<='C')
            printf("version=%d \n", c[0]);
    }

    lseek(fd,5,SEEK_SET);
    for(int j=1; j<10; j++)
    {
        read(fd, &name, 17);
        name[17]='\0';
        read(fd, &type, 4);
        printf("section%d: %s %d\n", j + 1, name, type);
    }
}

The second for shoud be

for(int j =1;j<=no_of_section;j++)

but i dont know the nr_of_section :( My output

SUCCESS
version=46 
section2: fJ00pYisvmveDqS 44
section3: V 1515418709
section4: fRo 2054764888
section5: e6NpWyIifXZ -1392508919
section6:  738197570
section7:  1247047749
section8: J5ArY 1985282048
section9: 707OpGRoR8l9Yen# 381353984
section10:  2053662817

The output should be:

        SUCCESS
        version=46
        nr_sections=7 
        section1: fJ00pYisvmveDqS 44 1016 
        section2: LLSWA0rSmUtSZfRo 44 890
        section3: lX9yze6NpWyIifXZ 44 941 
        section4: de0cLp2V907jC9B 44 1178
        section5: JrUrWEEpTJJ5ArY 68 724
        section6: Uv707OpGRoR8l9Yen 35 1014 
        section7: BOWdKpZwrBaahhzz 44 972

Binary File exemple ( it includes only the header and 2 sections )

Review of your code

There are many things wrong with your code. Let's go through it:

void parse(const char *path)
{
    int fd;
    unsigned char c[17];
    char name[17];
    int type;
    off_t size;

    fd = open(path, O_RDONLY);

You did not check if the call to open() succeeded. If there is an error, fd will be -1 . Make sure you check and properly handle this case.

    size=  lseek(fd,0,SEEK_END);

Similarly, lseek() can return an error. Some files might not be seekable. You can probably avoid having to determine the size of the file, see below.

    lseek(fd,0,SEEK_SET);
    for(int i = 0 ; i <=size; i++)
    {

If size has type off_t , it is better to make i an off_t as well.

        lseek(fd,0,SEEK_CUR);

This call does nothing! Why is it even here?

        read(fd,&c,1);

Again, you do not check the return value of read() . There might be a read error, or the file might be smaller than you thought. Check that the return value is not -1 , and is the expected length ( 1 ).

        if(i==0)
            if(c[0]=='Q')
                printf("SUCCESS\n");

This looks like an attempt to implement the loop-switch pattern . Please do not do this. If you want to read the first byte and treat it specially, don't put it in a for-loop.

        if(i<5 && c[0]>=')' && c[0]<='C')
            printf("version=%d \n", c[0]);

Here you are saying that each of the second to fifth bytes of the file must be within ')' and 'C' . But from your description, that's not what you want. Instead, you should read two bytes (header length), one byte (version) and another byte (number of sections). Do this without a for-loop.

    }

After the fifth byte, your for loop is just reading bytes for nothing.

    lseek(fd,5,SEEK_CUR);

After reading size bytes, you now try to skip another five bytes from the current position, which means you want to go 5 bytes beyond the end of the file.

    for(int j=1; j<10; j++)
    {

If you read the number of sections properly, you can use that instead of hardcoding the 10 . Also, are you sure you want to start with j = 1 ?

        read(fd, &name, 17);

Again, check the return value.

        name[17]='\0';

Oops, that's a buffer overflow! You declared name to have only 17 bytes, and now you are writing to the eighteenth.

        read(fd, &type, 4);

Are you sure that type is large enough to hold 4 bytes? In C, an int has a platform-dependent size. Sure, 4 bytes for an int is common, but it is better to use int32_t if you really want a four byte int.

        printf("section%d: %s %d\n", j + 1, name, type);

Since you start with j=1 , the first line you print will start with section2: .

    }
}

Suggested path forward

The first thing you should do is try to properly parse the header. Make sure you have read the number of sections from the header, so you don't need to know the file size and/or hardcode any numbers. Don't use lseek() . Don't worry about the rest of the file until your code handles the header.

Once you have all the elements of the header parsed correctly you can start reading the sections that come after the header. Again, each section has a header, so first work on parsing the section header correctly. Each section header has a value indicating the length of the section and an offset. Store these in an array.

Now that you have parsed the section headers can you go to the actual data. Probably now you should start using lseek() to go to the offsets in the file that was mentioned in the section headers, and then read() as many bytes as the section size is.

As @G. Sliepen has already mentioned your code has many bugs.

From my comment you now have replaced lseek(fd,5,SEEK_CUR); by lseek(fd,5,SEEK_SET); .

Thus the result of the first output is fine. But for the 2., 3. and so on the ouput is wrong.

In your description of the binay format you have written

  1. Sect_name: 17
  2. Sect_type: 4
  3. Sect_offset: 4
  4. Sect_size: 4

But in the code of your loop you are only reading Sect_name and Sec_type . Now you should either skip 8 bytes (for Sect_offset and Sect_size ) or read it in. Otherwise you will get the result as you have seen already.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM