Searching for 2 consecutive hex values in a char array of a file

I've read a file into an array of characters using fread. Now I want to search that array for two consecutive hex values, namely FF followed by D9 (it's a JPEG marker signifying the end of the file). Here is the code I use to do that:

char* searchBuffer(char* b) {
    char* p1 = b;
    char* p2 = ++b;
    int count = 0;

    while (*p1 != (unsigned char)0xFF && *p2 != (unsigned char)0xD9) {
        p1++;
        p2++;
        count++;
    }

    count = count;
    return p1;
}

Now I know this code works if I search for hex values that don't include 0xFF (e.g. 4E followed by 46), but every time I try searching for 0xFF it fails. When I don't cast the hex values to unsigned char, the program doesn't enter the while loop; when I do, the program goes through all the chars in the array and doesn't stop until I get an out-of-bounds error. I'm stumped, please help.

Ignore count; it's just a variable that helps me debug.

Thanks in advance.

Why not use memchr() to find potential matches?

Also, make sure you're dealing with promotions of potentially signed types (char may or may not be signed). Note that while 0xff and 0xd9 have the high bit set when looked at as 8-bit values, they are non-negative integer constants, so no 'sign extension' occurs for them:

/* needs <string.h> for memchr() and <limits.h> for UINT_MAX */
char* searchBuffer(char* b) {
    unsigned char* p1 = (unsigned char*) b;

    for (;;) {
        /* find the next 0xff char */
        /* note - this highlights that we really should know the size   */
        /* of the buffer we're searching, in case we don't find a match */
        /* at the moment we're making it up to be some large number     */
        p1 = memchr(p1, 0xff, UINT_MAX);
        if (!p1) {
            /* no 0xff found within the made-up limit, return NULL */
            break;
        }

        if (*(p1 + 1) == 0xd9) {
            /* found the 0xff 0xd9 sequence */
            break;
        }

        p1 += 1;
    }

    return (char *) p1;
}

Also, note that you really should be passing in some notion of the size of the buffer being searched, in case the target isn't found.

Here's a version that takes a buffer size parameter:

char* searchBuffer(char* b, size_t siz) {
    unsigned char* p1 = (unsigned char*) b;
    unsigned char* end = p1 + siz;

    for (;;) {
        /* find the next 0xff char */
        p1 = memchr(p1, 0xff, end - p1);
        if (!p1) {
            /* sequence not found, return NULL */
            break;
        }

        if (((p1 + 1) != end) && (*(p1 + 1) == 0xd9)) {
            /* found the 0xff 0xd9 sequence */
            break;
        }

        p1 += 1;
    }

    return (char *) p1;
}

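Use void *memmem(const void *haystack, size_t haystacklen, const void *needle, size_t needlelen);

which is declared in string.h and easy to use. Note that memmem() is not part of standard C; it is an extension provided by glibc and the BSDs, and with glibc you need to define _GNU_SOURCE before including string.h.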

char* searchBuffer(char* b, int len) 
{
    unsigned char needle[2] = {0xFF, 0XD9};
    char * c;
    c = memmem(b, len, needle, sizeof(needle));
    return c;
}
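Where memmem() is not available, a portable fallback can be put together from memchr() and memcmp(), both of which are standard C. The following is only a minimal sketch of that idea; the name find_bytes is made up for illustration and is not a library function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for memmem(): returns a pointer to the first
   occurrence of needle[0..nlen) inside haystack[0..hlen), or NULL. */
static void *find_bytes(const void *haystack, size_t hlen,
                        const void *needle, size_t nlen) {
    const unsigned char *h = haystack;
    const unsigned char *n = needle;

    if (nlen == 0) {
        return (void *)h;  /* an empty needle matches at the start */
    }

    while (hlen >= nlen) {
        /* only scan positions where the whole needle still fits */
        const unsigned char *hit = memchr(h, n[0], hlen - nlen + 1);
        if (hit == NULL) {
            return NULL;
        }
        if (memcmp(hit, n, nlen) == 0) {
            return (void *)hit;
        }
        /* skip past the false start and keep looking */
        hlen -= (size_t)(hit - h) + 1;
        h = hit + 1;
    }
    return NULL;
}
```

It takes the same arguments in the same order as memmem(), so the call in searchBuffer above could use find_bytes(b, len, needle, sizeof(needle)) instead.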

You are falling foul of integer promotions. Both operands of != (and similar operators) are promoted to int before the comparison. Since char and unsigned char both fit in an int, neither operand ends up unsigned here. So this:

*p1 != (unsigned char)0xFF

is equivalent to:

(int)*p1 != (int)(unsigned char)0xFF

On your platform, char is evidently signed, in which case (int)*p1 can never exceed CHAR_MAX (typically 127) and so can never equal 255, the value of the right-hand side. A byte stored as 0xFF reads back as -1.

So try casting *p1 as follows:

(unsigned char)*p1 != 0xFF

Alternatively, you could have the function take unsigned char arguments instead of char, and avoid all the casting.

[Note that on top of all of this, your loop logic is incorrect, as pointed out in various comments.]
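Putting both points together, a version that takes unsigned char and fixes the loop condition might look like the sketch below. The name find_eoi and the length parameter are assumptions added for the example; the original function has neither.

```c
#include <stddef.h>

/* Hypothetical helper: returns a pointer to the first 0xFF 0xD9 pair
   in buf[0..len), or NULL if the pair does not occur. */
static const unsigned char *find_eoi(const unsigned char *buf, size_t len) {
    /* i + 1 < len keeps buf[i + 1] inside the buffer */
    for (size_t i = 0; i + 1 < len; i++) {
        /* a match needs BOTH bytes to be right, so the original loop
           should have continued while EITHER byte was wrong (||, not &&) */
        if (buf[i] == 0xFF && buf[i + 1] == 0xD9) {
            return buf + i;
        }
    }
    return NULL;
}
```

Because the elements are unsigned char, the comparisons against 0xFF and 0xD9 need no casts, and the length check means a buffer without the marker yields NULL instead of an out-of-bounds read.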

4E promotes to a positive int, but where char is signed, a *p1 holding FF is negative and promotes to the int value -1, which can never compare equal to 0xFF (255).

You need to make p1 point to unsigned char.

You can write the code a lot shorter as:

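char* searchBuffer(const char* b) {
    /* advance until b points at the bytes FF D9; there is no bounds check */
    while (*b != '\xff' || *(b+1) != '\xd9') b++;
    return (char*)b;  /* cast away const to match the return type */
}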

Also note the function will cause a segmentation fault (or worse, return invalid results) if b does not, in fact, contain the bytes FFD9.
