I've read a file into an array of characters using fread. Now I want to search that array for two consecutive hex values, namely FF followed by D9 (it's a JPEG marker signifying end of file). Here is the code I use to do that:
char* searchBuffer(char* b) {
    char* p1 = b;
    char* p2 = ++b;
    int count = 0;
    while (*p1 != (unsigned char)0xFF && *p2 != (unsigned char)0xD9) {
        p1++;
        p2++;
        count++;
    }
    count = count;
    return p1;
}
Now I know this code works if I search for hex values that don't include 0xFF (e.g. 4E followed by 46), but every time I try searching for 0xFF it fails. When I don't cast the hex values to unsigned char, the program doesn't enter the while loop; when I do, the program goes through all the chars in the array and doesn't stop until I get an out-of-bounds error. I'm stumped, please help.
Ignore count, it's just a variable that helps me debug.
Thanks in advance.
Why not use memchr() to find potential matches?
Also, make sure you're dealing with promotions of potentially signed types (char may or may not be signed). Note that while 0xff and 0xd9 have the high bit set when looked at as 8-bit values, they are non-negative integer constants, so there is no 'sign extension' that occurs for them:
char* searchBuffer(char* b) {
    unsigned char* p1 = (unsigned char*) b;
    for (;;) {
        /* find the next 0xff char */
        /* note - this highlights that we really should know the size */
        /* of the buffer we're searching, in case we don't find a match */
        /* at the moment we're making it up to be some large number */
        p1 = memchr(p1, 0xff, UINT_MAX);
        if (!p1) {
            /* no 0xff found, return NULL instead of incrementing a null pointer */
            break;
        }
        if (*(p1 + 1) == 0xd9) {
            /* found the 0xff 0xd9 sequence */
            break;
        }
        p1 += 1;
    }
    return (char *) p1;
}
Also, note that you really should be passing in some notion of the size of the buffer being searched, in case the target isn't found.
Here's a version that takes a buffer size parameter:
char* searchBuffer(char* b, size_t siz) {
    unsigned char* p1 = (unsigned char*) b;
    unsigned char* end = p1 + siz;
    for (;;) {
        /* find the next 0xff char */
        p1 = memchr(p1, 0xff, end - p1);
        if (!p1) {
            /* sequence not found, return NULL */
            break;
        }
        if (((p1 + 1) != end) && (*(p1 + 1) == 0xd9)) {
            /* found the 0xff 0xd9 sequence */
            break;
        }
        p1 += 1;
    }
    return (char *) p1;
}
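When the buffer comes from fread, its return value is the natural size to pass in, since a short read leaves the tail of the array unfilled. A minimal sketch of that idea follows; the helper names find_eoi_offset and find_eoi_in_stream are hypothetical (not part of the code above), and the caller is assumed to supply the buffer:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: offset of the first FF D9 pair within the first
   `len` bytes of `buf`, or -1 if the pair does not occur. */
long find_eoi_offset(const unsigned char *buf, size_t len)
{
    const unsigned char *p = buf;
    const unsigned char *end = buf + len;

    while (p < end) {
        /* jump to the next 0xFF, if any, inside the valid range */
        p = memchr(p, 0xFF, (size_t)(end - p));
        if (p == NULL)
            break;
        /* only look at p[1] when it is still inside the buffer */
        if (p + 1 < end && p[1] == 0xD9)
            return (long)(p - buf);
        p++;
    }
    return -1;
}

/* Hypothetical wrapper: reads up to `cap` bytes from `fp` into `buf` and
   searches only the bytes that fread actually delivered. */
long find_eoi_in_stream(FILE *fp, unsigned char *buf, size_t cap)
{
    size_t n = fread(buf, 1, cap, fp);  /* n can be smaller than cap */
    return find_eoi_offset(buf, n);
}
```

Returning an offset instead of a pointer is only a design choice for this sketch; it makes the "not found" case (-1) easy to tell apart from a match at the start of the buffer.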
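Use void *memmem(const void *haystack, size_t haystacklen, const void *needle, size_t needlelen); which is declared in string.h and easy to use. Note that memmem is a non-standard extension (available in glibc and the BSDs, among others); with glibc you need to define _GNU_SOURCE before including string.h to get the declaration.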
char* searchBuffer(char* b, int len)
{
    unsigned char needle[2] = {0xFF, 0xD9};
    char * c;
    c = memmem(b, len, needle, sizeof(needle));
    return c;
}
You are falling foul of integer promotions. Both operands of != (and similar operators) are promoted to int before the comparison, because on typical platforms int can represent every value of both char and unsigned char. So this:
*p1 != (unsigned char)0xFF
is equivalent to:
(int)*p1 != (int)(unsigned char)0xFF
On your platform, char is evidently signed, so a byte with the bit pattern 0xFF is held as -1 and promotes to the int value -1, whereas (unsigned char)0xFF promotes to 255. The two can never compare equal.
So try casting *p1 as follows:
(unsigned char)*p1 != 0xFF
Alternatively, you could have the function take unsigned char arguments instead of char, and avoid all the casting.
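[Note that on top of all of this, your loop logic is incorrect, as pointed out in various comments: the condition needs || rather than &&, otherwise the loop stops as soon as either byte matches on its own.]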
4E will promote itself to a positive integer, but with a signed char *p1 will be negative when the byte is FF, and it stays negative (-1) after promotion, so it can never compare equal to 0xFF (255).
You need to make p1 point to unsigned char.
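The effect of the promotion can be made visible with a short sketch. The helper names below are hypothetical, and the result of the naive comparison depends on whether plain char is signed on the platform (it is on most x86 compilers, but not on every target):

```c
#include <limits.h>

/* Reports whether plain char is a signed type on this platform. */
int char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Value the comparison actually sees after integer promotion.
   With a signed 8-bit char, the byte 0xFF arrives here as -1. */
int promoted_value(char c)
{
    return c;  /* char -> int promotion, sign preserved */
}

/* Compares the raw char: never true for the byte 0xFF when char is
   signed, because -1 is compared against the int constant 255. */
int equals_ff_naive(char c)
{
    return c == 0xFF;
}

/* Converts to unsigned char first, so the promoted value is always
   in the range 0..UCHAR_MAX and 0xFF compares equal to 255. */
int equals_ff_cast(char c)
{
    return (unsigned char)c == 0xFF;
}
```

Bytes below 0x80, such as 0x4E, have the same value either way, which is why searching for them appears to work.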
You can write the code a lot shorter as:
char* searchBuffer(const char* b) {
    while (*b != '\xff' || *(b+1) != '\xd9') b++;
    return (char *) b; /* cast away const to match the return type */
}
Also note the function will cause a segmentation fault (or worse, return invalid results) if b does not, in fact, contain the bytes FFD9.
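A bounded variant avoids that failure mode. This is a sketch, assuming the caller knows the buffer length; the name find_ffd9 and the len parameter are illustrative and not part of the original function:

```c
#include <stddef.h>

/* Returns a pointer to the first FF D9 pair in buf[0..len), or NULL if
   the pair does not occur. Bytes are read as unsigned char so that 0xFF
   compares equal regardless of whether plain char is signed. */
const char *find_ffd9(const char *buf, size_t len)
{
    const unsigned char *p = (const unsigned char *)buf;
    size_t i;

    /* stop one byte early so p[i + 1] is always inside the buffer */
    for (i = 0; i + 1 < len; i++) {
        if (p[i] == 0xFF && p[i + 1] == 0xD9)
            return buf + i;
    }
    return NULL;
}
```

The caller then checks for NULL instead of relying on the marker being present.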