
Search for Binary Pattern in C (Read buffered binary file)

Hey there. I'm trying to write a small program that reads the four bytes following the last occurrence of "0xFF 0xC0 0x00 0x11", so that they can easily be converted to binary or decimal. The reason is that bytes 2-5 following the last occurrence of that hex pattern represent the width and height of a JPEG file.

#include <stdio.h>
#include <stdlib.h>

 int main () {
  FILE * pFile;
  long lSize;
  char * buffer;
  size_t result;

  pFile = fopen ( "pano8sample.jpg" , "rb" );
  if(pFile==NULL){
   fputs ("File error",stderr);
   exit (1);
  }

  fseek (pFile , 0 , SEEK_END);
  lSize = ftell (pFile);
  rewind (pFile);

  printf("\n\nFile is %ld bytes big\n\n", lSize);

  buffer = (char*) malloc (sizeof(char)*lSize);
  if(buffer == NULL){
   fputs("Memory error",stderr);
   exit (2);
  }

  result = fread (buffer,1,lSize,pFile);
  if(result != (size_t)lSize){
   fputs("Reading error",stderr);
   exit (3);
  }

  //0xFF 0xC0 0x00 0x11 (0x08)

  //Logic to check for hex/binary/dec

  fclose (pFile);
  free (buffer);
  return 0;
 }

The problem is I don't know how to step through the buffered memory and use the most recently read byte as an int to compare against my binary/hex/dec values.

How do I do this?

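/* memmem() is a GNU/BSD extension; on glibc, define _GNU_SOURCE before including <string.h>. */
const unsigned char needle[4] = {0xff, 0xc0, 0x00, 0x11};
const unsigned char *hay = (const unsigned char *)buffer; /* leave buffer untouched for free() */
const unsigned char *last_needle = NULL;
size_t remaining = (size_t)lSize;
for (;;) {
  const unsigned char *p = memmem(hay, remaining, needle, sizeof needle);
  if (!p) break;
  last_needle = p;
  remaining -= (size_t)(p + sizeof needle - hay);
  hay = p + sizeof needle;
}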

If last_needle is not null, you can print out last_needle+4 ...
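As a rough sketch of what to do with those bytes: assuming the pattern really is the start of a baseline JPEG SOF0 segment, the byte right after it is the sample precision, followed by the height and then the width, each as a 2-byte big-endian value. The struct and function names below are hypothetical and not part of the answer above.

```c
#include <stddef.h>

/* Hypothetical helper type, introduced only for illustration. */
struct jpeg_dims {
    unsigned precision; /* bits per sample, typically 8 */
    unsigned height;
    unsigned width;
};

/* after: first byte following the 4-byte pattern (e.g. last_needle + 4).
 * avail: number of valid bytes from `after` to the end of the buffer.
 * Returns 1 on success, 0 if fewer than 5 bytes are available. */
static int decode_sof0(const unsigned char *after, size_t avail,
                       struct jpeg_dims *out)
{
    if (after == NULL || out == NULL || avail < 5)
        return 0;
    out->precision = after[0];
    out->height = ((unsigned)after[1] << 8) | after[2]; /* big-endian */
    out->width  = ((unsigned)after[3] << 8) | after[4]; /* big-endian */
    return 1;
}
```

Passing the number of remaining bytes guards against a match that sits within the last few bytes of a truncated file.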

Instead of reading the entire file into memory, I would use a bit of a state machine. My C is a bit rusty, but:

const unsigned char searchChars[] = {0xFF, 0xC0, 0x00, 0x11};
unsigned char lastBytes[5];
int curChar;                                 /* int, so EOF can be told apart from 0xFF */
int curSearch = 0;
size_t got = 0;                              /* how many bytes the last fread stored */
while ((curChar = getc(pFile)) != EOF) {     /* read one char */
    if (curChar == searchChars[curSearch]) { /* found a match */
        curSearch++;                         /* search for next char */
        if (curSearch > 3) {                 /* found the whole string! */
            curSearch = 0;                   /* start searching again */
            got = fread(lastBytes, 1, 5, pFile); /* read 5 bytes */
        }
    } else {                                 /* didn't find a match */
        /* a stray 0xFF may itself be the start of a new match */
        curSearch = (curChar == searchChars[0]) ? 1 : 0;
    }
}

At the end, you're left with 5 bytes in lastBytes, which are the five bytes right after the last occurrence of searchChars.

Personally, I'd use a function that swallows one character at a time. The function will use a finite state machine to do a simple regular expression match, saving details in either static local variables or a parameter block structure. You need two sub-blocks, one for the part-matched state and one for the last complete match, each indicating the relevant positions or values as needed.

In this case, you should be able to design this manually. For more complex requirements, look at Ragel.
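One way this could look, as a minimal sketch using a parameter block rather than static locals. The struct layout and the names `sof_matcher` and `sof_feed` are assumptions made for illustration; the pattern and the 5 trailing bytes are taken from the question. Bytes captured after a match are not themselves scanned for the pattern.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical parameter block: the first three fields hold the
 * part-matched state, the last two hold the last complete match.
 * Zero-initialize before use: struct sof_matcher m = {0}; */
struct sof_matcher {
    size_t matched;          /* pattern bytes matched so far (0..3) */
    size_t tail_needed;      /* trailing bytes still to capture (0..5) */
    unsigned char tail[5];   /* trailing bytes of the match in progress */
    unsigned char last[5];   /* trailing bytes of the last complete match */
    int have_last;           /* nonzero once `last` is valid */
};

/* Feed one byte into the state machine. */
static void sof_feed(struct sof_matcher *m, unsigned char c)
{
    static const unsigned char pat[4] = {0xFF, 0xC0, 0x00, 0x11};

    if (m->tail_needed > 0) {             /* capturing bytes after a match */
        m->tail[5 - m->tail_needed] = c;
        if (--m->tail_needed == 0) {      /* capture complete: publish it */
            memcpy(m->last, m->tail, sizeof m->last);
            m->have_last = 1;
        }
        return;
    }
    if (c == pat[m->matched]) {
        if (++m->matched == sizeof pat) { /* whole pattern seen */
            m->matched = 0;
            m->tail_needed = 5;
        }
    } else {
        /* 0xFF only occurs at pat[0], so that is the only possible restart */
        m->matched = (c == pat[0]) ? 1 : 0;
    }
}
```

The caller would loop over `getc()` results (or over a buffer), pass each byte to `sof_feed`, and inspect `have_last` and `last` once the input ends.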

You can use the fscanf function in C/C++ if the data is encoded in ASCII. If it's not, you will have to write your own function to do this. A simple way would be to read N bytes from the file, search that byte string for the pattern you want, then continue until EOF.
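A minimal sketch of that chunked approach, under a few assumptions: the chunk size of 4096 and the 16-byte pattern limit are arbitrary, the function name is hypothetical, and the file is expected to be positioned at its start. The one subtlety is that a match may straddle two chunks, so the last patlen-1 bytes of each chunk are carried over into the next search.

```c
#include <stdio.h>
#include <string.h>

#define CHUNK   4096  /* bytes requested per fread (arbitrary choice) */
#define MAX_PAT 16    /* longest supported pattern (arbitrary choice) */

/* Hypothetical helper: returns the offset (relative to the starting file
 * position) of the last occurrence of pat in fp, or -1 if there is none. */
static long find_last_pattern(FILE *fp, const unsigned char *pat, size_t patlen)
{
    unsigned char buf[MAX_PAT - 1 + CHUNK];
    size_t carry = 0;   /* bytes kept from the previous chunk */
    long base = 0;      /* file offset corresponding to buf[0] */
    long last = -1;
    size_t n;

    if (patlen == 0 || patlen > MAX_PAT)
        return -1;

    while ((n = fread(buf + carry, 1, CHUNK, fp)) > 0) {
        size_t len = carry + n;
        for (size_t i = 0; i + patlen <= len; i++) {
            if (memcmp(buf + i, pat, patlen) == 0)
                last = base + (long)i;
        }
        /* Keep the final patlen-1 bytes so a split match is still found. */
        size_t keep = (len < patlen - 1) ? len : patlen - 1;
        memmove(buf, buf + len - keep, keep);
        base += (long)(len - keep);
        carry = keep;
    }
    return last;
}
```

With the returned offset, the caller can `fseek` to offset + 4 and `fread` the bytes that follow the pattern.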

Your code actually reads the entire file all at once (unnecessary if the bytes you are looking for are near the top of the file). It stores the file on the heap as a byte array (char is equivalent to a byte in C++), with buffer pointing to the start of the contiguous array in memory. Manipulate the buffer array just like you would manipulate any other array.

Also, if you intend to do anything after you have read the size, make sure you free the malloc'ed buffer to avoid a leak.
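Treating the buffer as an ordinary array, a portable sketch for finding the last occurrence is to scan backwards and stop at the first hit. The function name is hypothetical; since the question's buffer is a `char *`, it would be cast to `const unsigned char *` when calling this so that bytes such as 0xFF compare correctly.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns a pointer to the last occurrence of
 * pat[0..patlen-1] inside buf[0..len-1], or NULL if there is none. */
static const unsigned char *find_last(const unsigned char *buf, size_t len,
                                      const unsigned char *pat, size_t patlen)
{
    if (patlen == 0 || len < patlen)
        return NULL;
    /* i runs from len - patlen down to 0 without unsigned underflow */
    for (size_t i = len - patlen + 1; i-- > 0; ) {
        if (memcmp(buf + i, pat, patlen) == 0)
            return buf + i;
    }
    return NULL;
}
```

Scanning from the end means the loop can return as soon as it finds a match, instead of visiting every earlier occurrence first.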

Using magic_open() and magic_print() is safer and easier.
