
Search for a file in an archive and load it into memory

I need to load a file stored inside an archive into memory. Because the user can modify the contents of the archive, the file's offset is very likely to change.

So I need a function that searches the archive for the file using a hex pattern, determines the file's offset, loads the file into memory, and returns its address.

To load a file into memory and return the address I currently use this:

DWORD LoadBinary(char* filePath)
{
    FILE *file = fopen(filePath, "rb");
    long fileStart = ftell(file);
    fseek(file, 0, SEEK_END);
    long fileSize = ftell(file);
    fseek(file, fileStart, 0);
    BYTE *fileBuffer = new BYTE[fileSize];
    fread(fileBuffer, fileSize, 1, file);
    LPVOID newmem = VirtualAlloc(NULL, fileSize, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
    memcpy(newmem, fileBuffer, fileSize);
    delete[]fileBuffer;
    fclose(file);
    return (DWORD)newmem;
}

The archive is neither encrypted nor compressed, but it is pretty big (about 1 GB), and I'd like to avoid loading the entire archive into memory if possible.

I know the size of the file I'm looking for inside the archive, so the function does not need to find the end of the file with another pattern.

File Pattern: "\x30\x00\x00\x00\xA0\x10\x04\x00"

File Length: 4096 bytes

How can I implement this, and which functions are needed?

Solution

The code is probably slow for large files, but it works for me since the file I'm looking for is at the beginning of the archive.

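// Error handling for fopen() is omitted for brevity.
FILE *file = fopen("C:/data.bin", "rb");
fseek(file, 0, SEEK_END);
long fileSize = ftell(file);
rewind(file);

BYTE buffer[4] = { 0 }; // sliding window over the last four bytes read
long offset = -1;       // stays -1 if the pattern is not found

for (long i = 0; i < fileSize; i++)
{
    int input = fgetc(file);
    if (input == EOF)
        break;

    // Shift the window left by one byte and append the new byte.
    buffer[0] = buffer[1];
    buffer[1] = buffer[2];
    buffer[2] = buffer[3];
    buffer[3] = (BYTE)input;

    // Compare only once at least four bytes have been read.
    if (i >= 3 && buffer[0] == 0xDE && buffer[1] == 0xAD && buffer[2] == 0xBE && buffer[3] == 0xEF)
    {
        offset = i - 3; // offset of the first byte of the match
        printf("Match @ 0x%08lX\n", (unsigned long)offset);
        break;
    }
}
fclose(file);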
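
Once the offset is known, the remaining step from the question is to read the fixed-length file at that offset into memory. The following is a minimal sketch of that step, not part of the original solution: the helper name LoadRange is hypothetical, and it allocates with plain new[] for portability rather than the VirtualAlloc call used in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper (an assumption, not from the original post).
// Reads `length` bytes starting at `offset` from the file at `path`
// into a buffer allocated with new[]. Returns NULL on failure.
// The caller owns the buffer and must release it with delete[].
unsigned char *LoadRange(const char *path, long offset, size_t length)
{
    assert(path != NULL);

    FILE *file = fopen(path, "rb");
    if (file == NULL)
        return NULL;

    unsigned char *data = NULL;
    if (fseek(file, offset, SEEK_SET) == 0) {
        data = new unsigned char[length];
        if (fread(data, 1, length, file) != length) {
            // Short read: the requested range runs past the end of the file.
            delete[] data;
            data = NULL;
        }
    }
    fclose(file);
    return data;
}
```

With the values from the question, a call might look like LoadRange("C:/data.bin", offset, 4096). Only the requested 4096 bytes are held in memory, so the 1 GB archive is never loaded as a whole.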

The principle is stated in this answer: you need a finite state machine (FSM) that takes the file's bytes one by one as input and compares the current input with a byte from the pattern selected by the FSM state, which is an index into the pattern.

Here is the simplest, but naive, solution template:

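// Assumes pattern is a char array holding pattern_size bytes.
FILE *file = fopen(path, "rb");
size_t state = 0; // number of pattern bytes matched so far
for (int input_result; (input_result = fgetc(file)) != EOF;) {
    char input = (char)input_result;
    if (input == pattern[state]) {
        ++state;
    } else {
        // Naive reset: a match that begins inside a failed partial match can be missed.
        state = 0;
    }
    if (state == pattern_size) {
        // Pattern is found at (ftell(file) - pattern_size).
        break;
    }
}
fclose(file);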

The state variable holds the current position in the pattern; it is the state of the FSM.

While this solution satisfies your needs, it is not optimal, because reading a single byte from a file takes nearly as long as reading a larger block of, say, 512 bytes or more. You can improve it yourself in two steps:

  1. In each iteration, read a block rather than a single character, using fread(). Note that calculating the pattern location (once it is found) becomes a bit more complicated, because ftell() no longer matches the position of input.
  2. Add an inner loop to iterate through the block you've just read. Deal with input characters the same way as before; this is where the FSM approach proves useful.
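
The two steps above can be sketched as follows. This is one possible reading of them, not the answer author's code: the function name FindPattern, the 4096-byte block size, and the -1 return value are assumptions made for illustration. The match offset is tracked with a separate blockStart counter instead of ftell(), and the state variable is kept across blocks so that a pattern spanning two blocks is still found.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper (an assumption, not from the original answer).
// Returns the file offset of the first occurrence of `pattern`
// in the file at `path`, or -1 if it is not found or the file cannot be opened.
long FindPattern(const char *path, const unsigned char *pattern, size_t patternSize)
{
    assert(pattern != NULL && patternSize > 0);

    FILE *file = fopen(path, "rb");
    if (file == NULL)
        return -1;

    unsigned char block[4096]; // block size is an arbitrary choice
    size_t state = 0;          // FSM state: pattern bytes matched so far
    long blockStart = 0;       // file offset of block[0]
    long result = -1;
    size_t count;

    // Step 1: read a whole block per iteration.
    while (result < 0 && (count = fread(block, 1, sizeof block, file)) > 0) {
        // Step 2: feed the block to the FSM one byte at a time.
        for (size_t i = 0; i < count; ++i) {
            if (block[i] == pattern[state]) {
                ++state;
            } else {
                // Still a naive reset: only the first pattern byte is re-checked,
                // so patterns that repeat their own prefix can be missed.
                state = (block[i] == pattern[0]) ? 1 : 0;
            }
            if (state == patternSize) {
                // Offset of the byte just consumed, plus one, minus the pattern length.
                result = blockStart + (long)i + 1 - (long)patternSize;
                break;
            }
        }
        blockStart += (long)count;
    }
    fclose(file);
    return result;
}
```

The naive reset is adequate for the pattern in the question, because its first byte (0x30) does not occur again later in the pattern. A pattern that repeats its own prefix would need a proper fallback table, as in the Knuth-Morris-Pratt algorithm.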
