
Fastest way to check if memory is zeroed

I have a program that needs to check whether a chunk of a file is zeroed or contains data. This algorithm runs over the whole file, for files up to a couple of gigabytes, and takes a while to run. Is there a better way to check whether it's zeroed?

Platform: Linux and Windows

bool WGTController::isBlockCompleted(wgBlock* block)
{
    if (!block)
        return false;

    uint32 bufSize = (uint32)block->size;
    uint64 fileSize = UTIL::FS::UTIL_getFileSize(m_szFile);

    if (fileSize < (block->size + block->fileOffset))
        return false;

    char* buffer = new char[bufSize];

    FHANDLE fh = NULL;

    try
    {
        fh = UTIL::FS::UTIL_openFile(m_szFile, UTIL::FS::FILE_READ);
        UTIL::FS::UTIL_seekFile(fh, block->fileOffset);
        UTIL::FS::UTIL_readFile(fh, buffer, bufSize);
        UTIL::FS::UTIL_closeFile(fh);
    }
    catch (gcException &)
    {
        SAFE_DELETEA(buffer);
        UTIL::FS::UTIL_closeFile(fh);
        return false;
    }

    bool res = false;

    for (uint32 x=0; x<bufSize; x++)
    {
        if (buffer[x] != 0)
        {
            res = true;
            break;
        }
    }

    SAFE_DELETEA(buffer);
    return res;
}

How long is 'a while'? ... I'd say attempting to compare as many values in parallel as possible will help; maybe use some SIMD instructions to compare more than 4 bytes at a time?

Do keep in mind, though, that no matter how fast you make the comparison, ultimately the data still needs to be read from the file. If the file is not already in a cache somewhere in memory, then you may be limited to on the order of 100-150 MB/s at most before the bandwidth of the disk is saturated. If you have already hit that point, you may first need to look at an approach that avoids loading the file at all, or just accept that it's not going to get faster than that.

Are there places in the file/chunk where non-zero values are more likely? You only have to find one non-zero value (your break condition), so look first in the places where you are most likely to find one, which doesn't have to be the beginning of the file/chunk. It might make sense to start at the end, or to check the middle third first, depending on the actual application.

However, I would not recommend jumping randomly to different positions; reading from disk might become incredibly slow ;) ..

I'd like to see the assembly output for this function. Something that could speed it up by a lot is to use SSE instructions. With these instructions, you can load 16 bytes at a time, check them all for zero, and continue. And you could unroll that for loop a few times too.
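A minimal sketch of that idea using SSE2 intrinsics on x86 (the function name `hasData` is made up here; each iteration compares a 16-byte chunk against zero and bails out early on the first non-zero lane):

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstddef>

// Returns true if any byte in [buf, buf + len) is non-zero.
bool hasData(const char* buf, size_t len)
{
    const __m128i zero = _mm_setzero_si128();
    size_t i = 0;
    // Process 16 bytes per iteration with an unaligned load.
    for (; i + 16 <= len; i += 16)
    {
        __m128i v = _mm_loadu_si128(
            reinterpret_cast<const __m128i*>(buf + i));
        // cmpeq produces 0xFF per byte that equals zero; movemask packs
        // those into 16 bits. Anything other than 0xFFFF means data.
        if (_mm_movemask_epi8(_mm_cmpeq_epi8(v, zero)) != 0xFFFF)
            return true;
    }
    // Scalar tail for the last (len % 16) bytes.
    for (; i < len; ++i)
        if (buf[i] != 0)
            return true;
    return false;
}
```

The early return per 16-byte chunk preserves the original loop's break-on-first-hit behaviour.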

Your algorithm seems OK, but you can heuristically optimise the starting place if you know in advance what type of file you will be getting... then again, if it is a specific file format, most likely the info will be in the header (the first few bytes).

Also make sure that block->size is not 1 from whoever calls the method :)

Also check out Boost's memory-mapped file facilities... It might help, depending on how you calculate the optimal block->size.
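For the Linux side, here is a rough sketch of the same idea with plain POSIX mmap, no Boost required (`blockIsZero` is a hypothetical helper; it assumes `offset` is a multiple of the page size, which mmap requires):

```cpp
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstddef>

// Map a block of the file read-only and scan it, avoiding a read() copy.
bool blockIsZero(const char* path, off_t offset, size_t len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return false;
    void* p = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, offset);
    close(fd);  // the mapping stays valid after close
    if (p == MAP_FAILED)
        return false;

    const char* buf = static_cast<const char*>(p);
    bool zero = true;
    for (size_t i = 0; i < len; ++i)
    {
        if (buf[i] != 0)
        {
            zero = false;
            break;
        }
    }
    munmap(p, len);
    return zero;
}
```

This still pages the data in from disk, but it skips the extra buffer allocation and copy in the original function.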

I have an "out of the box" answer for you, but I am not sure how feasible it is to implement in your situation.

If you don't control the dumping process: since it is a large recovery (dump?) file produced in an exceptional case, why not scan the file (for zero bytes) at low priority right after it is dumped, and mark it somehow for faster identification later? (Or you could zip it and parse/scan the zip file later.)

Or, if you control the dumping process (a slow process you have to do anyway): why not indicate at the end of the dump file (or go back and write at the beginning of it) whether the dump file is filled with zeros or has some valid data? Since you wrote it, you know what is in it. That way you don't have to pay the I/O overhead twice.
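As a sketch of this "mark it yourself" idea (the `.flag` side-file convention and both helper names are invented for illustration; writing a flag into the dump file itself would work the same way):

```cpp
#include <fstream>
#include <string>
#include <cstdio>

// The dumping process records in a tiny side file whether the dump
// actually contains data, so later readers never have to rescan it.
void markDump(const std::string& dumpPath, bool hasData)
{
    std::ofstream flag(dumpPath + ".flag",
                       std::ios::binary | std::ios::trunc);
    flag.put(hasData ? '1' : '0');
}

bool dumpHasData(const std::string& dumpPath)
{
    std::ifstream flag(dumpPath + ".flag", std::ios::binary);
    char c = '1';
    if (flag.get(c))
        return c == '1';
    return true;  // no flag file: assume data and fall back to scanning
}
```

The one-byte flag turns a multi-gigabyte scan into a single tiny read.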

The goal here is to make the reading much faster by deferring the process to an earlier time, since when the dump happens there is unlikely to be an operator waiting for it to load.

First, don't allocate a new buffer each time. Allocate one (per thread) and reuse it. Use a nice big chunk, and do multiple read/check passes.

Second, don't compare each character. Do the comparisons on a larger integral type. Most likely you will want a 32-bit int, but depending on your OS/compiler it might be faster to use a 64- or even 128-bit int. With a 32-bit int you reduce the number of comparisons by 4x. You will, of course, have to worry about the end conditions. This is easy: if the buffer you are comparing isn't an even multiple of your int size, just set the last X bytes to 0 before you do the compare.

Third, it might help your compiler a bit to unroll the loop. Do 4 or 8 comparisons in the body of the loop. This should help the compiler optimize, as well as reduce the number of comparisons for exiting the loop. Make sure your buffer is a multiple of your comparison type times the number of comparisons in the loop.

Fourth, it may be faster to use (*pBuffer++) instead of buffer[i], especially if the buffer is big.

For any of these, you will of course want to collect some metrics and see what actually helps.

I will tell you a dirty, non-portable and difficult way that might be more efficient... If you are dealing with sparse files, are really bored, and want to mess with the internals of the filesystems you're using, you can try adding a new function that returns a bitmap indicating which blocks are mapped and which aren't (the unmapped ones are zeroed; the rest you still have to check manually).

Yeah, I know it's crazy and nobody would ever want to do something like this xD


 