
Detecting content change on a 16 GB pen drive within 8 seconds

I have to detect whether the playable media (audio, video and image files) on a 16 GB pen drive holding 30,000 files has changed, within 8 seconds of each subsequent insertion. Other files such as PDFs or plain text are not to be considered; this is for a media player.

I tried ls -l piped to md5, but it takes 10-11 seconds. Has anyone solved this problem before, or is there any strategy you can suggest?

The scenario in which content can change is that the user may eject the pen drive, add more songs to it, and re-insert the same pen drive. If there is no content change, then I can reuse the old database and thus save play-time.

I cannot rely on timestamps because renaming a file on a Windows system doesn't change the modification time.

Just check file sizes instead of md5 sums. This should be much faster and less resource-intensive.
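A minimal sketch of the size-based check in Python (the extension list and function names are my own illustration, not something from the question):

```python
import os

# Extensions the player cares about -- an assumed list, adjust to taste.
MEDIA_EXTS = {".mp3", ".flac", ".wav", ".mp4", ".avi", ".jpg", ".png"}

def size_snapshot(root):
    """Map each media file's path (relative to root) to its size in bytes."""
    snap = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() not in MEDIA_EXTS:
                continue  # skip pdf, txt and other non-media files
            full = os.path.join(dirpath, name)
            snap[os.path.relpath(full, root)] = os.path.getsize(full)
    return snap

def media_changed(old_snap, root):
    """True if any media file was added, removed, renamed or resized."""
    return size_snapshot(root) != old_snap
```

Note the trade-off: an edit that leaves a file's name and size unchanged would go undetected, which is inherent to any size-only scheme.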

I assume you're hashing the output of ls here in order to trigger a hash change on renames, additions, size changes or timestamp changes (on systems that do play nice), since I'm guessing that hashing 16 GB spread over 30,000 files would take much longer than 11 seconds (although most of this advice should work either way).

You're probably going to end up having to write your own code using a lower-level API to access the file list. ls is designed to be human-readable, not fast. You don't need to query the human-readable permissions, usernames, groups, and so on, and piping the output to md5 incurs an extra memory copy.
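As a rough illustration of that idea (in Python for brevity; a C program using readdir or FindFirstFile would be leaner still), os.scandir exposes each directory entry's cached stat data, so we can hash name, size and mtime directly instead of formatting and piping ls output. The function name is my own:

```python
import hashlib
import os

def listing_digest(root):
    """Hash path, size and mtime of every file under root, using the
    stat data scandir already fetched -- no ls formatting, no pipe."""
    h = hashlib.md5()
    stack = [root]
    while stack:
        path = stack.pop()
        with os.scandir(path) as it:
            # Sort so the digest is deterministic across runs.
            for entry in sorted(it, key=lambda e: e.path):
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                else:
                    st = entry.stat(follow_symlinks=False)
                    h.update(f"{entry.path}|{st.st_size}|{st.st_mtime_ns}".encode())
    return h.hexdigest()
```

Because the path is part of the hashed record, a rename changes the digest even when Windows leaves the modification time untouched.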

You could try the find command, which seems faster and can select just files. It would still be less efficient than a dedicated program because of the pipe. This one is non-recursive (but so is ls -l); you can also specify custom formatting output if you want more than the name:

find . -maxdepth 1 -type f | md5sum

You could also try an alternative hash to MD5. MD5 is a cryptographic hash: it's designed to be secure against deliberate malicious collisions, but is slower as a result.

MurmurHash3 is one of the fastest, as is the newer xxHash. But it will depend on the hardware and the size of the data (some hashes are optimized for small keys, such as for a hashmap).
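MurmurHash3 and xxHash need third-party packages in Python (mmh3, xxhash); as a stdlib stand-in, zlib's CRC32 shows the same pattern of running a cheap non-cryptographic checksum over the listing instead of MD5. The function is a hypothetical sketch:

```python
import zlib

def fast_digest(lines):
    """Fold an iterable of metadata lines (bytes) into one CRC32 value.
    Swap in xxhash.xxh64 or mmh3 here for a faster 64/128-bit hash."""
    crc = 0
    for line in lines:
        crc = zlib.crc32(line, crc)  # chain: feed previous value as seed
    return crc & 0xFFFFFFFF
```

CRC32 is weaker than MD5 against accidental collisions too, but for a "did anything change?" check over 30,000 metadata records that risk is usually acceptable.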

You could also try threading it. Have one thread continuously reading the list of files from the drive and another hashing them as fast as it can.
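A sketch of that producer/consumer split, again in Python with names of my own choosing: one thread enumerates paths into a queue while a second hashes file contents, so directory I/O overlaps with hashing.

```python
import hashlib
import os
import queue
import threading

def hash_tree(root):
    """Return {path: md5 hexdigest} with listing and hashing overlapped."""
    paths = queue.Queue(maxsize=256)  # bounded so the lister can't race ahead
    digests = {}

    def lister():
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                paths.put(os.path.join(dirpath, name))
        paths.put(None)  # sentinel: no more work

    def hasher():
        while True:
            path = paths.get()
            if path is None:
                break
            h = hashlib.md5()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 16), b""):
                    h.update(chunk)
            digests[path] = h.hexdigest()

    threads = [threading.Thread(target=lister), threading.Thread(target=hasher)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return digests
```

On a slow USB stick the bus will likely dominate either way, so the win from overlapping may be modest; measure before committing to it.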

If you're looking to do that with a standard shell, without writing your own code, it's going to be a pain.

Having said all that, your main bottleneck is probably the speed of the flash memory. All the tricks in the world won't help if your CPU is starved waiting for I/O. I'm not sure it's a good 'challenge', as results will vary a lot depending on the drive manufacturer and USB version (unless those have been specified). But maybe doing all of that shaves off a few seconds and gets you to your goal. Or just get a faster USB stick.
