Detecting content change on a 16 GB pen drive within 8 sec

Question

I have to detect whether the playable media (audio, video and images) has changed on a 16 GB pen drive with 30,000 files, within 8 seconds on subsequent insertions. Other files, such as PDF or plain text, are not to be considered; this is for media player software.

I tried ls -l piped to md5, but it takes 10-11 seconds. Has anyone solved this problem before, or is there a strategy you can suggest?

The scenario in which content can change is that the user ejects the pen drive, adds more songs to it, and re-inserts the same pen drive. If the content has not changed, I can reuse the old database and save time before playback starts.

I cannot rely on timestamps because renaming a file on a Windows system doesn't change the modification time.

Answer 1

Just check file sizes instead of md5 sums. This should be much faster and less resource-intensive.
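A minimal sketch of this idea in Python, assuming the player keeps a snapshot from the previous insertion. The names MEDIA_EXTS, size_snapshot and media_changed are made up for illustration, and the extension list is an assumption, not a definition of what the player supports.

```python
import os

# Assumed set of "playable" extensions; adjust for the actual player.
MEDIA_EXTS = {".mp3", ".wav", ".flac", ".mp4", ".avi", ".mkv", ".jpg", ".png"}


def size_snapshot(root):
    """Map each media file's path (relative to root) to its size in bytes."""
    snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() in MEDIA_EXTS:
                full = os.path.join(dirpath, name)
                snapshot[os.path.relpath(full, root)] = os.path.getsize(full)
    return snapshot


def media_changed(old_snapshot, root):
    """True if a media file was added, removed, renamed or resized."""
    return size_snapshot(root) != old_snapshot
```

Because the relative path is the key, a rename shows up as a change even when the size and timestamp stay the same. A file rewritten in place to exactly the same size would go unnoticed.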

Answer 2

I assume you're hashing the output of ls here, so that the hash changes on renames, additions, size changes or timestamp changes (on systems that do play nice), since I'm guessing that hashing 16 GB of content split over 30,000 files would take much longer than 11 seconds (although most of this advice should work either way).

You're probably going to end up having to write your own code using a lower-level API to access the file list. ls is designed for human-readable output, not for speed. You don't need to query the permissions, user name, group and so on, and you're incurring an extra memory copy by piping the output to md5.
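One way such a program could look, sketched in Python with os.scandir, which returns directory entries without formatting them for display. The helper names iter_media and listing_digest and the extension tuple are assumptions for illustration; the digest covers only relative paths and sizes, hashed in-process so no pipe is involved.

```python
import hashlib
import os

# Assumed media extensions; not taken from the question.
MEDIA_EXTS = (".mp3", ".wav", ".flac", ".mp4", ".avi", ".mkv", ".jpg", ".png")


def iter_media(root):
    """Yield (relative path, size) for media files under root."""
    stack = [root]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            # Sort by name so the traversal order is repeatable.
            for entry in sorted(entries, key=lambda e: e.name):
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif (entry.is_file(follow_symlinks=False)
                      and entry.name.lower().endswith(MEDIA_EXTS)):
                    size = entry.stat(follow_symlinks=False).st_size
                    yield os.path.relpath(entry.path, root), size


def listing_digest(root):
    """Hash the media listing (names and sizes only, never file contents)."""
    digest = hashlib.md5()
    for rel, size in iter_media(root):
        line = f"{rel}\0{size}\n"
        digest.update(line.encode("utf-8", "surrogateescape"))
    return digest.hexdigest()
```

Storing the digest alongside the database and comparing it on the next insertion would then decide whether a rescan is needed. Whether this fits in 8 seconds still depends on how fast the drive can return its directory entries.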

You could try the find command, which seems faster and can be restricted to regular files. It would still be less efficient than a dedicated program with no pipe. This one is non-recursive (but so is ls -l); you can also specify a custom output format if you want more than the name:

find . -maxdepth 1 -type f | md5sum

You could also try an alternative hash to MD5. MD5 was designed as a cryptographic hash, intended to resist deliberate malicious collisions (a property it no longer provides), and that design makes it slower than hashes built only for speed.

MurmurHash3 is one of the fastest, or there is the newer xxHash. But it will depend on the hardware and the size of the data (some hashes are optimized for small keys, such as those in a hash map).
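As a rough illustration of swapping in a non-cryptographic checksum, here is a sketch using CRC-32 from Python's standard zlib module. CRC-32 is a stand-in chosen because it needs no extra packages; it is neither MurmurHash3 nor xxHash, and with only 32 bits an accidental collision is more likely than with MD5. The function name crc32_of_lines is hypothetical.

```python
import zlib


def crc32_of_lines(lines):
    """Fold an iterable of text lines into a single CRC-32 value."""
    checksum = 0
    for line in lines:
        # zlib.crc32 accepts the running value as its second argument.
        checksum = zlib.crc32(line.encode("utf-8"), checksum)
    return checksum
```

The input would be the same listing lines that were previously piped to md5. For a listing of about 30,000 short lines the hashing cost is likely small either way, so it is worth measuring before switching.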

You could also try and thread it. Have one thread reading the list of files in from the drive continuously and another hashing them as fast as it can.
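The two-thread arrangement can be sketched in Python as a producer and a consumer joined by a bounded queue. The function name threaded_listing_digest is made up for illustration, and this version hashes every regular file's relative path rather than filtering by media type. It is a sketch of the structure, not a measured speed-up.

```python
import hashlib
import os
import queue
import threading


def threaded_listing_digest(root):
    """Read the file list in one thread while the caller's thread hashes it."""
    lines = queue.Queue(maxsize=1024)  # bounded so the reader cannot run far ahead

    def producer():
        try:
            for dirpath, dirnames, filenames in os.walk(root):
                dirnames.sort()  # fix the traversal order so the digest is repeatable
                for name in sorted(filenames):
                    rel = os.path.relpath(os.path.join(dirpath, name), root)
                    lines.put(rel.encode("utf-8", "surrogateescape"))
        finally:
            lines.put(None)  # sentinel: no more entries

    reader = threading.Thread(target=producer)
    reader.start()

    digest = hashlib.md5()
    while True:
        item = lines.get()
        if item is None:
            break
        digest.update(item + b"\n")

    reader.join()
    return digest.hexdigest()
```

If the drive is the bottleneck, the hashing side will spend most of its time waiting on the queue, which is consistent with the point about I/O-starved CPUs.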

If you're looking to do that in a standard shell without writing your own code, however, it's going to be a pain.

Having said all that, your main bottleneck is probably the speed of the flash memory. All the tricks in the world won't help if your CPU is starved waiting for I/O. I'm not sure that it's a good 'challenge', as the result will vary a lot depending on the drive manufacturer and USB version (unless those have been specified). But maybe doing all of that will shave off a few seconds and bring you within your goal. Or just get a faster USB stick.

2 answers

solution1
2 2014-09-11 06:08:13

solution2
1 2014-09-11 07:45:11
