
How to quickly pick a random file from a folder-tree?

I'm trying to pick a random file from a folder tree, starting from a fixed path and "searching" recursively across all subfolders (including the chosen folder itself).

My idea is: build a list of the files, count them, choose a random number in that range and then pick the file at that index.

Here's my code:

```cpp
// create list of all files
std::vector<std::string> paths;

for (const auto &entry : std::filesystem::recursive_directory_iterator(mPathDirectory)) {
    if (!std::filesystem::is_directory(entry)) {
        paths.push_back(entry.path().string());
    }
}

// pick random file
size_t numberOfFiles = paths.size();
int indexRandomFile = (int)round(rescale(random::uniform(), 0.0, 1.0, 0, numberOfFiles - 1));

return paths[indexRandomFile];
```

Even with -O3 it's pretty slow, considering I have a huge number of files and I'm inside an "audio" application (which should be fast).

Do you have any smarter ideas? Something like O(1)? :P

Choosing a file uniformly at random without first building a list can be done using the reservoir sampling technique. For each file, select it with probability 1/N, where N is the number of files found so far, including the file just found. The random file is then the last file selected this way.

See also this question for the similar task of choosing a random line from a text file; reservoir sampling applies, in general, whenever the number of items to choose from is not known in advance.


The following explains how reservoir sampling works:

  1. Set N to 1.
  2. Set ChosenFile to null.
  3. For each file:
    • If random::uniform() < 1.0 / N, set ChosenFile to the file's name.
    • Add 1 to N.

After the loop, ChosenFile is the randomly chosen file name (it stays null only if no files were found).
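Why this gives every file the same chance: write f_1 to f_M for the files in the order they are visited (M is the total, not known in advance), and assume random::uniform() returns values uniformly distributed in [0, 1), so the test in step 3 succeeds with probability 1/N. File f_k is selected when N = k and must then survive every later step without being replaced:

```latex
P(\text{ChosenFile} = f_k)
  = \frac{1}{k} \prod_{j=k+1}^{M} \left(1 - \frac{1}{j}\right)
  = \frac{1}{k} \prod_{j=k+1}^{M} \frac{j-1}{j}
  = \frac{1}{k} \cdot \frac{k}{M}
  = \frac{1}{M}
```

The product telescopes (k/(k+1) times (k+1)/(k+2) and so on up to (M-1)/M), so the result is 1/M for every k, including k = M where the product is empty.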


Starting from the code in your question, here is how reservoir sampling can be implemented. Note that the files are no longer stored in a list. Note also that this code is untested.

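```cpp
// store randomly chosen file
std::string path;
size_t n = 1;  // 1-based count of files seen, including the current one

for (const auto &entry : std::filesystem::recursive_directory_iterator(mPathDirectory)) {
    // entry.is_directory() may reuse file-type info cached during iteration
    if (!entry.is_directory()) {
        // keep this file with probability 1/n
        if (random::uniform() < 1.0 / n) {
            path = entry.path().string();
        }
        n++;
    }
}

return path;
```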
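The snippet above depends on random::uniform() from the framework used in the question. For readers outside that framework, here is a self-contained sketch of the same single-pass loop using only the standard library. Assumptions: the caller supplies a std::mt19937, pickRandomFile is a hypothetical name, and unreadable folders should be skipped rather than reported. It keeps the n-th file when a uniform integer in [0, n-1] equals 0, which is exactly probability 1/n without floating-point rounding.

```cpp
#include <cstddef>
#include <filesystem>
#include <random>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: reservoir sampling with a reservoir of size 1.
// Returns a uniformly chosen non-directory entry below `root`, or "" if
// there is none (or `root` cannot be opened). Unreadable folders are skipped.
std::string pickRandomFile(const fs::path& root, std::mt19937& rng) {
    std::string chosen;
    std::size_t n = 0;  // number of files seen so far
    std::error_code ec;
    fs::recursive_directory_iterator it(
        root, fs::directory_options::skip_permission_denied, ec);
    for (; !ec && it != fs::recursive_directory_iterator(); it.increment(ec)) {
        std::error_code typeEc;
        if (it->is_directory(typeEc)) continue;  // same test as the answer's code
        ++n;
        // Keep the n-th file with probability exactly 1/n.
        std::uniform_int_distribution<std::size_t> dist(0, n - 1);
        if (dist(rng) == 0) chosen = it->path().string();
    }
    return chosen;
}
```

Seed the engine once, e.g. `std::mt19937 rng{std::random_device{}()};`, and reuse it across calls. This still visits every entry, so it removes the list allocation but not the O(N) scan itself.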

If you know nothing about the folder structure, you have to recurse into it to find out how many items there are. There is no O(1) solution.

But an "app" only needs to feel fast, i.e. what often matters is the perceived responsiveness. To that end, on first start you can use heuristics, such as descending into subfolders chosen at random until you hit a file. The result will not be uniformly random, but from the user's standpoint it will look arbitrary enough.
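A rough sketch of that heuristic, assuming it is acceptable to pick each entry of the current folder with equal probability and to start over from the root when a walk ends in an empty or unreadable folder. The name pickArbitraryFile and the limits maxTries and maxDepth are hypothetical choices, not part of the original answer.

```cpp
#include <cstddef>
#include <filesystem>
#include <random>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Heuristic, NOT uniform: walk down from `root`, choosing one random entry of
// the current folder at each step. Stop at a non-directory, descend into a
// directory. Returns "" if no file was found within the limits.
std::string pickArbitraryFile(const fs::path& root, std::mt19937& rng,
                              int maxTries = 16, int maxDepth = 64) {
    for (int attempt = 0; attempt < maxTries; ++attempt) {
        fs::path current = root;
        for (int depth = 0; depth < maxDepth; ++depth) {
            // List only the current folder, not the whole subtree.
            std::vector<fs::directory_entry> entries;
            std::error_code ec;
            for (fs::directory_iterator it(current, ec), end; !ec && it != end;
                 it.increment(ec)) {
                entries.push_back(*it);
            }
            if (entries.empty()) break;  // empty or unreadable: start over

            std::uniform_int_distribution<std::size_t> dist(0, entries.size() - 1);
            const fs::directory_entry& pick = entries[dist(rng)];
            std::error_code typeEc;
            if (!pick.is_directory(typeEc)) return pick.path().string();
            current = pick.path();  // descend into the chosen subfolder
        }
    }
    return {};  // gave up
}
```

Because each step lists a single folder, the cost grows with the depth of the walk rather than the size of the tree. The trade-off is the bias mentioned above: files in shallow folders with few entries are favored.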

Meanwhile, while the initially selected file is already playing, you can do the full recursive scan in the background and build up a cache.
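One way to sketch the background cache with only the standard library, assuming std::async is acceptable for running the scan on its own thread. FileCache, scanFiles and pickFromCache are hypothetical names. Once the scan has finished, each pick is O(1), which is the question's list-and-index idea moved off the hot path. The index is drawn with std::uniform_int_distribution: if rescale is a plain linear map, round(rescale(u, 0.0, 1.0, 0, n - 1)) in the question gives the first and last index only half the probability of the others.

```cpp
#include <chrono>
#include <cstddef>
#include <filesystem>
#include <future>
#include <random>
#include <string>
#include <system_error>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

// The O(N) part: one full recursive scan, meant to run off the audio thread.
std::vector<std::string> scanFiles(const fs::path& root) {
    std::vector<std::string> files;
    std::error_code ec;
    fs::recursive_directory_iterator it(
        root, fs::directory_options::skip_permission_denied, ec);
    for (; !ec && it != fs::recursive_directory_iterator(); it.increment(ec)) {
        std::error_code typeEc;
        if (!it->is_directory(typeEc)) files.push_back(it->path().string());
    }
    return files;
}

// Hypothetical helper: starts scanFiles on a background thread immediately.
class FileCache {
public:
    explicit FileCache(fs::path root)
        : pending_(std::async(std::launch::async, scanFiles, std::move(root))) {}

    // Non-blocking check, meant to be called from one thread.
    // Returns true once the scan has finished and files() is filled.
    bool ready() {
        if (pending_.valid() &&
            pending_.wait_for(std::chrono::seconds(0)) == std::future_status::ready) {
            files_ = pending_.get();
        }
        return !pending_.valid();
    }

    const std::vector<std::string>& files() const { return files_; }

private:
    std::future<std::vector<std::string>> pending_;
    std::vector<std::string> files_;
};

// O(1) per pick; every index is equally likely. Returns "" for an empty cache.
std::string pickFromCache(const std::vector<std::string>& files, std::mt19937& rng) {
    if (files.empty()) return {};
    std::uniform_int_distribution<std::size_t> dist(0, files.size() - 1);
    return files[dist(rng)];
}
```

Poll ready() from the UI or main thread rather than the audio callback, and keep using the heuristically chosen file until it returns true. Note that the future returned by std::async waits for the task in its destructor, so destroying a FileCache mid-scan blocks until the scan completes.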

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM