
Pick a random line from text file in C++

I'm trying to read a random line from a text file.

My code so far picks the first line, but I need a random line.

How would I get a random line?

string line;
if(infile.good()){
    getline(infile, line);
}

You can use the "Reservoir Sampling" approach.

As we learn from the Wikipedia article on Reservoir Sampling:

Reservoir sampling is a family of randomized algorithms for randomly choosing a sample of k items from a list S containing n items, where n is either a very large or unknown number. Typically n is large enough that the list doesn't fit into main memory.

Using such an algorithm, it is possible to pick random elements from a series of unknown length in a single pass, without storing all of them in memory.
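A short derivation for the single-item case (k = 1) may help show why one pass is enough. It assumes the common rule that the i-th item read replaces the current pick with probability 1/i; the symbols i, j and n below are introduced here for illustration only.

```latex
P(\text{item } i \text{ is the final pick})
  = \frac{1}{i} \prod_{j=i+1}^{n} \left(1 - \frac{1}{j}\right)
  = \frac{1}{i} \prod_{j=i+1}^{n} \frac{j-1}{j}
  = \frac{1}{i} \cdot \frac{i}{n}
  = \frac{1}{n}
```

The product telescopes, so each of the n items ends up selected with the same probability, 1/n.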

Here's an (untested) example:

#include <cstddef>
#include <iostream>
#include <random>
#include <string>
int main() {
    std::random_device seed;
    std::mt19937 prng(seed());
    std::string line, result;
    // n counts the lines read before the current one.
    for(std::size_t n = 0; std::getline(std::cin, line); n++) {
        // Keep the current line with probability 1/(n+1).
        std::uniform_int_distribution<std::size_t> dist(0, n);
        if (dist(prng) == 0)
            result = line;
    }
    std::cout << "random line: '" << result << "'\n";
}

Example output:

$ g++ test.cc -std=c++11 && ./a.out < test.cc
random line: '#include <iostream>'

You can read your file's lines into a std::vector<std::string> and randomly access a specific line within the range of the vector's size:

std::string line;
std::vector<std::string> lines;
while(getline(infile, line)) {
    lines.push_back(line);
}

if(lines.size() >= 5) {  // lines[4] is valid only when there are at least 5 lines
    std::cout << "Line number 5: " << lines[4] << std::endl;
}

Another option is to choose a line number first and count the lines as you read them:

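int lineno = 5;     // The line to print; replace with a randomly chosen number.
int linecount = 0;
std::string line;
while(getline(infile, line)) {
    ++linecount;
    if(linecount == lineno) {
        std::cout << "Line number " << lineno << ": " << line << std::endl;
        break;      // No need to read the rest of the file.
    }
}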

Call getline a random number of times and make sure to stop the loop if you reach end of file.

Unless you know how long each line in your file is, there is no way to compute where any line begins (except the very first one, of course) and seek directly to that point.

You can store the file offsets of the line beginnings in a std::vector. Next, generate your random number. Use the number as an index into the std::vector, and get the starting position of the line. Seek to that position and fetch the line.

std::vector<std::streampos> line_offsets;
std::string text_line;
std::streampos line_start = text_file.tellg(); // Start of the first line.
while (getline(text_file, text_line))
{
  line_offsets.push_back(line_start); // Record only lines that were actually read.
  line_start = text_file.tellg();     // Start of the next line, if there is one.
}
//...
text_file.clear(); // Reset eofbit/failbit; otherwise seekg has no effect.
std::streampos offset = line_offsets[Get_Random_Line_Number()];
text_file.seekg(offset);
std::string random_text_line;
getline(text_file, random_text_line);

This method doesn't use as much memory as storing each text line into a vector.
