
C++ reading random lines of txt?

I am running C++ code where I need to import data from a txt file. The text file contains 10,000 lines. Each line contains n columns of binary data.

The code has to loop 100,000 times. Each time, it has to randomly select a line from the txt file and assign the binary values in its columns to some variables.

What is the most efficient way to write this code? Should I load the file into memory first, or should I read a randomly chosen line from the file on each iteration?

How can I implement this in C++?

To jump directly to a given line in a text file, all lines need to have the same byte length (then line k starts at byte k × line length). If they don't, you have to read through the file until you reach the correct line. Since that is slow for this many accesses, it's better to load the file once into a std::vector of std::strings, each entry being one line (this is easily done with std::getline). Or, since you want to assign values from the different columns, you can use a std::vector of your own struct, like

struct MyValues{
  double d;
  int i;
  // whatever you have / need
};

std::vector<MyValues> vec;

This might be better than parsing the line every time you use it.

With the std::vector, you get random access and only have to loop through the whole file once.
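A minimal sketch of the load-once approach, assuming each line holds n whitespace-separated 0/1 values such as "0 1 1 0" (the question doesn't spell out the exact column format). Row, loadRows, and pickRow are hypothetical names; each parsed line is stored as a std::vector<int> instead of a fixed struct because n isn't known here, and std::mt19937 with std::uniform_int_distribution is used instead of rand() to pick the index.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <random>
#include <sstream>
#include <string>
#include <vector>

// One parsed line. Hypothetical layout: n whitespace-separated 0/1 values,
// e.g. "0 1 1 0". Swap in your own struct if the columns have fixed meanings.
using Row = std::vector<int>;

// Reads the file once, parsing every line into a Row.
std::vector<Row> loadRows(const std::string& path) {
    std::vector<Row> rows;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        Row row;
        int value;
        while (fields >> value) {
            row.push_back(value);  // one column value
        }
        rows.push_back(row);
    }
    return rows;
}

// Returns a uniformly chosen row (with replacement); rows must not be empty.
const Row& pickRow(const std::vector<Row>& rows, std::mt19937& gen) {
    assert(!rows.empty());
    std::uniform_int_distribution<std::size_t> dist(0, rows.size() - 1);
    return rows[dist(gen)];
}
```

After one call to loadRows, each of the 100,000 iterations only costs one random draw and one vector index, e.g. `const Row& r = pickRow(rows, gen);` followed by copying r[0], r[1], ... into your variables.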

10K lines is a pretty small file. If you have, say, 100 chars per line, it will use the HUGE amount of 1MB of your RAM.

Load it into a vector and access it the way you want.

Maybe not THE most efficient, but you could try this:

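#include <fstream>
#include <random>
#include <string>

int main() {
    //use ifstream to read (binary mode so byte offsets match the file exactly)
    std::ifstream in("yourfile.txt", std::ios::binary);
    if (!in) return 1;

    //seekg() takes a BYTE offset, not a line number, so this only works if every
    //line has the same byte length; kLineLength (including '\n') is hypothetical
    const std::streamoff kLineLength = 21;
    const int kNumLines = 10000;

    //string to store the line
    std::string line;

    //random number generator
    std::mt19937 gen(std::random_device{}());
    std::uniform_int_distribution<int> pick(0, kNumLines - 1);

    for(int i = 0; i < 100000; i++) {
        in.clear();                         //reset eof/fail flags from the previous read
        in.seekg(pick(gen) * kLineLength);  //jump to the start of a random line
        std::getline(in, line);             //read the whole line, not just one word
        //do what you want with the line here...
    }
}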

I'm too lazy right now, but you need to make sure that you check your ifstream for errors such as end-of-file, seeking past the end of the file, etc.

Since you're taking 100,000 samples from just 10,000 lines, virtually every line will be sampled at least once. Read the entire file into an array data structure, and then randomly sample the array. This avoids file seeking entirely.
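Assuming each draw picks one of the N lines uniformly and independently (with replacement), the chance that a particular line is never drawn in S draws works out as follows:

```latex
P(\text{a given line is never drawn}) = \left(1 - \frac{1}{N}\right)^{S}
  \approx e^{-S/N} = e^{-10} \approx 4.5 \times 10^{-5},
  \qquad N = 10\,000,\; S = 100\,000

\mathbb{E}[\text{number of lines never drawn}]
  = N \left(1 - \frac{1}{N}\right)^{S} \approx 0.45
```

So on average less than one of the 10,000 lines goes unread, which means per-iteration seeking would end up touching essentially the whole file anyway.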

The more common case is to sample only a small subset of the file's data. To do that, assuming the lines have different lengths, seek to random points in the file, skip to the next newline (for example with cin.ignore( numeric_limits< streamsize >::max(), '\n' )), and then parse the text that follows.
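A sketch of the seek-and-skip technique, assuming the file is opened in binary mode so that tellg() reports a byte count. fileSizeOf and readLineAfterRandomOffset are hypothetical helper names. One property worth knowing: a line is picked with probability proportional to the byte length of the line before it (the first line gets the last line's share through the wrap-around at end of file), so the sample is only close to uniform when lines have similar lengths. With CRLF line endings, the returned string keeps the trailing '\r'.

```cpp
#include <cassert>
#include <fstream>
#include <limits>
#include <random>
#include <string>

// Returns the file size in bytes; `in` should be opened with std::ios::binary.
long long fileSizeOf(std::ifstream& in) {
    in.clear();
    in.seekg(0, std::ios::end);
    return static_cast<long long>(in.tellg());
}

// Seeks to a random byte, discards the rest of the line it lands in, and
// reads the following line. If that runs past the end of the file, it wraps
// around to the first line. Not uniform: line k is chosen with probability
// proportional to the byte length of line k-1.
bool readLineAfterRandomOffset(std::ifstream& in, long long fileSize,
                               std::mt19937& gen, std::string& out) {
    assert(fileSize > 0);
    std::uniform_int_distribution<long long> offset(0, fileSize - 1);
    in.clear();                       // reset flags left by the previous call
    in.seekg(offset(gen));
    in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');  // skip partial line
    if (std::getline(in, out)) {
        return true;
    }
    in.clear();                       // hit end of file: wrap to the first line
    in.seekg(0);
    return static_cast<bool>(std::getline(in, out));
}
```

Each call does one seek and reads at most two lines, so the cost does not depend on how many lines the file has; that is what makes this worthwhile when you only need a few samples from a large file.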
