C++: convert a vector of strings to a vector of doubles from a given .dat file

I have many questions, but I'll start with what I believe should be an easy one. I've been given an assignment to compare template files to query files, calculate the dot product, and return the 10 nearest neighbors. I think I can do the calculations fairly easily, but I'm having a hard time with the file I/O. I'm able to read the data into a vector of strings, but I'm not sure how to convert it to a vector of doubles while maintaining the integrity of each vertex (each line of values). If I try using a string stream or an iterator, each number ends up with its own index, instead of each line getting its own index. Here's what I have. Can you please help me?

Edited for clarification:

I am comparing query files to template files that contain collections of images to get the 10 nearest neighbors. A query file contains one "set" (for lack of a better description). A template file contains 138 lines of data. Right now, all I would like to do is print each line of data from the template file with its corresponding index number from the .dat file, but in a format that allows me to do the necessary calculations. Once all is said and done, I will need to compute the cosine between two vectors (a query and the ith row in the template), so I will need to break out the ith row of the template in order to compute the cosine between it and the query file. Is that clearer?

Here's a link to the query file: https://www.dropbox.com/s/6xytafmojrct3lh/001_AU01_query.dat?dl=0
Here's a link to the template file: https://www.dropbox.com/s/vnqi7h1btxdsf9u/001_template.dat?dl=0

Sample output would be something like: "001_AU01_query: 15 20 135 19 36 22 105 95 55 68", where the numbers are the line numbers in the corresponding template file that most closely match the query data.

Again, I really appreciate your help.

void NearestNeighbor::readQuery(){
    vector<string> queryVector;
    string line;
    ifstream queryData;
    queryData.open("001_AU01_query.dat");
    if (queryData.fail()) {
        cout << "Unable to read query.dat file";
        exit(1);
    }
    //populate the vector with the template info
    while(getline(queryData, line, '\n')){
        queryVector.push_back(line);
    }
    //this prints the contents of the queryVector to the console
    for ( unsigned int i = 0; i < (queryVector.size()); i++){
        cout << "Index[" << i << "] " << queryVector[i] << endl;
    }
    queryData.close();
}//end readQuery()

I'm happy to post a sample of the input and expected output, if you think it will help. Thanks in advance!

You just need to choose the correct data structure and everything else should flow from that.

A single one-dimensional vector<double> will not work, because you can't keep track of which values belong to which line of the data. However, a vector<double> is appropriate for storing the values of a single line. You then just need one of these for each line.

So a more appropriate data structure would be vector<vector<double>>, i.e. a 2D vector:

void readQuery(std::istream& queryData){
  std::vector<std::vector<double>> queryVector;
  std::string line;

  while(getline(queryData, line, '\n'))
    queryVector.push_back(splitData(line));

  for (unsigned i = 0u; i != queryVector.size(); ++i) {
    std::cout << "Index[" << i << "] ";
    for(double value : queryVector[i])
        std::cout << value << " ";
    std::cout << "\n";
  }
}

This requires a function splitData to split a string into a vector<double>. You can find plenty of examples (and debate) on the best way of splitting a string here on SO, but an example implementation might be:

std::vector<double> splitData(const std::string& line) {
  std::istringstream iss(line);
  std::istream_iterator<double> begin(iss);
  std::istream_iterator<double> end;
  return {begin, end};
}
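
One caveat with the istream_iterator version above: extraction simply stops at the first token that is not a number, so a malformed line yields a shorter vector without any error. If that matters, a stricter variant can parse token by token. The following is only a sketch; the name splitDataChecked and the choice to throw std::invalid_argument are assumptions for illustration, not part of the approach above.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stricter variant of splitData: every whitespace-separated
// token must be a complete number, otherwise std::invalid_argument is thrown.
std::vector<double> splitDataChecked(const std::string& line) {
  std::istringstream iss(line);
  std::vector<double> values;
  std::string token;
  while (iss >> token) {
    std::size_t used = 0;
    // std::stod throws std::invalid_argument if no conversion is possible
    // and std::out_of_range if the value does not fit in a double.
    const double value = std::stod(token, &used);
    // Reject tokens with trailing junk, such as "2x".
    if (used != token.size())
      throw std::invalid_argument("malformed number: " + token);
    values.push_back(value);
  }
  return values;
}
```

It can be dropped into readQuery in place of splitData, with a try/catch around the call if a bad line should be reported rather than abort the program.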

Live demo. Live demo (C++03).
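
Once each line is a vector<double>, the cosine step mentioned in the question becomes a small function over two rows, and the 10 nearest neighbors are the rows with the largest cosine. The sketch below is one possible way to do it, not a definitive implementation. The names cosine and nearestRows are hypothetical, it assumes that a larger cosine means a closer match, and it returns 0-based row indices; whether the expected output uses 0-based or 1-based line numbers is not stated in the question.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <utility>
#include <vector>

// Cosine of the angle between a and b: dot(a, b) / (|a| * |b|).
// Throws if the lengths differ; returns 0 if either vector has zero length.
double cosine(const std::vector<double>& a, const std::vector<double>& b) {
  if (a.size() != b.size())
    throw std::invalid_argument("vectors must have the same length");
  const double dot = std::inner_product(a.begin(), a.end(), b.begin(), 0.0);
  const double normA = std::sqrt(std::inner_product(a.begin(), a.end(), a.begin(), 0.0));
  const double normB = std::sqrt(std::inner_product(b.begin(), b.end(), b.begin(), 0.0));
  if (normA == 0.0 || normB == 0.0) return 0.0;
  return dot / (normA * normB);
}

// Hypothetical helper: 0-based indices of the k template rows with the
// largest cosine to the query, best match first.
std::vector<std::size_t> nearestRows(const std::vector<std::vector<double>>& rows,
                                     const std::vector<double>& query,
                                     std::size_t k) {
  typedef std::pair<double, std::size_t> Scored;  // (cosine, row index)
  std::vector<Scored> scored;
  scored.reserve(rows.size());
  for (std::size_t i = 0; i != rows.size(); ++i)
    scored.push_back(Scored(cosine(query, rows[i]), i));

  k = std::min(k, scored.size());
  // Only the first k positions need to be sorted: descending cosine,
  // ties broken by the lower row index.
  std::partial_sort(scored.begin(),
                    scored.begin() + static_cast<std::ptrdiff_t>(k),
                    scored.end(),
                    [](const Scored& x, const Scored& y) {
                      if (x.first != y.first) return x.first > y.first;
                      return x.second < y.second;
                    });

  std::vector<std::size_t> result;
  result.reserve(k);
  for (std::size_t i = 0; i != k; ++i)
    result.push_back(scored[i].second);
  return result;
}
```

With the template file read into a vector<vector<double>> and the query file's single line read into a vector<double>, calling nearestRows(templateRows, query, 10) would give the ten candidate indices to print (templateRows and query being whatever names hold that data).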
