
How to read CSV data to pointers of struct vector in C++?

I want to read CSV data into a vector of structs in C++. This is what I have written so far. I want to store the iris dataset in a pointer to a vector of structs named csv: std::vector<Csv> *csv = new std::vector<Csv>;

#include <vector>
#include <iostream>
#include <fstream>
#include <string>
#include <sstream>

struct Csv{
    float a;
    float b;
    float c;
    float d;
    std::string e;
};

int main(){
    std::string colname;
    
    // Iris csv dataset downloaded from
    // https://gist.github.com/curran/a08a1080b88344b0c8a7
    std::ifstream *myFile = new std::ifstream("iris.csv");
    

    std::vector<Csv> *csv = new std::vector<Csv>;
    
    std::string line;
    
    // Read the column names
    if(myFile->good())
    {
        // Extract the first line in the file
        std::getline(*myFile, line);

        // Create a stringstream from line
        std::stringstream ss(line);

        // Extract each column name
        while(std::getline(ss, colname, ',')){
            
            std::cout<<colname<<std::endl;
            }
    }
    

   // Read data, line by line
    while(std::getline(*myFile, line))
    {
        // Create a stringstream of the current line
        std::stringstream ss(line);

        
    }
        
    return 0;
}

I don't know how to implement this part of the code, which has to handle lines that contain both floats and a string.

   // Read data, line by line
    while(std::getline(*myFile, line))
    {
        // Create a stringstream of the current line
        std::stringstream ss(line);

        
    }

Evolution

We start with your program and complete it in your current programming style. Then we analyze your code and refactor it into a more idiomatic C++ solution. In the end we show a modern C++ solution that uses a more object-oriented approach.

First, your completed code:

#include <vector>
#include <iostream>
#include <fstream>
#include <string>
#include <sstream>

struct Csv {
    float a;
    float b;
    float c;
    float d;
    std::string e;
};

int main() {
    std::string colname;

    // Iris csv dataset downloaded from
    // https://gist.github.com/curran/a08a1080b88344b0c8a7
    std::ifstream* myFile = new std::ifstream("r:\\iris.csv");


    std::vector<Csv>* csv = new std::vector<Csv>;

    std::string line;

    // Read the column names
    if (myFile->good())
    {
        // Extract the first line in the file
        std::getline(*myFile, line);

        // Create a stringstream from line
        std::stringstream ss(line);

        // Extract each column name
        while (std::getline(ss, colname, ',')) {

            std::cout << colname << std::endl;
        }
    }


    // Read data, line by line
    while (std::getline(*myFile, line))
    {
        // Create a stringstream of the current line
        std::stringstream ss(line);
        // Extract each column 
        std::string column;
        std::vector<std::string> columns{};

        while (std::getline(ss, column, ',')) {
            columns.push_back(column);
        }
        // Convert (note: this assumes every line has at least 5 columns)
        Csv csvTemp{};
        csvTemp.a = std::stof(columns[0]);
        csvTemp.b = std::stof(columns[1]);
        csvTemp.c = std::stof(columns[2]);
        csvTemp.d = std::stof(columns[3]);
        csvTemp.e = columns[4];
        // Store new row data
        csv->push_back(csvTemp);
    }
    // Show everything
    for (const Csv& row : *csv)
        std::cout << row.a << '\t' << row.b << '\t' << row.c << '\t' << row.d << '\t' << row.e << '\n';


    return 0;
}

Your question about reading the columns from your CSV file can be answered like this:

You need a temporary vector. Then you use the std::getline function to split the data in the std::stringstream and copy the resulting substrings into the vector. After that, we use string conversion functions and assign the results to a temporary Csv struct variable. After all conversions have been done, we store the temporary in the resulting csv vector that holds all row data.
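
The split-and-convert step can also be isolated in a small function, which makes it easier to add the validation that the completed program above leaves out. The following is only a sketch: parseRow is a hypothetical helper name, and returning std::optional (C++17) for a malformed line is my assumption about how you might want to handle errors.

```cpp
#include <exception>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

struct Csv {
    float a;
    float b;
    float c;
    float d;
    std::string e;
};

// Hypothetical helper (not part of the original program): converts one
// comma-separated line into a Csv row. Returns std::nullopt if the line
// does not have exactly five columns or a number cannot be converted.
std::optional<Csv> parseRow(const std::string& line) {
    std::stringstream ss(line);
    std::string column;
    std::vector<std::string> columns;

    // Split the line at each comma
    while (std::getline(ss, column, ','))
        columns.push_back(column);

    if (columns.size() != 5)
        return std::nullopt;

    try {
        Csv row{};
        row.a = std::stof(columns[0]);
        row.b = std::stof(columns[1]);
        row.c = std::stof(columns[2]);
        row.d = std::stof(columns[3]);
        row.e = columns[4];
        return row;
    } catch (const std::exception&) {
        // std::stof throws std::invalid_argument or std::out_of_range
        return std::nullopt;
    }
}
```

Inside the read loop you could then write something like if (auto row = parseRow(line)) csv->push_back(*row); and silently skip, or report, every line that does not parse.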


Analysis of the program.

First, and most important, in C++ we do not use raw pointers for owned memory. In most cases we should not even use new. If dynamic allocation is needed at all, std::unique_ptr and std::make_unique should be used.
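
If you really do want the vector on the heap, a minimal sketch with std::make_unique could look like this. The function name makeTable is hypothetical and only serves the illustration; the point is that no new and no delete appears anywhere, and the memory is released automatically when the std::unique_ptr goes out of scope.

```cpp
#include <memory>
#include <string>
#include <vector>

struct Csv {
    float a;
    float b;
    float c;
    float d;
    std::string e;
};

// Hypothetical factory function: returns an owning pointer to an empty
// table. The vector is destroyed automatically together with the pointer.
std::unique_ptr<std::vector<Csv>> makeTable() {
    return std::make_unique<std::vector<Csv>>();
}
```

You would use it as auto csv = makeTable(); and then csv->push_back(...), exactly as with your raw pointer.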

But we do not need dynamic memory allocation on the heap at all. You can simply define the std::vector on the function's stack. Just as in your line std::string colname;, you can define the std::vector and the std::ifstream as normal local variables, for example std::vector<Csv> csv{};. If you need to pass this variable to another function, pass it by reference; if you really need pointers, use smart pointers.
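
For handing a local vector to another function, a reference is usually all that is needed. Here is a small sketch; addRow and countSpecies are hypothetical helper names that do not appear in your program.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Csv {
    float a;
    float b;
    float c;
    float d;
    std::string e;
};

// Hypothetical helper: modifies the caller's vector, so it takes a
// non-const reference.
void addRow(std::vector<Csv>& rows, const Csv& row) {
    rows.push_back(row);
}

// Hypothetical helper: only reads the vector, so it takes a const
// reference. Nothing is copied and no pointer is involved.
std::size_t countSpecies(const std::vector<Csv>& rows, const std::string& species) {
    std::size_t count = 0;
    for (const Csv& row : rows)
        if (row.e == species)
            ++count;
    return count;
}
```

In main you would define std::vector<Csv> csv{}; as a local variable and simply call addRow(csv, row) or countSpecies(csv, "setosa").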

Next, if you open a file, as in std::ifstream myFile("r:\\iris.csv");, you do not need to test the file stream's condition with if (myFile->good()). The stream's bool operator is overloaded to tell you whether the stream is usable: it is true as long as no error (failbit or badbit) has occurred. See the documentation of std::basic_ios::operator bool.
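
There is a small difference between the two checks that is worth knowing: good() also becomes false when the end of the input has been reached, while the bool conversion only reports real failures. The sketch below demonstrates this with a std::istringstream instead of a file; StreamState and stateAfterReadingInt are hypothetical names made up for the illustration.

```cpp
#include <sstream>
#include <string>

// Hypothetical result type: the stream flags after one extraction.
struct StreamState {
    bool good; // result of good()
    bool ok;   // result of the bool conversion, i.e. !fail()
    bool eof;  // result of eof()
};

// Hypothetical helper: tries to read one int from the text and reports
// the state of the stream afterwards.
StreamState stateAfterReadingInt(const std::string& text) {
    std::istringstream iss(text);
    int value = 0;
    iss >> value;
    return { iss.good(), static_cast<bool>(iss), iss.eof() };
}
```

For the input "42" the extraction succeeds but runs into the end of the input, so good() is false while the bool conversion is still true. For "abc" the extraction fails and both are false. This is why while (stream >> value) is the usual loop condition.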

Now, next and most important.

The structure of your source file is well known. There is a header with 5 elements, and then each row contains 4 floating-point values followed by a string without spaces. This makes life very easy.

If we needed to validate the input, or if there were spaces within a string, then we would need to implement other methods. But with this structure, we can use the built-in iostream facilities. The snippet

        // Read all data
        Csv tmp{};
        char comma;
        while (myFile >> tmp.a >> comma >> tmp.b >> comma >> tmp.c >> comma >> tmp.d >> comma >> tmp.e)
            csv.push_back(std::move(tmp));

will do the trick. Very simple.
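
To try this extraction chain without a file, you can feed it from a std::istringstream. This is a sketch only; readRows is a hypothetical helper name. Note the limits of the approach: the char variable accepts any single non-whitespace character, not only a comma, and a last column containing spaces would be cut at the first space.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct Csv {
    float a;
    float b;
    float c;
    float d;
    std::string e;
};

// Hypothetical helper: reads rows of the form "f,f,f,f,word" from any
// input stream until an extraction fails or the input ends.
std::vector<Csv> readRows(std::istream& is) {
    std::vector<Csv> rows;
    Csv tmp{};
    char comma; // swallows the separator between two values
    while (is >> tmp.a >> comma >> tmp.b >> comma >> tmp.c >> comma >> tmp.d >> comma >> tmp.e)
        rows.push_back(tmp);
    return rows;
}
```

Calling readRows on a std::istringstream that holds two data lines gives you a vector with two rows; calling it on an opened std::ifstream (after the header line has been consumed) works in the same way.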

So, the refactored solution could look like this:

#include <vector>
#include <iostream>
#include <fstream>
#include <string>
#include <sstream>

struct Csv {
    float a;
    float b;
    float c;
    float d;
    std::string e;
};

int main() {

    std::vector<Csv> csv{};
    std::ifstream myFile("r:\\iris.csv");
    if (myFile) {
        
        if (std::string header{}; std::getline(myFile, header)) std::cout << header << '\n';

        // Read all data
        Csv tmp{};
        char comma;
        while (myFile >> tmp.a >> comma >> tmp.b >> comma >> tmp.c >> comma >> tmp.d >> comma >> tmp.e)
            csv.push_back(std::move(tmp));

        // Show everything
        for (const Csv& row : csv)
            std::cout << row.a << '\t' << row.b << '\t' << row.c << '\t' << row.d << '\t' << row.e << '\n';
    }
    return 0;
}

This is already much more compact. But there is more.


In the next step, we want to add a more object-oriented approach.

The key is that data, and the methods operating on this data, should be encapsulated in one object / class / struct. Only the Csv struct should know how to read and write its data.

Hence, we overload the extractor and inserter operators for the Csv struct. We use the same approach as before. We just encapsulate the reading and writing in the struct Csv.

After that, the main function will be even more compact and the usage is more logical.

Now we have:

#include <vector>
#include <iostream>
#include <fstream>
#include <string>

struct Csv {
    float a;
    float b;
    float c;
    float d;
    std::string e;

    friend std::istream& operator >> (std::istream& is, Csv& c) {
        char comma;
        return is >> c.a >> comma >> c.b >> comma >> c.c >> comma >> c.d >> comma >> c.e;
    }

    friend std::ostream& operator << (std::ostream& os, const Csv& c) {
        return os << c.a << '\t' << c.b << '\t' << c.c << '\t' << c.d << '\t' << c.e << '\n';
    }
};

int main() {

    std::vector<Csv> csv{};
    if (std::ifstream myFileStream("r:\\iris.csv"); myFileStream) {

        if (std::string header{}; std::getline(myFileStream, header)) std::cout << header << '\n';

        // Read all data
        Csv tmp{};
        while (myFileStream >> tmp)
            csv.push_back(std::move(tmp));

        // Show everything
        for (const Csv& row : csv)
            std::cout << row;
    }
    return 0;
}
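
A side benefit of the overloaded operators is that they work with every stream, not only with files, which makes them easy to test. As a sketch, with roundTrip being a hypothetical helper name, you can parse one row from a string and format it back:

```cpp
#include <iostream>
#include <sstream>
#include <string>

struct Csv {
    float a;
    float b;
    float c;
    float d;
    std::string e;

    friend std::istream& operator >> (std::istream& is, Csv& c) {
        char comma;
        return is >> c.a >> comma >> c.b >> comma >> c.c >> comma >> c.d >> comma >> c.e;
    }

    friend std::ostream& operator << (std::ostream& os, const Csv& c) {
        return os << c.a << '\t' << c.b << '\t' << c.c << '\t' << c.d << '\t' << c.e << '\n';
    }
};

// Hypothetical helper: reads one row from the text with operator >> and
// writes it back with operator <<. Returns an empty string if the text
// cannot be parsed.
std::string roundTrip(const std::string& text) {
    std::istringstream in(text);
    std::ostringstream out;
    Csv row{};
    if (in >> row)
        out << row;
    return out.str();
}
```

For the input "5.1,3.5,1.4,0.2,setosa" this returns the same values separated by tabs and terminated by a newline.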

OK. Already rather good. But even more is possible.


We can see that the source data consists of a header followed by the Csv data.

This, too, can be modelled as a struct. We call it Iris. And we also add extractor and inserter overloads to encapsulate all IO operations.

Additionally, we now use modern algorithms, regex, and IO iterators. I am not sure whether this is too complex for now. If you are interested, I can give you further information. But for now, I will just show you the code.

#include <vector>
#include <iostream>
#include <fstream>
#include <string>
#include <algorithm>
#include <regex>
#include <iterator>

const std::regex re{ "," };

struct Csv {
    float a;
    float b;
    float c;
    float d;
    std::string e;
    // Overload extractor for simple reading of data
    friend std::istream& operator >> (std::istream& is, Csv& c) {
        char comma;
        return is >> c.a >> comma >> c.b >> comma >> c.c >> comma >> c.d >> comma >> c.e;
    }
    // Ultra simple inserter
    friend std::ostream& operator << (std::ostream& os, const Csv& c) {
        return os << c.a << "\t\t" << c.b << "\t\t" << c.c << "\t\t" << c.d << "\t\t" << c.e << '\n';
    }
};

struct Iris {
    // Iris data consists of a header and then Csv data
    std::vector<std::string> header{};
    std::vector<Csv> csv{};

    // Overload extractor for generic reading from streams
    friend std::istream& operator >> (std::istream& is, Iris& i) {
        // First read header values;
        if (std::string line{}; std::getline(is, line)) 
            std::copy(std::sregex_token_iterator(line.begin(), line.end(), re, -1), {}, std::back_inserter(i.header));
        
        // Read all csv data
        std::copy(std::istream_iterator<Csv>(is), {}, std::back_inserter(i.csv));
        return is;
    }

    // Simple output. Copy data to stream os
    friend std::ostream& operator << (std::ostream& os, const Iris& i) {
        std::copy(i.header.begin(), i.header.end(), std::ostream_iterator<std::string>(os, "\t")); os << '\n';
        std::copy(i.csv.begin(), i.csv.end(), std::ostream_iterator<Csv>(os));
        return os;
    }
};


// Driver Code
int main() {

    if (std::ifstream myFileStream("r:\\iris.csv"); myFileStream) {

        Iris iris{};

        // Read all data
        myFileStream >> iris;

        // Show result
        std::cout << iris;
    }
    return 0;
}
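
Since the header splitting in the Iris extractor is probably the least familiar line, here it is in isolation. This is a sketch under the same assumption as above (a plain comma as separator, no quoted fields); splitAtCommas is a hypothetical helper name.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: splits a line at every comma with a regex token
// iterator and returns the parts.
std::vector<std::string> splitAtCommas(const std::string& line) {
    static const std::regex separator{ "," };
    // The last argument -1 selects the text between the matches, i.e.
    // the column values, instead of the matched commas themselves.
    std::sregex_token_iterator first(line.begin(), line.end(), separator, -1);
    std::sregex_token_iterator last; // default-constructed: end of sequence
    return std::vector<std::string>(first, last);
}
```

The std::copy call with std::back_inserter in the Iris extractor does the same, only that it appends to an existing vector instead of constructing a new one.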

Look at the main function and how simple it has become.

If you have questions, please ask.


Language: C++17

Compiled and tested with MS Visual Studio 2019, Community Edition
