What data structure should I use to read a CSV-like file in C++?

Question

My task is to implement a hash join algorithm using tables (.txt files) supplied as program arguments.

A table file might look like this:

c1:int,c2:int,c3:string,c4:long
1,1,asd,11
2,3,asdqwe,11

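There is a header row containing the column names and column types, followed by data rows whose fields are separated by ",". The number of columns and their types vary.

So the column names and types need to be stored, and I have to store them in a 2D array or matrix.

I have to read the types and create a container of the specified type for each column. I also have to store all the columns together in a heterogeneous array or structure.

I don't know of any heterogeneous STL container in C++11 or earlier.

How can I determine a column's type, create a container based on that type, and store all the column arrays heterogeneously?

My attempt so far: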

 class Table {
public:
    struct Column {
        std::string name;
        std::string type;

        Column(std::string name, std::string type) : name(name), type(type) {}
    };

    //enum for switch-case
    static enum typeValue {
        INT,
        STRING,
        LONG,
        CHAR,
        DOUBLE,
        SHORT
    };


private:
    std::string filePath;
    std::ifstream file;
    std::vector<Column> columns;
    int rowNumber;
    std::vector<std::array> tableData;

public:
    Table(std::string filePath) {
        this->filePath = filePath;
        rowNumber = 0;
        open();
        loadTableData();


    }


    bool open() {

        file.open(filePath);
        std::string line;
        std::string delimiter = ",";
        size_t pos = 0;
        std::string token;
        getline(file, line);
        while ((pos = line.find(delimiter)) != std::string::npos) {     //parse the columns
            token = line.substr(0, pos);
            size_t posT = 0;
            posT = token.find(":");
            columns.push_back(Column(token.substr(0, posT), token.substr(posT + delimiter.length())));
            line.erase(0, pos + delimiter.length());
        }
        size_t posT = 0;
        posT = line.find(":");
        columns.push_back(Column(line.substr(0, posT), line.substr(posT + delimiter.length())));

        while (std::getline(file, line))    //count the rows
            ++rowNumber;

        for (int i = 0; i < columns.size(); ++i) {

            switch (hashIt(columns[i].type)) {
                case INT:
                    std::vector<int> *tempI = new std::vector<int>();
                    tableData[i] = tempI;
                    break;
                case STRING:
                    std::vector<std::string> *tempS = new std::vector<std::string>();
                    tableData[i] = tempS;
                    break;
                case LONG:
                    std::vector<long> *tempL = new std::vector<long>();
                    tableData[i] = tempL;
                    break;
                case CHAR:
                    std::vector<char> *tempC = new std::vector<char>();
                    tableData[i] = tempC;
                    break;
                case DOUBLE:
                    std::vector<double> *tempD = new std::vector<double>();
                    tableData[i] = tempD;
                    break;
                case SHORT:
                    std::vector<short> *tempSh = new std::vector<short>();
                    tableData[i] = tempSh;
                    break;
                default:
                    std::cerr << "Error: Unsupported column type.";
            }


        }

        //file.close();

        std::cout << "Table " << filePath << " has " << columns.size() << " columns." << std::endl;

        return true;


    }

    bool loadTableData() {
        file.seekg(0, file.beg);
        std::string line;
        getline(file, line); //discarding the column headers

        while (std::getline(file, line)) {
            ++rowNumber;        //count the rows

            for (int i = 0; i < columns.size(); ++i) {

                std::string delimiter = ",";
                size_t pos = line.find(delimiter);
                std::string token = line.substr(0, pos);
                tableData[i].push_back(token);


            }
        }

    }

    template<typename T>

    T **createColumnData(int colNum) {
        T *data = new T[rowNumber]();
        tableData.insert(colNum, data);
    }


    typeValue hashIt(std::string const &inString) {
        if (inString == "int") return INT;
        if (inString == "string") return STRING;
        if (inString == "long") return LONG;
        if (inString == "char") return CHAR;
        if (inString == "double") return DOUBLE;
        if (inString == "short") return SHORT;
        std::cerr << "Error: Unsupported column type.";


        return NULL;

    }


};

Thanks!

Answer 1

IMO, you should model a row as a struct:

struct Row
{
  int column1;
  int column2;
  std::string name;
  int column3;
};

The next step is to overload operator>> to read the struct:

std::istream& operator>>(std::istream& inp, Row& r)
{
  char comma;
  inp >> r.column1;
  inp >> comma;
  inp >> r.column2;
  inp >> comma;
  std::getline(inp, r.name, ',');
  inp >> r.column3;
  inp.ignore(100000, '\n');
  return inp;
}

Then you can read the records:

std::vector<Row> database;
Row r;
while (input_file >> r)
{
  database.push_back(r);
}

Perhaps your next step is to create index tables. These allow faster searches of your database:

std::map<int, int> index_by_column1;
std::map<std::string, int> index_by_name;

The second value of each pair is the index in database of the associated record.
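
The index maps can be filled in a single pass over database once the rows are loaded. Below is a minimal sketch, not a definitive implementation. It assumes the sample file from the question, with the header line discarded before any Row is read. load_rows and build_name_index are hypothetical helper names, and Row mirrors the struct from this answer.

```cpp
#include <cstddef>
#include <istream>
#include <limits>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Mirrors the Row struct from the answer.
struct Row {
    int column1;
    int column2;
    std::string name;
    int column3;
};

std::istream& operator>>(std::istream& inp, Row& r) {
    char comma;
    inp >> r.column1 >> comma >> r.column2 >> comma;
    std::getline(inp, r.name, ',');
    inp >> r.column3;
    // Skip the rest of the line. The good() check keeps the last row
    // valid when the file does not end with a newline.
    if (inp.good())
        inp.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    return inp;
}

// Hypothetical helper: discards the header line, then reads every row.
std::vector<Row> load_rows(std::istream& in) {
    std::string header;
    std::getline(in, header);
    std::vector<Row> database;
    Row r;
    while (in >> r)
        database.push_back(r);
    return database;
}

// Hypothetical helper: maps each name to its index in database.
// If a name occurs more than once, the last occurrence wins.
std::map<std::string, int> build_name_index(const std::vector<Row>& database) {
    std::map<std::string, int> index;
    for (std::size_t i = 0; i < database.size(); ++i)
        index[database[i].name] = static_cast<int>(i);
    return index;
}
```

For example, passing a std::istringstream that holds the sample table from the question to load_rows yields two rows, and build_name_index then maps "asdqwe" to index 1.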

To find the record whose name is "Fred":

std::map<std::string, int>::const_iterator iter;
iter = index_by_name.find("Fred");
Row r;
if (iter != index_by_name.end())
{
  int database_index = (*iter).second;
  r = database[database_index];
}

Instead of std::map, you could use a hash table that returns the index into the database.
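
Since the original task is a hash join, the hash-table idea can be sketched with std::unordered_multimap, which requires C++11. This is only an illustration under stated assumptions: KeyedRow and hash_join are hypothetical names, the join key is assumed to be a single int column, and both tables are assumed to fit in memory.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical, simplified row: a join key plus one payload field.
struct KeyedRow {
    int key;
    std::string payload;
};

// Returns (index into left, index into right) for every pair of rows
// whose keys are equal.
std::vector<std::pair<std::size_t, std::size_t> >
hash_join(const std::vector<KeyedRow>& left,
          const std::vector<KeyedRow>& right) {
    // Build phase: hash every left row by key. A multimap keeps
    // duplicate keys instead of overwriting them.
    std::unordered_multimap<int, std::size_t> index;
    for (std::size_t i = 0; i < left.size(); ++i)
        index.insert(std::make_pair(left[i].key, i));

    // Probe phase: look up each right row's key in the hash table.
    std::vector<std::pair<std::size_t, std::size_t> > matches;
    for (std::size_t j = 0; j < right.size(); ++j) {
        auto range = index.equal_range(right[j].key);
        for (auto it = range.first; it != range.second; ++it)
            matches.push_back(std::make_pair(it->second, j));
    }
    return matches;
}
```

Building the table on the smaller input usually keeps memory use down. The order of matches that share a key is unspecified, because an unordered container does not guarantee iteration order.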

Answer 1: 0, accepted, 2018-02-14 18:03:52
