How do I parse this file in C++?

Question

I want to parse a file with the following content:

2 300
abc12 130
bcd22 456
3 400
abfg12 230
bcpd22 46
abfrg2 13

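Here, 2 is the number of lines that follow and 300 is the weight.

Each of those lines has a string and a number (the price). The same applies to 3 and 400.

I need to store 130 and 456 in an array.

Currently, I am reading the file and each line is processed as a std::string. I need help to go further.

Code: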

#include <iostream>
#include <fstream>
#include <string>

using namespace std;

//void processString(string line);
void process2(string line);

int main(int argc, char ** argv) {
    cout << "You have entered " << argc <<
        " arguments:" << "\n";

    for (int i = 1; i < argc; ++i)
        cout << argv[i] << "\n";

    //2, 4 are the file names

    //Reading file - market price file
    string line;
    ifstream myfile(argv[2]);
    if (myfile.is_open()) {
        while (getline(myfile, line)) {
            //  cout << line << '\n';
        }
        myfile.close();
    } else cout << "Unable to open market price file";

    //Reading file - price list file
    string line_;
    ifstream myfile2(argv[4]);
    int c = 1;
    if (myfile2.is_open()) {
        while (getline(myfile2, line_)) {
            // processString(line_);
            process2(line_);
        }
        myfile2.close();
    } else cout << "Unable to open price lists file";

    //processString(line_);
    return 0;
}

void process2(string line) {

    string word = "";

    for (auto x: line) {
        if (x == ' ') {
            word += " ";
        } else {
            word = word + x;
        }
    }
    cout << word << endl;
}

Is there a split function like the one in Java, so that I can split everything and store it as tokens?

Answer 1

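There are two questions in your post:

How do I parse this file in C++?
Is there a split function like the one in Java, so that I can split everything and store it as tokens?

I will answer both questions and show a demo example.

Let's start with splitting a string into tokens. There are several possibilities. We start with the simple one.

Since the tokens in the string are separated by whitespace, we can use the extraction operator (>>). It reads data from the input stream up to the next whitespace and then converts what it read into the type of the given variable. As you know, this operation can be chained.

Then, for the example string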

    const std::string line{ "Token1 Token2 Token3 Token4" };

you can simply put it into a std::istringstream and then extract the variables from the stream:

    std::istringstream iss1(line);
    iss1 >> subString1 >> subString2 >> subString3 >> subString4;

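The drawback is that you have to write a lot, and you must know the number of elements in the string.

We can overcome this by using a vector as the target data store and filling it with its range constructor. The vector range constructor takes a begin and an end iterator and copies the data between them into the vector.

As the iterator, we use std::istream_iterator. Simply put, it calls the extraction operator (>>) until all data has been consumed, no matter how much data there is.

This looks as follows: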

    std::istringstream iss2(line);
    std::vector token(std::istream_iterator<std::string>(iss2), {});

This may look complicated, but it is not. We define a variable "token" of type std::vector and use its range constructor.

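Also, we can define the std::vector without a template argument. The compiler can deduce the argument from the given constructor arguments. This feature is called CTAD ("class template argument deduction", requires C++17).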

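In addition, you can see that I do not explicitly use an "end()" iterator.

The end iterator is created from the empty braces {}. Because the std::vector range constructor takes two iterators of the same type, the second argument gets the type of the first one and is value-initialized. A default-constructed std::istream_iterator is exactly the end-of-stream iterator.

There is one more solution. It is the most powerful one, so it may look a bit complicated at first.

It avoids std::istringstream and uses std::sregex_token_iterator to convert the string directly into tokens. It is very simple to use. The result is a one-liner for splitting the original string (re is a std::regex describing the separator; in the demo below it is a single space):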

std::vector<std::string> token2(std::sregex_token_iterator(line.begin(), line.end(), re, -1), {});

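So, modern C++ has a built-in facility designed specifically for tokenizing strings. It is called std::sregex_token_iterator. What is it?

As the name suggests, it is an iterator. It iterates over a string (hence the 's' in its name) and returns the split tokens. Depending on the last argument of its constructor, either the parts that match the regex are returned as tokens, or the regex describes the separators and everything between them is returned as tokens.

Let's look at this constructor: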

token2(std::sregex_token_iterator(line.begin(), line.end(), re, -1), {});

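The first argument is where in the source string the iterator should start, the second is the end position up to which it should work, and the third is the regex. The last argument selects what is returned:

0: the parts that match the regex are returned (a positive value n returns the n-th capture group of each match instead)
-1: everything that does not match the regex is returned, i.e. the text between the separators

Last but not least, there is the regex itself. Please read up on regexes on the web; there are plenty of pages available.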

Please see a demo of all 3 solutions here:

#include <iostream>
#include <string>
#include <vector>
#include <regex>
#include <sstream>
#include <iterator>
#include <algorithm>

/// Split string into tokens
int main() {

    // White space separated tokens in a string
    const std::string line{ "Token1 Token2 Token3 Token4" };

    // Solution 1: Use extractor operator ----------------------------------

    // Here, we will store the result
    std::string subString1{}, subString2{}, subString3{}, subString4{};

    // Put the line into an istringstream for easier extraction
    std::istringstream iss1(line);
    iss1 >> subString1 >> subString2 >> subString3 >> subString4;

    // Show result
    std::cout << "\nSolution 1:  Use extractor operator\n- Data: -\n" << subString1 << "\n"
        << subString2 << "\n" << subString3 << "\n" << subString4 << "\n";


    // Solution 2: Use istream_iterator ----------------------------------
    std::istringstream iss2(line);
    std::vector token(std::istream_iterator<std::string>(iss2), {});

    // Show result
    std::cout << "\nSolution 2:  Use istream_iterator\n- Data: -\n";
    std::copy(token.begin(), token.end(), std::ostream_iterator<std::string>(std::cout, "\n"));


    // Solution 3: Use std::sregex_token_iterator ----------------------------------
    const std::regex re(" ");

    std::vector<std::string> token2(std::sregex_token_iterator(line.begin(), line.end(), re, -1), {});

    // Show result
    std::cout << "\nSolution 3:  Use sregex_token_iterator\n- Data: -\n";
    std::copy(token2.begin(), token2.end(), std::ostream_iterator<std::string>(std::cout, "\n"));


    return 0;
}

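So, now to the answer on how to read the text file.

Creating the right data structures is crucial. Then, overload the inserter and extractor operators and put the functionality shown above into them.

Please see the demo example below. Of course, there are many other possible solutions: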

#include <string>
#include <iostream>
#include <sstream>
#include <fstream>
#include <vector>
#include <algorithm>
#include <iterator>

struct ItemAndPrice {
    // Data
    std::string item{};
    unsigned int price{};

    // Extractor
    friend std::istream& operator >> (std::istream& is, ItemAndPrice& iap) {

        // Read a complete line from the stream and check, if that worked
        if (std::string line{}; std::getline(is, line)) {

            // Read the item and price from that line and check, if that worked
            if (std::istringstream iss(line); !(iss >> iap.item >> iap.price))

                // There was an error while reading item and price. Set failbit of input stream
                is.setstate(std::ios::failbit);
        }
        return is;
    }

    // Inserter
    friend std::ostream& operator << (std::ostream& os, const ItemAndPrice& iap) {
        // Simple output of our internal data
        return os << iap.item << " " << iap.price;
    }
};

struct MarketPrice {
    // Data
    std::vector<ItemAndPrice> marketPriceData{};
    size_t numberOfElements() const { return marketPriceData.size(); }
    unsigned int weight{};

    // Extractor
    friend std::istream& operator >> (std::istream& is, MarketPrice& mp) {

        // Read a complete line from the stream and check, if that worked
        if (std::string line{}; std::getline(is, line)) {

            size_t numberOfEntries{};
            // Read the number of following entries and the weight from that line and check, if that worked
            if (std::istringstream iss(line); (iss >> numberOfEntries >> mp.weight)) {

                mp.marketPriceData.clear();
                // Now copy the numberOfEntries next lines into our vector.
                // Guard against 0: constructing an istream_iterator already reads one element
                if (numberOfEntries > 0)
                    std::copy_n(std::istream_iterator<ItemAndPrice>(is), numberOfEntries, std::back_inserter(mp.marketPriceData));
            }
            else {
                // There was an error while reading the number of following entries and the weight. Set failbit of input stream
                is.setstate(std::ios::failbit);
            }
        }
        return is;
    }

    // Inserter
    friend std::ostream& operator << (std::ostream& os, const MarketPrice& mp) {

        // Simple output of our internal data
        os << "\nNumber of Elements: " << mp.numberOfElements() << "   Weight: " << mp.weight << "\n";

        // Now copy all market price data to output stream
        if (os) std::copy(mp.marketPriceData.begin(), mp.marketPriceData.end(), std::ostream_iterator<ItemAndPrice>(os, "\n"));

        return os;
    }
};

// For this example I do not use argv, argc or file streams,
// because I cannot use files on Stack Overflow.
// So, I put the file data in an istringstream. For the example below,
// there is no difference between a file stream and a string stream.

std::istringstream sourceFile{R"(2 300
abc12 130
bcd22 456
3 400
abfg12 230
bcpd22 46
abfrg2 13)"};


int main() {

    // Here we will store all the resulting data
    // So, read the complete source file, parse the data and store result in vector
    std::vector mp(std::istream_iterator<MarketPrice>(sourceFile), {});

    // Now, all data are in mp. You may work with that now

    // Show result on display
    std::copy(mp.begin(), mp.end(), std::ostream_iterator<MarketPrice>(std::cout, "\n"));

    return 0;
}

Solution 1 (1 vote, accepted), 2020-02-16 09:49:16