C++ getline - Extracting a substring using regex
I have a file with contents like this:
Random text
+-------------------+------+-------+-----------+-------+
| Data | A | B | C | D |
+-------------------+------+-------+-----------+-------+
| Data 1 | 1403 | 0 | 2520 | 55.67 |
| Data 2 | 1365 | 2 | 2520 | 54.17 |
| Data 3 | 1 | 3 | 1234 | 43.12 |
Some more random text
I want to extract the value in column D for the row Data 1, i.e. the value 55.67 in the example above. I am parsing this file line by line using getline:
while(getline(inputFile1,line)) {
    if(line.find("| Data 1") != string::npos) {
        subString = //extract the desired value
    }
}
How can I extract the desired substring from the line? Is there a way to use boost::regex to extract it?
While regex may have its uses, it is probably overkill for this.
Bring in a trim function and:
char delim;
std::string line, data;
int a, b, c;
double d;
while(std::getline(inputFile1, line)) {
    std::istringstream is(line);
    if( std::getline(is >> delim, data, '|') >>
        a >> delim >> b >> delim >> c >> delim >> d >> delim)
    {
        trim(data);
        if(data == "Data 1") {
            std::cout << a << ' ' << b << ' ' << c << ' ' << d << '\n';
        }
    }
}
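The snippet above relies on a trim function that it does not define. Below is a minimal self-contained sketch of the same stream-based approach, assuming C++17 for std::optional. The trim implementation and the find_column_d wrapper are hypothetical names introduced here for illustration and are not part of the original answer:

```cpp
#include <algorithm>
#include <cctype>
#include <istream>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper (assumption, not from the original answer):
// strips leading and trailing whitespace in place.
void trim(std::string& s) {
    auto not_space = [](unsigned char ch) { return !std::isspace(ch); };
    s.erase(s.begin(), std::find_if(s.begin(), s.end(), not_space));
    s.erase(std::find_if(s.rbegin(), s.rend(), not_space).base(), s.end());
}

// Hypothetical wrapper: returns column D of the first row whose label cell
// equals `label`, or std::nullopt if no row parses with that label.
std::optional<double> find_column_d(std::istream& in, const std::string& label) {
    std::string line, data;
    char delim;
    int a, b, c;
    double d;
    while (std::getline(in, line)) {
        std::istringstream is(line);
        // Same extraction chain as above: '|', the label cell up to the next
        // '|', then three ints and a double, each followed by one delimiter.
        if (std::getline(is >> delim, data, '|') >>
            a >> delim >> b >> delim >> c >> delim >> d >> delim) {
            trim(data);
            if (data == label) {
                return d;
            }
        }
    }
    return std::nullopt;
}
```

Lines that do not have the shape "label, three integers, one double" (the random text, the separator rows, the header row) fail the extraction chain and are skipped, so only data rows are ever compared against the label.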
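Yes, it is easy to extract your substring with a regex. There is no need for boost; you can use the standard C++ regex library instead.
The resulting program is very simple.
We read all lines of the source file in a simple for loop. Then we use std::regex_match to match the line just read against our regex. If we find a match, the result will be in the std::smatch sm, group 1.
And because we design the regex to find a double value, we get exactly what we need, without any additional spaces.
We can then convert it to a double and show the result on the screen. And because we defined the regex to find a double, we can be sure that std::stod will work.
The resulting program is fairly straightforward and easy to understand: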
#include <iostream>
#include <string>
#include <sstream>
#include <regex>
// Please note. For std::getline, it does not matter, if we read from a
// std::istringstream or a std::ifstream. Both are std::istream's. And because
// we do not have files here on SO, we will use an istringstream as data source.
// If you want to read from a file later, simply create an std::ifstream inputFile1
// Source File with all data
std::istringstream inputFile1{ R"(
Random text
+-------------------+------+-------+-----------+-------+
| Data | A | B | C | D |
+-------------------+------+-------+-----------+-------+
| Data 1 | 1403 | 0 | 2520 | 55.67 |
| Data 2 | 1365 | 2 | 2520 | 54.17 |
| Data 3 | 1 | 3 | 1234 | 43.12 |
Some more random text)"
};
// Regex for finding the desired data
const std::regex re(R"(\|\s+Data 1\s+\|.*?\|.*?\|.*?\|\s*([-+]?[0-9]*\.?[0-9]+)\s*\|)");
int main() {
    // The result will be in here
    std::smatch sm;
    // Read all lines of the source file
    for (std::string line{}; std::getline(inputFile1, line);) {
        // If we found our matching string
        if (std::regex_match(line, sm, re)) {
            // Then extract the column D info
            double data1D = std::stod(sm[1]);
            // And show it to the user.
            std::cout << data1D << "\n";
        }
    }
}
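One thing to keep in mind: std::regex_match only succeeds if the entire line matches the pattern. If a line carries anything after the final |, for example a trailing '\r' left over from a file with Windows line endings, the match fails. A possible variation, sketched here with std::regex_search and assuming C++17 for std::optional; the function name data1_column_d is hypothetical and chosen only for this illustration:

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical helper: returns column D if `line` contains the "Data 1" row,
// or std::nullopt otherwise.
std::optional<double> data1_column_d(const std::string& line) {
    // Same pattern as in the program above.
    static const std::regex re(
        R"(\|\s+Data 1\s+\|.*?\|.*?\|.*?\|\s*([-+]?[0-9]*\.?[0-9]+)\s*\|)");
    std::smatch sm;
    // regex_search finds the pattern anywhere in the line, so characters
    // after the last '|' (such as '\r') do not prevent a match.
    if (std::regex_search(line, sm, re)) {
        return std::stod(sm[1].str());
    }
    return std::nullopt;
}
```

The trade-off is that regex_search is more permissive: it also accepts lines with extra text before or after the table row, so regex_match remains the stricter choice when the input is known to be clean.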
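For most people, the tricky part is defining the regular expression. Pages such as Online regex tester and debugger help here; they also give a breakdown of the regex with an understandable explanation.
For our regex
\|\s+Data 1\s+\|.*?\|.*?\|.*?\|\s*([-+]?[0-9]*\.?[0-9]+)\s*\|
we get the following explanation: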
\| matches the character | literally (case sensitive)
\s+ matches any whitespace character (equal to [\r\n\t\f\v ])
  + Quantifier: matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
Data 1 matches the characters Data 1 literally (case sensitive)
\s+ matches any whitespace character (equal to [\r\n\t\f\v ])
  + Quantifier: matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
\| matches the character | literally (case sensitive)
.*? matches any character (except for line terminators)
  *? Quantifier: matches between zero and unlimited times, as few times as possible, expanding as needed (lazy)
\| matches the character | literally (case sensitive)
.*? matches any character (except for line terminators)
  *? Quantifier: matches between zero and unlimited times, as few times as possible, expanding as needed (lazy)
\| matches the character | literally (case sensitive)
.*? matches any character (except for line terminators)
  *? Quantifier: matches between zero and unlimited times, as few times as possible, expanding as needed (lazy)
\| matches the character | literally (case sensitive)
\s* matches any whitespace character (equal to [\r\n\t\f\v ])
  * Quantifier: matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
1st Capturing Group ([-+]?[0-9]*\.?[0-9]+) matches an optional sign, optional leading digits, an optional decimal point, and at least one digit
\s* matches any whitespace character (equal to [\r\n\t\f\v ])
  * Quantifier: matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\| matches the character | literally (case sensitive)
By the way, a safer (more strictly matching) regex would be:
\|\s+Data 1\s+\|\s*?\d+\s*?\|\s*?\d+\s*?\|\s*?\d+\s*?\|\s*([-+]?[0-9]*\.?[0-9]+)\s*\|
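To illustrate the difference between the two patterns, here is a small sketch that compiles both and checks whole lines against them. The names loose_re, strict_re and matches are hypothetical and introduced only for this comparison:

```cpp
#include <regex>
#include <string>

// Looser pattern from the program above: columns A to C may contain anything.
const std::regex loose_re(
    R"(\|\s+Data 1\s+\|.*?\|.*?\|.*?\|\s*([-+]?[0-9]*\.?[0-9]+)\s*\|)");

// Stricter pattern: columns A to C must each be a run of digits.
const std::regex strict_re(
    R"(\|\s+Data 1\s+\|\s*?\d+\s*?\|\s*?\d+\s*?\|\s*?\d+\s*?\|\s*([-+]?[0-9]*\.?[0-9]+)\s*\|)");

// Hypothetical helper: true if the whole line matches the given pattern.
bool matches(const std::string& line, const std::regex& re) {
    return std::regex_match(line, re);
}
```

With these definitions, a well-formed row such as "| Data 1 | 1403 | 0 | 2520 | 55.67 |" matches both patterns, while a row with a non-numeric cell such as "| Data 1 | n/a | 0 | 2520 | 55.67 |" matches only the looser one.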