How to extract specific values in a string

I need to extract specific values from a string that was read from a file. The string has multiple delimiters, and I can't figure out how to extract each value from it.

#include <vector>
#include <string>
#include <sstream>
#include <iostream>    
#include <fstream> 
using namespace std;


int main()
{
    string ss = "[4, 90]-3-name";
    // I need to extract the values 4, 90, 3 and name
    // the numbers can have multiple digits

    stringstream tr(ss);
    vector<string> result;

    string substr;
    // read from the stream tr (not the string ss), splitting on '-'
    while (getline(tr, substr, '-'))
        result.push_back(substr);

    for (size_t i = 0; i < result.size(); i++)
        cout << result[i] << endl;
}

output:
[4, 90]
3
name

If you know all the possible delimiters, you can replace each one in ss with a hyphen, and then your code above will work. See the documentation for the replace function: http://www.cplusplus.com/reference/string/string/replace/
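A minimal sketch of that idea, under some assumptions: the helper name splitAfterNormalizing and its default delimiter set "[], " are made up for illustration, and it uses std::replace_if from <algorithm> rather than std::string::replace. It also skips the empty pieces that appear when two delimiters sit next to each other.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the original posts): turns every character
// listed in `delimiters` into '-', then splits the result on '-'.
std::vector<std::string> splitAfterNormalizing(std::string text,
                                               const std::string& delimiters = "[], ")
{
    // Work on a copy of the input, so the caller's string is left untouched.
    std::replace_if(text.begin(), text.end(),
                    [&](char ch) { return delimiters.find(ch) != std::string::npos; },
                    '-');

    std::vector<std::string> tokens;
    std::istringstream stream(text);
    std::string token;
    while (std::getline(stream, token, '-')) {
        if (!token.empty())          // adjacent delimiters produce empty pieces
            tokens.push_back(token);
    }
    return tokens;
}
```

Calling splitAfterNormalizing("[4, 90]-3-name") should give the four tokens 4, 90, 3 and name.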

Paul's answer is clever, but maybe the string is read-only. Here's a version that doesn't require modifying the string:

int main()
{
    string ss = "[4, 90]-3-name"; // I need to extract the values 4, 90, 3 and name
    vector<string> results;
    size_t size = ss.size();
    size_t first = 0;
    size_t i = 0;
    while (i < size)
    {
        char ch = ss[i];
        if (ch == '[' || ch == ']' || ch == ' ' || ch == '-' || ch == ',') // delimiter check
        {
            if (i > first)
                results.push_back(ss.substr(first, i - first));
            first = i + 1;
        }
        ++i;
    }
    if (i > first)
        results.push_back(ss.substr(first, i - first));
    for (auto s : results)
        cout << s << '\n';
    return 0;
}

Hopefully that's reasonably clear. The trick is the first variable, which tracks the index of the character we expect to be the first character of the next value to extract (i.e. one beyond the delimiter we've just found). The if (i > first) checks make sure that we don't add any zero-length strings to the results.
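The same bookkeeping can probably be handed to the standard library. The sketch below is a variant, not the code from the answer: it uses std::string::find_first_not_of to locate the start of each value and find_first_of to locate the delimiter that ends it. The function name tokenize and the default delimiter set are assumptions chosen for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical variant of the loop above: `first` is again the index of the
// first character of the next value, but the searching is done by std::string.
std::vector<std::string> tokenize(const std::string& text,
                                  const std::string& delimiters = "[], -")
{
    std::vector<std::string> tokens;

    // Skip any leading delimiters.
    std::size_t first = text.find_first_not_of(delimiters);
    while (first != std::string::npos) {
        // Find the delimiter that ends the current value (npos if none).
        const std::size_t end = text.find_first_of(delimiters, first);
        const std::size_t count =
            (end == std::string::npos) ? std::string::npos : end - first;
        tokens.push_back(text.substr(first, count));

        // Move on to the start of the next value, skipping delimiter runs.
        first = text.find_first_not_of(delimiters, end);
    }
    return tokens;
}
```

Because runs of delimiters are skipped as a whole, no zero-length strings can end up in the result, so the explicit length check is not needed here.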

And now the C++ approach, using object-oriented idioms and modern C++ algorithms.

We have data and methods that belong together. For this, C++ has classes (structs). So you can define a class with member variables and methods that work on those variables. Everything works as one object.

Additionally, the class knows how to read and print its values, and only the class should know that. This knowledge is encapsulated.

Next, we want to search for interesting data embedded somewhere in a string. The string always follows a certain pattern. In your case you have 3 integers and one string as interesting data, with some delimiters in between, whatever they are.

To match such patterns and search for interesting parts of a string, C++ has std::regex. Regular expressions are extremely powerful and hence a little complicated to define.

In the example below I will use const std::regex re(R"((\d+).*?(\d+).*?(\d+).*?([\w_]+))");. This defines 4 groups of submatches (in parentheses) with something in between, so any delimiter, space or other text is allowed.
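To see the four groups in isolation, here is a small sketch that applies the same pattern to a single string. The function name extractFields is an assumption made for illustration; the pattern itself is the one quoted above.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the four captured groups of the lenient
// pattern, or an empty vector if the line does not match at all.
std::vector<std::string> extractFields(const std::string& line)
{
    static const std::regex re(R"((\d+).*?(\d+).*?(\d+).*?([\w_]+))");

    std::vector<std::string> fields;
    std::smatch match;
    if (std::regex_search(line, match, re)) {
        // match[0] is the whole match; the groups start at index 1.
        for (std::size_t i = 1; i < match.size(); ++i)
            fields.push_back(match[i].str());
    }
    return fields;
}
```

For the string "[4, 90]-3-name" this should yield 4, 90, 3 and name, in that order. The numbers come back as strings, so std::stoi is still needed to convert them.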

If you want to be more strict, you can simply change the pattern so that errors in the source data are detected. See const std::regex re(R"(\[(\d+)\,\ (\d+)\]\-(\d+)\-([\w_]+))");. With this stricter pattern, the input file will not be read in case of an error, or only the beginning with the valid data will be read.
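As a rough illustration of the strict variant, the sketch below validates a whole line. Two things here are assumptions rather than the author's code: it uses std::regex_match (which requires the entire line to fit the pattern) instead of std::regex_search, and it writes the pattern without the optional backslashes in front of the comma, space and hyphen. The function name isWellFormedLine is also made up.

```cpp
#include <regex>
#include <string>

// Hypothetical validator: true only if the whole line has exactly the
// shape "[<int>, <int>]-<int>-<name>", with no extra spaces or characters.
bool isWellFormedLine(const std::string& line)
{
    static const std::regex strictRe(R"(\[(\d+), (\d+)\]-(\d+)-([\w_]+))");
    return std::regex_match(line, strictRe);
}
```

With this pattern, "[4, 90]-3-name" is accepted, while a line such as "[1, 2] - 3 - Big_City" is rejected because of the spaces around the hyphens.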

Please see the example below:

#include <string>
#include <regex>
#include <iterator>
#include <iostream>
#include <sstream>
#include <fstream>
#include <vector>
#include <algorithm>
#include <ios>
#include <iomanip>
#include <cmath>

std::istringstream testFile{ R"([1, 1]-3-Big_City
  [1, 2] - 3 - Big_City
  [1, 3] - 3 - Big_City
  [2, 1] - 3 - Big_City
  [2, 2] - 3 - Big_City
  [2, 3] - 3 - Big_City
  [2, 7] - 2 - Mid_City
  [2, 8] - 2 - Mid_City
  [3, 1] - 3 - Big_City
  [3, 2] - 3 - Big_City
  [3, 3] - 3 - Big_City
  [3, 7] - 2 - Mid_City
  [3, 8] - 2 - Mid_City
  [7, 7] - 1 - Small_City)" };



const std::regex re(R"((\d+).*?(\d+).*?(\d+).*?([\w_]+))");


struct CityData
{
    // Define the city's data
    int xCoordinate{};
    int yCoordinate{};
    int cityId{};
    std::string cityName{};

    // Overload the extractor operator >> to read and parse a line
    friend std::istream& operator >> (std::istream& is, CityData& cd) {

        // We will read the line in this variable
        std::string line{};

        // Read the line and check, if it is OK
        if (std::getline(is, line)) {

            // Find the matched substrings
            std::smatch sm{};
            if (std::regex_search(line, sm, re)) {
                // And convert them to a city record
                cd.xCoordinate = std::stoi(sm[1]);
                cd.yCoordinate = std::stoi(sm[2]);
                cd.cityId = std::stoi(sm[3]);
                cd.cityName = sm[4];
            }
            else {
                is.setstate(std::ios::failbit);
            }
        }
        return is;
    }

    friend std::ostream& operator << (std::ostream& os, const CityData& cd) {
        return os << cd.xCoordinate << ' ' << cd.yCoordinate << ' ' << cd.cityId;
    }
};

constexpr int MinimumArrayDimension = 8;

int main()
{
    // Define the variable cityData with the vector's range constructor. Read the complete input and parse the data
    std::vector<CityData> cityData{ std::istream_iterator<CityData>(testFile),std::istream_iterator<CityData>() };

    // The following we are doing, because we want to print everything with the correct width
    // Read the maximum row index (rows are y coordinates)
    const int maxRow = std::max(std::max_element (
        cityData.begin(), 
        cityData.end(), 
        [](const CityData & cd1, const CityData & cd2) { return cd1.yCoordinate < cd2.yCoordinate; }
    )->yCoordinate, MinimumArrayDimension);

    // Read the maximum column index (columns are x coordinates)
    const unsigned int maxColumn = std::max(std::max_element(
        cityData.begin(),
        cityData.end(),
        [](const CityData & cd1, const CityData & cd2) { return cd1.xCoordinate < cd2.xCoordinate; }
    )->xCoordinate, MinimumArrayDimension);

    // Read the maximum city
    const unsigned int maxCityID = std::max_element(
        cityData.begin(),
        cityData.end(),
        [](const CityData & cd1, const CityData & cd2) { return cd1.cityId < cd2.cityId; }
    )->cityId;

    // Get the number of digits that we have here
    const int digitSizeForRowNumber = maxRow > 0 ? (int)log10((double)maxRow) + 1 : 1;

    const int digitSizeForColumnNumber = std::max(maxColumn > 0 ? (int)log10((double)maxColumn) + 1 : 1,
                                                  maxCityID > 0 ? (int)log10((double)maxCityID) + 1 : 1);

    // Lambda function for printing the header and the footer
    auto printHeaderFooter = [&]() {
        std::cout << std::setw(digitSizeForColumnNumber) << "" << " #";
        for (int i = 0; i <= (maxColumn+1)* (digitSizeForColumnNumber+1); ++i)
            std::cout << '#';
        std::cout << "#\n";
    };


    // Print the complete map
    std::cout << "\n\n";
    printHeaderFooter();

    // Print all rows
    for (int row = maxRow; row >= 0; --row) {

        // Print the row number at the beginning of the line
        std::cout << std::setw(digitSizeForColumnNumber) << row << " # ";

        // Print all columns
        for (int col = 0; col <= maxColumn; ++col)
        {
            // Find the City ID for the given row (y) and column (x)
            std::vector<CityData>::iterator cdi = std::find_if(
                cityData.begin(),
                cityData.end(),
                [row, col](const CityData & cd) { return cd.yCoordinate == row && cd.xCoordinate == col; }
            );
            // If we could find nothing
            if (cdi == cityData.end()) {
                // Print empty space
                std::cout << std::setw(digitSizeForColumnNumber) << "" << ' ';
            }
            else {
                // Print the CityID
                std::cout << std::right << std::setw(digitSizeForColumnNumber) << cdi->cityId << ' ';
            }
        }
        // Print the end of the line
        std::cout <<  "#\n";
    }
    printHeaderFooter();
    // Print the column numbers
    std::cout << std::setw(digitSizeForColumnNumber) << "" << "   ";
    for (int col = 0; col <= maxColumn; ++col)
        std::cout << std::right << std::setw(digitSizeForColumnNumber) << col << ' ' ;
    // And, end
    std::cout << "\n\n\n";

    return 0;
}

Please note: main reads the file and displays the output.

And, because I cannot use a file on SO, I read the data from a std::istringstream. This works the same way as reading from a file.
