
How to Extract Tokens from a String?

Edit: Could someone add a regex solution? I was looking at the following regex:

[\(\)!*-+^]

I had a function that extracts tokens from a text according to the special chars I declared in its body.

There are 2 problems in the function:

1) It doesn't print the special chars.

2) It produces wrong output when two special chars are next to each other.

So I made a change, which fixed problem 1 (as far as I saw from the results of some tests) but doesn't fix problem 2. Any help?

Note: I am using the C++11 standard and am not looking to use Boost.

Example: Given: a+(b*c) I am expecting: a,+,(,b,*,c,)

Given: a+b I am expecting: a,+,b

Given: ab+ I am expecting: ab,+

Given: a b+ I am expecting: ab,+

You're not handling empty strings well. Before adding a substring as a token, check to make sure it isn't empty.

#include <sstream>
#include <string>
#include <vector>
#include <cassert>

using std::string;
using std::stringstream;
using std::vector;

namespace {

// Parse a string into a vector of tokens by special characters.
auto find(string str) -> vector<string> {
    auto result = vector<string>{};
    auto pos = string::size_type{};
    auto last_pos = string::size_type{};

    while ((pos = str.find_first_of("+^-*!()", last_pos)) != string::npos) {
        // Is there a token before the special character?
        if (pos - last_pos > 0) {
            result.push_back(str.substr(last_pos, pos - last_pos));
        }

        last_pos = pos + 1;

        // Add the special character as a token.
        result.push_back(str.substr(pos, 1));
    }

    auto last = str.substr(last_pos);

    // Is there a trailing token after the last found special character?
    if (!last.empty()) {
        result.push_back(last);
    }

    return result;
}

// Helper routine.
// Join a vector of strings together using a given separator.
auto join(vector<string> const& v, string sep) -> string {
    auto ss = stringstream{};
    auto first = true;

    for (auto const& s : v) {
        if (first) {
            ss << s;
            first = false;
        } else {
            ss << sep << s;
        }
    }

    return ss.str();
}

// Helper routine.
// Returns a string representing the tokenized string.
auto check(string s) -> string {
    auto v = find(s);
    auto result = join(v, ",");
    return result;
}

} // anon

int main() {
    // Unit tests to check that the string parses and tokenizes into the expected string.
    assert(check("a+(b*c)") == "a,+,(,b,*,c,)");
    assert(check("a+b") == "a,+,b");
    assert(check("ab+") == "ab,+");
    assert(check("a b+") == "a b,+");
    assert(check("a") == "a");
    assert(check("aa") == "aa");
    assert(check("+") == "+");
    assert(check("++") == "+,+");
    assert(check("a+") == "a,+");
    assert(check("+a") == "+,a");
    assert(check("") == "");
}
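The question's last example expects `a b+` to give `ab,+`, whereas the tests above keep the space (`a b,+`). If dropping whitespace is what is wanted, one option is to remove it before splitting. The following is only a sketch under that assumption: `strip_whitespace` and `tokenize` are hypothetical names, not part of the code above, and `tokenize` repeats the same `find_first_of` loop so that the block stands on its own.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: returns a copy of `s` with all whitespace removed.
std::string strip_whitespace(std::string s) {
    s.erase(std::remove_if(s.begin(), s.end(),
                           [](unsigned char c) { return std::isspace(c) != 0; }),
            s.end());
    return s;
}

// Hypothetical variant of find(): strips whitespace first, then splits on
// the same set of special characters, emitting each one as its own token.
std::vector<std::string> tokenize(const std::string& input) {
    const std::string str = strip_whitespace(input);
    std::vector<std::string> result;
    std::string::size_type last_pos = 0;
    std::string::size_type pos = 0;

    while ((pos = str.find_first_of("+^-*!()", last_pos)) != std::string::npos) {
        // Only add the text before the special character if it is non-empty.
        if (pos > last_pos) {
            result.push_back(str.substr(last_pos, pos - last_pos));
        }
        result.push_back(str.substr(pos, 1));
        last_pos = pos + 1;
    }

    // Trailing token after the last special character, if any.
    if (last_pos < str.size()) {
        result.push_back(str.substr(last_pos));
    }
    return result;
}
```

With this variant, `tokenize("a b+")` gives the two tokens `ab` and `+`. Whether whitespace should be dropped or should separate tokens is a design decision the question leaves open.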

Here's a regex solution that should parse the tokens you want:

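#include <algorithm>
#include <iostream>
#include <iterator>
#include <regex>
#include <string>

void find(const std::string& str)
{
    // Match one special character, or a run of word/whitespace characters.
    static const std::regex r(R"(\+|\^|-|\*|!|\(|\)|([\w\s]+))");
    std::copy( std::sregex_token_iterator(str.begin(), str.end(), r, 0),
               std::sregex_token_iterator(),
               std::ostream_iterator<std::string>(std::cout, "\n"));
}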

Here's a demo.

Here's an explanation.
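If the tokens are needed in a container instead of being printed, the same idea can fill a vector. This is a sketch, not a drop-in replacement: `regex_tokens` is a hypothetical name, and the special characters are written as a character class with `-` placed first so that it stays a literal. In the question's `[\(\)!*-+^]`, by contrast, `*-+` is read as a range that covers `*` and `+` but not `-`.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every match of the token pattern into a vector.
// A token is either one special character or a run of word/whitespace characters.
std::vector<std::string> regex_tokens(const std::string& str) {
    static const std::regex r(R"([-+^*!()]|[\w\s]+)");
    return std::vector<std::string>(
        std::sregex_token_iterator(str.begin(), str.end(), r, 0),
        std::sregex_token_iterator());
}
```

Characters that match neither alternative (for example `/`) are skipped silently, so input that may contain other symbols would need either a wider pattern or an explicit error check.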

Note that this is not a good idea if you want to do general-purpose parsing. The regex will quickly become unwieldy (if it isn't already), and there are much better tools available to do this for you.
