
C++: Get the substring between custom delimiters without the use of regex

I have a simple string of the following format:

"lorem ipsum <span id='1'>extract_me-1</span> dolor
sit amet <span id='2'>extract_me-2</span> adispicing consequit lorem ipsum
sit amet <span id='3'>extract_me-3</span> adispicing dolor lorem"

Now I need to extract the strings between custom delimiters that I specify.

For example:

Substring("<span id='1'>","</span>") = extract_me-1
Substring("<span id='2'>","</span>") = extract_me-2
Substring("lorem","<span id='1'>") = ipsum
Substring("extract_me-1","dolor") = </span>

I have accomplished this task using regex:

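#include <iostream>
#include <regex>
#include <string>

int main() {
    std::string str="lorem ipsum <span id='1'>extract_me-1</span> dolor sit amet <span id='2'>extract_me-2</span> adispicing consequit lorem ipsum sit amet <span id='3'>extract_me-3</span> adispicing dolor lorem";

    std::smatch match;
    // The non-greedy (.*?) stops at the first "</span>" after the opening tag.
    std::regex rgx("<span id='1'>(.*?)</span>");

    if (std::regex_search(str, match, rgx)) {
        // match.str(1) is the first capture group: the text between the tags.
        std::cout << match.str(1);
    }
}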

Is there any way to do this without using regex? I've tried using substr a couple of times, but to no avail. Any help is highly appreciated, thanks.

EDIT: The input str is not complete HTML, just a few random tags. I only need the substring from the start delimiter to the next closest end delimiter (yes, even when there are nested tags of the same span, or repetitions).

You need to check the return value of every str.find() call against std::string::npos, but this is the gist of it. You might want to search for just the tag and then the id, but then you also need to handle an id that does not exist for that tag:

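#include <cstddef>
#include <iostream>
#include <string>

int main() {
    const std::string str="lorem ipsum <span id='1'>extract_me-1</span> dolor sit amet <span id='2'>extract_me-2</span> adispicing consequit lorem ipsum sit amet <span id='3'>extract_me-3</span> adispicing dolor lorem";
    const std::string tag = "<span id='";
    std::string r;
    for (std::size_t pos = 0;;) {
        // Find the next opening tag; stop when there are none left.
        std::size_t tag_pos = str.find(tag, pos);
        if (tag_pos == std::string::npos) {
            break;
        }
        // The id runs from just after "<span id='" up to the closing quote.
        std::size_t id_pos = tag_pos + tag.size();
        std::size_t id_pos2 = str.find("'", id_pos);
        if (id_pos2 == std::string::npos) {
            break;  // malformed tag: no closing quote
        }
        // The text starts after the '>' that ends the opening tag...
        std::size_t gt_pos = str.find(">", id_pos2);
        if (gt_pos == std::string::npos) {
            break;
        }
        std::size_t txt_pos = gt_pos + 1;
        // ...and ends at the next '<' (the start of "</span>").
        std::size_t txt_pos2 = str.find("<", txt_pos);
        if (txt_pos2 == std::string::npos) {
            break;
        }

        // Builds lines such as "txt1 = extract_me-1".
        r += "txt";
        r += str.substr(id_pos, id_pos2 - id_pos);
        r += " = ";
        r += str.substr(txt_pos, txt_pos2 - txt_pos);
        r += "\n";

        pos = txt_pos2;
    }
    std::cout << r;
}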

I was able to solve this using .find and .substr. It turned out to be easier than I thought.

#include <cstddef>
#include <iostream>
#include <string>

using namespace std;

const string str="lorem ipsum <span id='1'>extract_me-1</span> dolor sit amet <span id='2'>extract_me-2</span> adispicing consequit lorem ipsum sit amet <span id='3'>extract_me-3</span> adispicing dolor lorem";

string subStrng(const string& start, const string& end);

int main() {
    string txt1 = subStrng("<span id='1'>", "</span>");
    string txt2 = subStrng("<span id='2'>", "</span>");
    string txt3 = subStrng("<span id='3'>", "</span>");

    cout << txt1 << "\n" << txt2 << "\n" << txt3 << "\n";
    return 0;
}

// Substring func: returns the text between the first occurrence of 'start'
// and the next occurrence of 'end' after it, or "" if either is missing.
string subStrng(const string& start, const string& end) {
    // find() returns string::npos (an unsigned value, not -1) when nothing
    // is found, so keep positions as size_t and compare against npos.
    size_t t1 = str.find(start);
    if (t1 == string::npos) {
        return "";
    }
    // 'start' exists in str. Skip past it and find the next closest 'end'.
    t1 += start.length();
    size_t t2 = str.find(end, t1);
    if (t2 == string::npos) {
        return "";
    }
    // Extract the substring in between.
    return str.substr(t1, t2 - t1);
}

Cheers

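Notice: technical posts on this site are published under the CC BY-SA 4.0 license. If you republish, please credit this site's URL or the original address. For questions, contact yoyou2525@163.com.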

 