C++ Get the substring between custom delimiters without the use of regex

I have a simple string in the following format:

"lorem ipsum <span id='1'>extract_me-1</span> dolor
sit amet <span id='2'>extract_me-2</span> adispicing consequit lorem ipsum
sit amet <span id='3'>extract_me-3</span> adispicing dolor lorem"

Now I need to extract the strings between specified custom delimiters, for example:

Substring("<span id='1'>","</span>") = extract_me-1
Substring("<span id='2'>","</span>") = extract_me-2
Substring("lorem","<span id='1'>") = ipsum
Substring("extract_me-1","dolor") = </span>

I've accomplished this task using regex:

std::string str="lorem ipsum <span id='1'>extract_me-1</span> dolor sit amet <span id='2'>extract_me-2</span> adispicing consequit lorem ipsum sit amet <span id='3'>extract_me-3</span> adispicing dolor lorem";

std::smatch match;
std::regex rgx ("<span id='1'>(.*?)</span>");

if (regex_search(str, match, rgx)){
 //First substring
 std::cout<<match.str(1);
}

Is there any way to do this without regex? I've tried using substr a couple of times, but to no avail. Any help is highly appreciated, thanks.

EDIT: The input str is not complete HTML, just text with a few random tags. I just need the substring from the start delimiter to the next closest end delimiter (yes, even when there are nested or repeated tags of the same span).

Any of the str.find() calls can return npos, so each return value has to be checked before it is used; that is the gist of it. You might instead want to search for the tag first and then the id, but then you also need to handle an id that does not exist for that tag:

#include <iostream>
#include <string>

int main() {
    const std::string str="lorem ipsum <span id='1'>extract_me-1</span> dolor sit amet <span id='2'>extract_me-2</span> adispicing consequit lorem ipsum sit amet <span id='3'>extract_me-3</span> adispicing dolor lorem";
    const std::string tag = "<span id='";
    std::string r;
    for(std::size_t pos = 0;;) {
        // Locate the next opening tag; stop when there are no more.
        std::size_t tag_pos = str.find(tag, pos);
        if(tag_pos == str.npos) {
            break;
        }
        // The id runs from the end of the tag prefix to the closing quote.
        std::size_t id_pos = tag_pos + tag.size();
        std::size_t id_pos2 = str.find("'", id_pos);
        if(id_pos2 == str.npos) {
            break;
        }
        // The text runs from just after '>' to the next '<'.
        std::size_t txt_pos = str.find(">", id_pos2);
        if(txt_pos == str.npos) {
            break;
        }
        ++txt_pos;
        std::size_t txt_pos2 = str.find("<", txt_pos);
        if(txt_pos2 == str.npos) {
            break;
        }

        r += "txt";
        r += str.substr(id_pos, id_pos2 - id_pos);
        r += " = ";
        r += str.substr(txt_pos, txt_pos2 - txt_pos);
        r += "\n";

        pos = txt_pos2;
    }
    // Print the collected "txt<id> = <text>" lines.
    std::cout << r;
}
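
The same loop can be generalised to arbitrary delimiters. The following is only a sketch, not part of the answer above: the name extract_all and its parameters are hypothetical, and it assumes that "between" means from each opening delimiter to the nearest closing delimiter after it, as the question's EDIT describes.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not from the original answer): collects every
// substring that lies between `open` and the nearest following `close`.
std::vector<std::string> extract_all(const std::string& text,
                                     const std::string& open,
                                     const std::string& close) {
    std::vector<std::string> found;
    if (open.empty() || close.empty()) {
        return found;  // empty delimiters could make the loop never advance
    }
    std::size_t pos = 0;
    while (true) {
        // Find the next opening delimiter at or after `pos`.
        const std::size_t open_pos = text.find(open, pos);
        if (open_pos == std::string::npos) break;
        // Find the nearest closing delimiter after the opening one.
        const std::size_t begin = open_pos + open.size();
        const std::size_t end = text.find(close, begin);
        if (end == std::string::npos) break;
        found.push_back(text.substr(begin, end - begin));
        // Continue searching after the closing delimiter.
        pos = end + close.size();
    }
    return found;
}
```

Called on the string from the question with "'>" and "</span>" as delimiters, this should collect the three extract_me-N values in order. Returning a vector leaves it to the caller to decide how to format or print the results.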

I was able to solve this using .find and .substr. It turned out to be easier than I thought:

#include <string>
#include <iostream>

using namespace std;

string str="lorem ipsum <span id='1'>extract_me-1</span> dolor sit amet <span id='2'>extract_me-2</span> adispicing consequit lorem ipsum sit amet <span id='3'>extract_me-3</span> adispicing dolor lorem";

string subStrng(const string& start, const string& end);

int main() {
    string txt1 = subStrng("<span id='1'>","</span>");
    string txt2 = subStrng("<span id='2'>","</span>");
    string txt3 = subStrng("<span id='3'>","</span>");

    cout<<txt1<<"\n"<<txt2<<"\n"<<txt3;
    return 0;
}


// Substring func.
// Returns the text between the first 'start' and the next closest 'end',
// or an empty string if either delimiter is missing.
string subStrng(const string& start, const string& end){
    // find() returns string::npos (an unsigned value) on failure, so compare
    // against npos instead of storing the result in an int and testing >= 0.
    size_t t1 = str.find(start);
    if(t1 == string::npos){
        return "";
    }
    // string 'start' exists in str.
    // Now, let's find the next closest string 'end'
    t1 += start.length();
    size_t t2 = str.find(end, t1);
    if(t2 == string::npos){
        return "";
    }
    // next closest 'end' exists in str.
    // Now, let's extract the substring in between
    return str.substr(t1, t2 - t1);
}
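
One limitation of returning "" on failure is that a missing delimiter looks the same as an empty match. As a possible variation, not taken from the answer above, the function could take the input as a parameter and return std::optional. This is a minimal sketch assuming C++17 is available; the name substring_between is hypothetical.

```cpp
#include <optional>
#include <string>

// Hypothetical helper: returns the text between the first `start` and the
// nearest `end` after it, or std::nullopt when either delimiter is missing.
std::optional<std::string> substring_between(const std::string& text,
                                             const std::string& start,
                                             const std::string& end) {
    const std::size_t start_pos = text.find(start);
    if (start_pos == std::string::npos) {
        return std::nullopt;  // `start` does not occur in `text`
    }
    const std::size_t begin = start_pos + start.size();
    const std::size_t end_pos = text.find(end, begin);
    if (end_pos == std::string::npos) {
        return std::nullopt;  // no `end` after `start`
    }
    return text.substr(begin, end_pos - begin);
}
```

With the string from the question, substring_between(str, "<span id='1'>", "</span>") would hold extract_me-1, while a call with a delimiter that never occurs would return an empty optional that can be tested with has_value().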

Cheers
