C++ - Parsing a .txt file with multiple delimiters, display string

Hi everyone! I'm new to C++, so I still make silly mistakes. This is a snippet of a .txt file's content:

<tag attr1="value1" attr2="value2" ... >

What I'm trying to accomplish is to parse the .txt file and generate the following output:

Tag: tag
name: attr1
value: value1
name: attr2
value: value2

What I've done so far doesn't work (my problem is the delimiters):

#include <iostream>
#include <sstream>
#include <string>
#include <vector>
#include <fstream>

using namespace std;

struct tagline {
    string tag;
    string attributeN;
    string attributeV;
};

int main() {
    vector<tagline> information;
    string line;
    tagline t;

    ifstream readFile("file.txt");
    while (getline(readFile, line)) {
        stringstream in(line);
        getline(in,t.tag);
        getline(in,t.attributeN,'=');
        getline(in,t.attributeV,'"');
        information.push_back(t);
    }

    vector<tagline>::iterator it = information.begin();

    for (; it != information.end(); it++) {
        cout << "Tag: " << (*it).tag << " \n"
             << "name: " << (*it).attributeN << " \n"
             << "value: " << (*it).attributeV << " \n";
    }
    return 0;
}

All I get is a plain display of the snippet as it's formatted in the .txt file:

<tag attr1="value1" attr2="value2" ... >

I would be happy if someone could help. Thank you!

This would be better handled using an HTML/XML parser (depending on what your file actually contains).

That being said, you are not parsing the lines correctly.

Your first call, getline(in,t.tag);, does not specify a delimiter, so it reads the entire line, not just the first word. You would have to use getline(in, t.tag, ' '); instead.
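To make the difference concrete, here is a minimal sketch that reads the first field of a line both ways. The helper names first_field_default and first_field_until are made up for this illustration and are not part of the question's code.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: mimics the question's getline(in, t.tag) call.
// With no delimiter argument, getline stops only at '\n', so the
// whole line ends up in the result.
std::string first_field_default(const std::string& line) {
    std::istringstream in(line);
    std::string field;
    std::getline(in, field);
    return field;
}

// Hypothetical helper: stops at the given delimiter. The delimiter is
// extracted and discarded, so a following read starts right after it.
std::string first_field_until(const std::string& line, char delim) {
    std::istringstream in(line);
    std::string field;
    std::getline(in, field, delim);
    return field;
}
```

For the line <tag attr1="value1">, first_field_default returns the whole line, while first_field_until(line, ' ') returns <tag. The leading < still has to be skipped separately.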

Also, your tags can have multiple attributes, but you are only reading and storing the first attribute, ignoring the rest. You need a loop to read all of them, and a std::vector to store them all in.

Try something more like this instead:

#include <iostream>
#include <sstream>
#include <string>
#include <vector>
#include <fstream>

using namespace std;

struct tagattribute {
    string name;
    string value;
};

struct tagline {
    string tag;
    vector<tagattribute> attributes;
};

int main() {
    vector<tagline> information;
    string line;

    ifstream readFile("file.txt");
    while (getline(readFile, line)) {
        istringstream in(line);

        tagline t;
        tagattribute attr;

        in >> ws;

        char ch = in.get();
        if (ch != '<')
            continue;

        if (!(in >> t.tag))
            continue;

        do
        {
            in >> ws;

            ch = in.peek();
            if (ch == '>')
                break;

            if (getline(in, attr.name, '=') &&
                in.ignore() &&
                getline(in, attr.value, '"'))
            {
                t.attributes.push_back(attr);
            }
            else
                break;
        }
        while (true);

        information.push_back(t);
    }

    vector<tagline>::iterator it = information.begin();
    for(; it != information.end(); ++it) {
        cout << "Tag: " << it->tag << "\n";

        vector<tagattribute>::iterator it2 = it->attributes.begin();
        for(; it2 != it->attributes.end(); ++it2) {
            cout << "name: " << it2->name << "\n"
            << "value: " << it2->value << "\n";
        }

        cout << "\n";
    }

    return 0;
}

Live demo

Alternatively, consider writing some custom operator>> overloads to help with the parsing, e.g.:

#include <iostream>
#include <sstream>
#include <string>
#include <vector>
#include <fstream>

using namespace std;

struct tagattribute {
    string name;
    string value;
};

istream& operator>>(istream &in, tagattribute &attr)
{
    getline(in, attr.name, '=');
    in.ignore();
    getline(in, attr.value, '"');
    return in;
}

struct tagline {
    string tag;
    vector<tagattribute> attributes;
};

istream& operator>>(istream &in, tagline &t)
{
    tagattribute attr;

    in >> ws;

    char ch = in.get();
    if (ch != '<')
    {
        in.setstate(ios_base::failbit);
        return in;
    }

    if (!(in >> t.tag))
        return in;

    do
    {
        in >> ws;

        ch = in.peek();
        if (ch == '>')
        {
            in.ignore();
            break;
        }

        if (!(in >> attr))
            break;

        t.attributes.push_back(attr);
    }
    while (true);

    return in;
}

int main() {
    vector<tagline> information;
    string line;

    ifstream readFile("file.txt");
    while (getline(readFile, line)) {
        istringstream in(line);
        tagline t;     

        if (in >> t)
            information.push_back(t);
    }

    vector<tagline>::iterator it = information.begin();
    for(; it != information.end(); ++it) {
        cout << "Tag: " << it->tag << "\n";

        vector<tagattribute>::iterator it2 = it->attributes.begin();
        for(; it2 != it->attributes.end(); ++it2) {
            cout << "name: " << it2->name << "\n"
            << "value: " << it2->value << "\n";
        }

        cout << "\n";
    }

    return 0;
}

Live demo
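One edge case worth knowing about: both versions read the tag name with in >> t.tag, which stops only at whitespace. For a tag without attributes, such as <tag>, the closing > therefore ends up inside the name. A possible workaround, sketched here with a hypothetical helper read_tag_name, is to read the name character by character and stop before whitespace or '>':

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (not part of the answer's code): reads a tag name
// from the stream, stopping before whitespace, '>' or end of input.
// The terminating character is left in the stream for the caller.
std::string read_tag_name(std::istream& in) {
    std::string name;
    int ch;
    while ((ch = in.peek()) != std::char_traits<char>::eof()
           && ch != '>' && !std::isspace(ch)) {
        name += static_cast<char>(in.get());
    }
    return name;
}
```

In the code above it would be called right after the leading < has been consumed, in place of in >> t.tag, with an empty result treated as a parse failure.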

Well, I would try to do something like this, using this wonderful answer:

struct xml_skipper : std::ctype<char> {
    xml_skipper() : ctype(make_table()) { }
private:
    static mask* make_table() {
        const mask* classic = classic_table();
        static std::vector<mask> v(classic, classic + table_size);
        v[','] |= space;
        v['"'] |= space;
        v['='] |= space;
        v['<'] |= space;
        v['>'] |= space;
        return &v[0];
    }
};

Then, what you can do is just keep reading:

ifstream readFile("file.txt");
while(getline(readFile,line)){
    istringstream in(line);
    in.imbue(std::locale(in.getloc(), new xml_skipper));
    in >> t.tag >> t.attributeN >> t.attributeV;
    information.push_back(t);
}
//...

Do note that this will break if values or attribute names contain whitespace.
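The snippet above extracts only the first attribute of each tag. As a rough sketch of how the same skipper idea could collect every name/value pair, the extraction can be put in a loop. The struct and function names below (attribute, parsed_tag, parse_with_skipper) are invented for this example, and xml_skipper is repeated so the block is self-contained. It shares the limitation just mentioned, and an empty value such as attr="" would also shift the pairing.

```cpp
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Same idea as xml_skipper above: classify XML punctuation as whitespace
// so that operator>> skips over it.
struct xml_skipper : std::ctype<char> {
    xml_skipper() : std::ctype<char>(make_table()) {}
private:
    static const mask* make_table() {
        static std::vector<mask> v(classic_table(), classic_table() + table_size);
        v['"'] |= space;
        v['='] |= space;
        v['<'] |= space;
        v['>'] |= space;
        return &v[0];
    }
};

// Hypothetical types for this sketch.
struct attribute {
    std::string name;
    std::string value;
};

struct parsed_tag {
    std::string tag;
    std::vector<attribute> attributes;
};

// Reads the tag name, then name/value pairs until extraction fails.
parsed_tag parse_with_skipper(const std::string& line) {
    std::istringstream in(line);
    // The locale takes ownership of the facet and deletes it.
    in.imbue(std::locale(in.getloc(), new xml_skipper));

    parsed_tag result;
    in >> result.tag;

    attribute attr;
    while (in >> attr.name >> attr.value)
        result.attributes.push_back(attr);
    return result;
}
```

Calling parse_with_skipper on <tag attr1="value1" attr2="value2"> yields the tag name tag and two attributes; the loop ends when no complete pair is left on the line.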


If you want something more serious, you will need to write a lexer, a syntax tree builder, and a semantic tree builder.
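To give a feel for the first of those stages only, here is a very small lexer sketch. The token kinds and the tokenize function are assumptions made for this illustration and do not come from any library. It covers just the characters seen in the sample line and is nowhere near the full XML grammar. Because quoted values are scanned up to the matching quote, they may contain whitespace.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token model for this sketch.
enum class token_kind { open, close, equals, name, quoted };

struct token {
    token_kind kind;
    std::string text;
};

// True for characters that end a bare name.
inline bool ends_name(char c) {
    return std::isspace(static_cast<unsigned char>(c)) ||
           c == '<' || c == '>' || c == '=' || c == '"' || c == '\'';
}

// Splits one line into tokens; stops early on an unterminated quote.
std::vector<token> tokenize(const std::string& line) {
    std::vector<token> tokens;
    std::size_t i = 0;
    while (i < line.size()) {
        const char c = line[i];
        if (std::isspace(static_cast<unsigned char>(c))) {
            ++i;  // whitespace only separates tokens
        } else if (c == '<') {
            tokens.push_back({token_kind::open, "<"});
            ++i;
        } else if (c == '>') {
            tokens.push_back({token_kind::close, ">"});
            ++i;
        } else if (c == '=') {
            tokens.push_back({token_kind::equals, "="});
            ++i;
        } else if (c == '"' || c == '\'') {
            // Quoted value: take everything up to the matching quote.
            const std::size_t end = line.find(c, i + 1);
            if (end == std::string::npos)
                break;
            tokens.push_back({token_kind::quoted, line.substr(i + 1, end - i - 1)});
            i = end + 1;
        } else {
            // Bare name: run of characters up to the next delimiter.
            std::size_t end = i;
            while (end < line.size() && !ends_name(line[end]))
                ++end;
            tokens.push_back({token_kind::name, line.substr(i, end - i)});
            i = end;
        }
    }
    return tokens;
}
```

A later stage would then walk the token list (open, name, then repeated name/equals/quoted, then close) and build the tag structure, reporting an error when the sequence does not match.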


Full code for the xml_skipper approach:

#include <iostream>
#include <sstream>
#include <string>
#include <vector>
#include <fstream>
#include <locale>

using namespace std;

struct tagline{
    string tag;
    string attributeN;
    string attributeV;
};

struct xml_skipper : std::ctype<char> {
    xml_skipper() : ctype(make_table()) { }
private:
    static mask* make_table() {
        const mask* classic = classic_table();
        static std::vector<mask> v(classic, classic + table_size);
        v[','] |= space;
        v['"'] |= space;
        v['='] |= space;
        v['<'] |= space;
        v['>'] |= space;
        return &v[0];
    }
};

int main(){
    vector<tagline> information;
    string line;
    tagline t;
    std::istringstream readFile{"<tag attr1=\"value1\" attr2=\"value2\" ... >"};
    while(getline(readFile,line)){
        istringstream in(line);
        in.imbue(std::locale(in.getloc(), new xml_skipper));
        in >> t.tag >> t.attributeN >> t.attributeV;
        information.push_back(t);
    }


    vector<tagline>::iterator it = information.begin();

    for(; it != information.end(); it++){
        cout << "Tag: " << (*it).tag << " \n"
             << "name: " << (*it).attributeN << " \n"
             << "value: " << (*it).attributeV << " \n";
    }
}

Live on Wandbox.

If your input may vary within the boundaries of the XML specification, an XML parser might be a better approach than parsing the string "manually". Just to show what this could look like, see the following code. It is based on tinyxml2, which only requires adding a single .cpp/.h file pair to your project. You could, of course, use any other XML library; this is just for demonstration purposes:

#include <iostream>
#include "tinyxml2.h"
using namespace std;
using namespace tinyxml2;

int main()
{
    const char* test = "<tag attr1='value1' attr2 = \"value2\"/>";
    XMLDocument doc;
    doc.Parse(test);
    // RootElement() returns a null pointer if parsing failed or the document is empty
    XMLElement *root = doc.RootElement();
    if (root) {
        cout << "Tag: " << root->Name() << endl;
        // Walk the attribute list of the root element
        const XMLAttribute *attrib = root->FirstAttribute();
        while (attrib) {
            cout << "name: " << attrib->Name() << endl;
            cout << "value: " << attrib->Value() << endl;
            attrib = attrib->Next();
        }
    }
}
