Regex: find all subexpressions (using boost::regex)

Question

I have a file containing some "entity" data in Valve's format. It's basically a key-value deal, and it looks like this:

{
"world_maxs" "3432 4096 822"
"world_mins" "-2408 -4096 -571"
"skyname" "sky_alpinestorm_01"
"maxpropscreenwidth" "-1"
"detailvbsp" "detail_sawmill.vbsp"
"detailmaterial" "detail/detailsprites_sawmill"
"classname" "worldspawn"
"mapversion" "1371"
"hammerid" "1"
}
{
"origin" "553 -441 322"
"targetname" "tonemap_global"
"classname" "env_tonemap_controller"
"hammerid" "90580"
}

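Each pair of {} counts as one entity, and the lines inside it count as KeyValues. As you can see, it's fairly simple.

I want to process this data into a vector<map<string, string> > in C++. To do that, I've tried using the regular expressions that ship with Boost. Here is what I have so far: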

static const boost::regex entityRegex("\\{(\\s*\"([A-Za-z0-9_]+)\"\\s*\"([^\"]+)\")+\\s*\\}");
boost::smatch what;
while (regex_search(entitiesString, what, entityRegex)) {
    cout << what[0] << endl;
    cout << what[1] << endl;
    cout << what[2] << endl;
    cout << what[3] << endl;
    break; // TODO
}

The same regex in a more readable form:

\{(\s*"([A-Za-z0-9_]+)"\s*"([^"]+)")+\s*\}

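I'm not sure whether regular expressions are a good fit for my problem, but this at least seems to print the last key-value pair (hammerid, 1).

My question is: how would I extract the "nth" matched subexpression within the expression? Or is there no really practical way to do this? Would it be better to write two nested while loops, one that searches for the {} pattern and one that searches for the actual key-value pairs?

Thanks!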
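
A small sketch of why only the last pair shows up: when a capture group sits inside a repeated group, each iteration overwrites the previous capture, so after the match groups 2 and 3 hold only the final key and value. The sketch below uses std::regex as a stand-in for boost::regex (an assumption made so that it builds without Boost), and lastCapturedPair is a hypothetical helper name, not part of either library.

```cpp
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper: runs the question's pattern once and returns whatever
// is left in capture groups 2 (key) and 3 (value) after the match.
// std::regex is used here in place of boost::regex (assumption).
std::pair<std::string, std::string> lastCapturedPair(const std::string& text) {
    static const std::regex entityRe(
        R"re(\{(\s*"([A-Za-z0-9_]+)"\s*"([^"]+)")+\s*\})re");
    std::smatch what;
    if (!std::regex_search(text, what, entityRe))
        return {};  // no entity found: empty key and value
    // Groups inside the repeated group keep only their last iteration.
    return {what[2].str(), what[3].str()};
}
```

For the second entity in the sample data this would be expected to return (hammerid, 90580), which matches the behaviour described in the question: the earlier pairs were matched but are no longer reachable through the numbered groups.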

Answer 1

With a parser generator you can write a proper parser.

For example, with Boost Spirit you can define the grammar rules inline as C++ expressions:

    start  = *entity;
    entity = '{' >> *entry >> '}';
    entry  = text >> text;
    text   = '"' >> *~char_('"') >> '"';

Here is a complete demo:

Live On Coliru

#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/adapted/std_pair.hpp>
#include <iostream>
#include <map>
#include <string>
#include <vector>

using Entity    = std::map<std::string, std::string>;
using ValveData = std::vector<Entity>;

namespace qi = boost::spirit::qi;

template <typename It, typename Skipper = qi::space_type>
struct Grammar : qi::grammar<It, ValveData(), Skipper>
{
    Grammar() : Grammar::base_type(start) {
        using namespace qi;

        start  = *entity;
        entity = '{' >> *entry >> '}';
        entry  = text >> text;
        text   = '"' >> *~char_('"') >> '"';

        BOOST_SPIRIT_DEBUG_NODES((start)(entity)(entry)(text))
    }
  private:
    qi::rule<It, ValveData(),                           Skipper> start;
    qi::rule<It, Entity(),                              Skipper> entity;
    qi::rule<It, std::pair<std::string, std::string>(), Skipper> entry;
    qi::rule<It, std::string()>                                  text;
};

int main()
{
    using It = boost::spirit::istream_iterator;
    Grammar<It> parser;
    It f(std::cin >> std::noskipws), l;

    ValveData data;
    bool ok = qi::phrase_parse(f, l, parser, qi::space, data);

    if (ok) {
        std::cout << "Parsing success:\n";

        int count = 0;
        for(auto& entity : data)
        {
            ++count;
            for (auto& entry : entity)
                std::cout << "Entity " << count << ": [" << entry.first << "] -> [" << entry.second << "]\n";
        }
    } else {
        std::cout << "Parsing failed\n";
    }

    if (f!=l)
        std::cout << "Remaining unparsed input: '" << std::string(f,l) << "'\n";
}

Which prints (for the input shown):

Parsing success:
Entity 1: [classname] -> [worldspawn]
Entity 1: [detailmaterial] -> [detail/detailsprites_sawmill]
Entity 1: [detailvbsp] -> [detail_sawmill.vbsp]
Entity 1: [hammerid] -> [1]
Entity 1: [mapversion] -> [1371]
Entity 1: [maxpropscreenwidth] -> [-1]
Entity 1: [skyname] -> [sky_alpinestorm_01]
Entity 1: [world_maxs] -> [3432 4096 822]
Entity 1: [world_mins] -> [-2408 -4096 -571]
Entity 2: [classname] -> [env_tonemap_controller]
Entity 2: [hammerid] -> [90580]
Entity 2: [origin] -> [553 -441 322]
Entity 2: [targetname] -> [tonemap_global]
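
For readers who cannot pull in Boost, the same four rules can also be sketched as a small hand-written recursive-descent parser. This is only an illustration under assumptions: the Parser class and its member names are hypothetical, whitespace is skipped between tokens but not inside quoted text (mirroring the skipper setup above), and no escape sequences inside quotes are handled.

```cpp
#include <cctype>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Entity    = std::map<std::string, std::string>;
using ValveData = std::vector<Entity>;

// Hypothetical hand-written counterpart of the start/entity/entry/text rules.
class Parser {
  public:
    explicit Parser(std::string input) : s(std::move(input)), pos(0) {}

    // start = *entity;  returns false if unparsed input remains
    bool parse(ValveData& out) {
        Entity e;
        while (entity(e)) {
            out.push_back(std::move(e));
            e.clear();  // reset the moved-from map before reuse
        }
        skipSpace();
        return pos == s.size();
    }

  private:
    std::string s;
    std::size_t pos;

    void skipSpace() {
        while (pos < s.size() && std::isspace(static_cast<unsigned char>(s[pos])))
            ++pos;
    }

    // Skips leading whitespace, then consumes c if it is the next character.
    bool lit(char c) {
        skipSpace();
        if (pos < s.size() && s[pos] == c) { ++pos; return true; }
        return false;
    }

    // text = '"' >> *~char_('"') >> '"';
    bool text(std::string& out) {
        std::size_t save = pos;
        if (!lit('"')) return false;
        std::size_t close = s.find('"', pos);
        if (close == std::string::npos) { pos = save; return false; }
        out.assign(s, pos, close - pos);
        pos = close + 1;
        return true;
    }

    // entry = text >> text;
    bool entry(Entity& e) {
        std::size_t save = pos;
        std::string key, val;
        if (text(key) && text(val)) {
            e.emplace(std::move(key), std::move(val));  // first key wins on duplicates
            return true;
        }
        pos = save;  // backtrack on a partial entry
        return false;
    }

    // entity = '{' >> *entry >> '}';
    bool entity(Entity& e) {
        std::size_t save = pos;
        if (!lit('{')) { pos = save; return false; }
        while (entry(e)) {}
        if (!lit('}')) { pos = save; return false; }
        return true;
    }
};
```

Usage would be along the lines of constructing Parser p(fileContents), calling p.parse(data) on a ValveData, and treating a false return as "trailing or malformed input". The Spirit version remains the more declarative option; this sketch just makes the control flow behind the grammar explicit.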

Answer 2

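I think it is hard to do all of this with a single regular expression, because the number of entries in each entity {} is variable. Personally, I would consider doing the parsing with plain std::getline.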

#include <map>
#include <vector>
#include <string>
#include <sstream>
#include <iostream>

std::istringstream iss(R"~(
    {
    "world_maxs" "3432 4096 822"
    "world_mins" "-2408 -4096 -571"
    "skyname" "sky_alpinestorm_01"
    "maxpropscreenwidth" "-1"
    "detailvbsp" "detail_sawmill.vbsp"
    "detailmaterial" "detail/detailsprites_sawmill"
    "classname" "worldspawn"
    "mapversion" "1371"
    "hammerid" "1"
    }
    {
    "origin" "553 -441 322"
    "targetname" "tonemap_global"
    "classname" "env_tonemap_controller"
    "hammerid" "90580"
    }
)~");

int main()
{
    std::string skip;
    std::string entity;

    std::vector<std::map<std::string, std::string> > vm;

    // skip to open brace, read entity until close brace
    while(std::getline(iss, skip, '{') && std::getline(iss, entity, '}'))
    {
        // turn entity into input stream
        std::istringstream iss(entity);

        // temporary map
        std::map<std::string, std::string> m;

        std::string key, val;

        // skip to open quote, read key to close quote
        while(std::getline(iss, skip, '"') && std::getline(iss, key, '"'))
        {
            // skip to open quote read val to close quote
            if(std::getline(iss, skip, '"') && std::getline(iss, val, '"'))
                m[key] = val;
        }

        // move map (no longer needed)
        vm.push_back(std::move(m));
    }

    for(auto& m: vm)
    {
        for(auto& p: m)
            std::cout << p.first << ": " << p.second << '\n';
        std::cout << '\n';
    }
}

Output:

classname: worldspawn
detailmaterial: detail/detailsprites_sawmill
detailvbsp: detail_sawmill.vbsp
hammerid: 1
mapversion: 1371
maxpropscreenwidth: -1
skyname: sky_alpinestorm_01
world_maxs: 3432 4096 822
world_mins: -2408 -4096 -571

classname: env_tonemap_controller
hammerid: 90580
origin: 553 -441 322
targetname: tonemap_global

Answer 3

I would write it like this:

^\{(\s*"([A-Za-z0-9_]+)"\s*"([^"]+)")+\s*\}$

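Or split the regex into two patterns: first match the curly braces, then go through the contents of the braces line by line.

Match the curly braces: ^(\\{[^\\}]+)$
Match a line: ^(\\s*"([A-Za-z0-9_]+)"\\s*"([^"]+)"\\s*)$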
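
The two-step idea (braces first, then the pairs inside them) might be sketched as follows. Several things here are assumptions rather than part of the answer: std::regex is used instead of boost::regex so the sketch builds without Boost, the block pattern \{([^}]*)\} includes the closing brace and drops the line anchors, the pairs are found by iterating over matches instead of splitting on lines, and parseEntities is a hypothetical helper name.

```cpp
#include <map>
#include <regex>
#include <string>
#include <utility>
#include <vector>

using Entity = std::map<std::string, std::string>;

// Hypothetical helper: the outer regex finds each {...} block, the inner
// regex then walks every "key" "value" pair inside that block.
std::vector<Entity> parseEntities(const std::string& text) {
    static const std::regex blockRe(R"(\{([^}]*)\})");
    static const std::regex pairRe(R"re("([A-Za-z0-9_]+)"\s*"([^"]+)")re");

    std::vector<Entity> entities;
    const std::sregex_iterator end;
    for (std::sregex_iterator b(text.begin(), text.end(), blockRe); b != end; ++b) {
        // Copy the block body so the inner iterator refers to a live string.
        const std::string body = (*b)[1].str();
        Entity entity;
        for (std::sregex_iterator p(body.begin(), body.end(), pairRe); p != end; ++p)
            entity[(*p)[1].str()] = (*p)[2].str();
        entities.push_back(std::move(entity));
    }
    return entities;
}
```

Called on the sample data from the question, this should yield two maps, one per entity. Because each pair is its own match, there is no need to reach into the "nth" capture of a repeated group. One limitation of the sketch: a value that itself contains } would end the block early.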


Solution 1
1 Accepted 2015-06-11 13:27:58

Solution 2
1 2015-06-11 13:53:16

Solution 3
0 2015-06-11 13:41:51
