簡體   English   中英

用元素混合物解析化學式

[英]parsing chemical formula with mixtures of elements

我想使用boost :: spirit來提取由粗糙配方中的幾種元素組成的化合物的化學計量。 在給定的化合物中,我的解析器應該能夠區分三種化學元素模式:

  • 天然元素由天然豐富的同位素混合物組成
  • 純同位素
  • 非天然豐度的同位素混合物

然后使用這些模式來解析以下化合物:

  • “C” - >由天然豐富的C [12]和C [13]制成的天然碳
  • “CH4” - >甲烷由天然碳和氫制成
  • “C2H {H [1](0.8)H [2](0.2)} 6” - >乙烷由天然C和非天然H制成,由80%的氫和20%的氘組成
  • “U [235]” - >純鈾235

顯然,化學元素模式可以是任何順序(例如CH [1] 4和H [1] 4C ......)和頻率。

我編寫的解析器非常接近完成工作,但我仍面臨一個問題。

這是我的代碼:

template <typename Iterator>
struct ChemicalFormulaParser : qi::grammar<Iterator,isotopesMixture(),qi::locals<isotopesMixture,double>>
{
    ChemicalFormulaParser(): ChemicalFormulaParser::base_type(_start)
    {

        namespace phx = boost::phoenix;

        // Semantic action for handling the case of pure isotope    
        phx::function<PureIsotopeBuilder> const build_pure_isotope = PureIsotopeBuilder();
        // Semantic action for handling the case of pure isotope mixture   
        phx::function<IsotopesMixtureBuilder> const build_isotopes_mixture = IsotopesMixtureBuilder();
        // Semantic action for handling the case of natural element   
        phx::function<NaturalElementBuilder> const build_natural_element = NaturalElementBuilder();

        phx::function<UpdateElement> const update_element = UpdateElement();

        // XML database that store all the isotopes of the periodical table
        ChemicalDatabaseManager<Isotope>* imgr=ChemicalDatabaseManager<Isotope>::Instance();
        const auto& isotopeDatabase=imgr->getDatabase();
        // Loop over the database to the spirit symbols for the isotopes names (e.g. H[1],C[14]) and the elements (e.g. H,C)
        for (const auto& isotope : isotopeDatabase) {
            _isotopeNames.add(isotope.second.getName(),isotope.second.getName());
            _elementSymbols.add(isotope.second.getProperty<std::string>("symbol"),isotope.second.getProperty<std::string>("symbol"));
        }

        _mixtureToken = "{" >> +(_isotopeNames >> "(" >> qi::double_ >> ")") >> "}";
        _isotopesMixtureToken = (_elementSymbols[qi::_a=qi::_1] >> _mixtureToken[qi::_b=qi::_1])[qi::_pass=build_isotopes_mixture(qi::_val,qi::_a,qi::_b)];

        _pureIsotopeToken = (_isotopeNames[qi::_a=qi::_1])[qi::_pass=build_pure_isotope(qi::_val,qi::_a)];
        _naturalElementToken = (_elementSymbols[qi::_a=qi::_1])[qi::_pass=build_natural_element(qi::_val,qi::_a)];

        _start = +( ( (_isotopesMixtureToken | _pureIsotopeToken | _naturalElementToken)[qi::_a=qi::_1] >>
                      (qi::double_|qi::attr(1.0))[qi::_b=qi::_1])[qi::_pass=update_element(qi::_val,qi::_a,qi::_b)] );

    }

    //! Defines the rule for matching a prefix
    qi::symbols<char,std::string> _isotopeNames;
    qi::symbols<char,std::string> _elementSymbols;

    qi::rule<Iterator,isotopesMixture()> _mixtureToken;
    qi::rule<Iterator,isotopesMixture(),qi::locals<std::string,isotopesMixture>> _isotopesMixtureToken;

    qi::rule<Iterator,isotopesMixture(),qi::locals<std::string>> _pureIsotopeToken;
    qi::rule<Iterator,isotopesMixture(),qi::locals<std::string>> _naturalElementToken;

    qi::rule<Iterator,isotopesMixture(),qi::locals<isotopesMixture,double>> _start;
};

基本上,每個單獨的元素模式可以用它們各自的語義動作適當地解析,這產生了作為輸出的構建化合物的同位素和它們相應的化學計量之間的映射。 解析以下化合物時問題開始:

CH{H[1](0.9)H[2](0.4)}

在這種情況下,語義動作build_isotopes_mixture返回false,因為0.9 + 0.4對於比率之和是無意義的。 因此,我本來期望並希望我的解析器失敗了。 但是,由於_start規則對三種化學元素模式使用替代運算符,解析器設法通過1)丟棄{H[1](0.9)H[2](0.4)}第2部分來解析它保持前面的H 3)使用_naturalElementToken解析它。 我的語法不夠清晰,無法表達為解析器嗎? 如何以這樣一種方式使用替代運算符:當一個事件被發現在運行語義動作時給出false時,解析器會停止?

如何以這樣一種方式使用替代運算符:當一個事件被發現但在運行語義動作時給出錯誤時,解析器會停止?

通常,您可以通過添加期望點來防止回溯來實現此目的。

在這種情況下,您實際上是“混淆”了幾個任務:

  1. 匹配輸入
  2. 解釋匹配的輸入
  3. 驗證匹配的輸入

Spirit擅長匹配輸入,在解釋時具有很好的設施(主要是AST創建意義上的)。 然而,事情在飛行中驗證會變得“令人討厭”。

我經常重復的建議是盡可能考慮分離問題 我考慮一下

  1. 首先構建輸入的直接AST表示,
  2. 轉換/規范化/擴展/規范化到更方便或有意義的域表示
  3. 對結果做最后的驗證

這為您提供了最具表現力的代碼,同時保持高度可維護性。

因為我不能很好地理解問題域並且代碼示例不夠完整而無法引發它,所以我不會嘗試提供我想到的完整樣本。 相反,我會盡力勾畫我在開頭提到的期望點方法。

模擬樣本編譯

這花了最多的時間。 (考慮為那些要幫助你的人做腿部工作)

住在Coliru

#include <boost/fusion/adapted/std_pair.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <map>

namespace qi = boost::spirit::qi;

struct DummyBuilder {
    using result_type = bool;

    template <typename... Ts>
    bool operator()(Ts&&...) const { return true; }
};

struct PureIsotopeBuilder     : DummyBuilder {  };
struct IsotopesMixtureBuilder : DummyBuilder {  };
struct NaturalElementBuilder  : DummyBuilder {  };
struct UpdateElement          : DummyBuilder {  };

struct Isotope {
    std::string getName() const { return _name; }

    Isotope(std::string const& name = "unnamed", std::string const& symbol = "?") : _name(name), _symbol(symbol) { }

    template <typename T> std::string getProperty(std::string const& name) const {
        if (name == "symbol")
            return _symbol;
        throw std::domain_error("no such property (" + name + ")");
    }

  private:
    std::string _name, _symbol;
};

using MixComponent    = std::pair<Isotope, double>;
using isotopesMixture = std::list<MixComponent>;

template <typename Isotope>
struct ChemicalDatabaseManager {
    static ChemicalDatabaseManager* Instance() {
        static ChemicalDatabaseManager s_instance;
        return &s_instance;
    }

    auto& getDatabase() { return _db; }
  private:
    std::map<int, Isotope> _db {
        { 1, { "H[1]",   "H" } },
        { 2, { "H[2]",   "H" } },
        { 3, { "Carbon", "C" } },
        { 4, { "U[235]", "U" } },
    };
};

template <typename Iterator>
struct ChemicalFormulaParser : qi::grammar<Iterator, isotopesMixture(), qi::locals<isotopesMixture, double> >
{
    ChemicalFormulaParser(): ChemicalFormulaParser::base_type(_start)
    {
        using namespace qi;
        namespace phx = boost::phoenix;

        phx::function<PureIsotopeBuilder>     build_pure_isotope;     // Semantic action for handling the case of pure isotope
        phx::function<IsotopesMixtureBuilder> build_isotopes_mixture; // Semantic action for handling the case of pure isotope mixture
        phx::function<NaturalElementBuilder>  build_natural_element;  // Semantic action for handling the case of natural element
        phx::function<UpdateElement>          update_element;

        // XML database that store all the isotopes of the periodical table
        ChemicalDatabaseManager<Isotope>* imgr = ChemicalDatabaseManager<Isotope>::Instance();
        const auto& isotopeDatabase=imgr->getDatabase();

        // Loop over the database to the spirit symbols for the isotopes names (e.g. H[1],C[14]) and the elements (e.g. H,C)
        for (const auto& isotope : isotopeDatabase) {
            _isotopeNames.add(isotope.second.getName(),isotope.second.getName());
            _elementSymbols.add(isotope.second.template getProperty<std::string>("symbol"),isotope.second.template getProperty<std::string>("symbol"));
        }

        _mixtureToken         = "{" >> +(_isotopeNames >> "(" >> double_ >> ")") >> "}";
        _isotopesMixtureToken = (_elementSymbols[_a=_1] >> _mixtureToken[_b=_1])[_pass=build_isotopes_mixture(_val,_a,_b)];

        _pureIsotopeToken     = (_isotopeNames[_a=_1])[_pass=build_pure_isotope(_val,_a)];
        _naturalElementToken  = (_elementSymbols[_a=_1])[_pass=build_natural_element(_val,_a)];

        _start = +( ( (_isotopesMixtureToken | _pureIsotopeToken | _naturalElementToken)[_a=_1] >>
                    (double_|attr(1.0))[_b=_1]) [_pass=update_element(_val,_a,_b)] );
    }

  private:
    //! Defines the rule for matching a prefix
    qi::symbols<char, std::string> _isotopeNames;
    qi::symbols<char, std::string> _elementSymbols;

    qi::rule<Iterator, isotopesMixture()> _mixtureToken;
    qi::rule<Iterator, isotopesMixture(), qi::locals<std::string, isotopesMixture> > _isotopesMixtureToken;

    qi::rule<Iterator, isotopesMixture(), qi::locals<std::string> > _pureIsotopeToken;
    qi::rule<Iterator, isotopesMixture(), qi::locals<std::string> > _naturalElementToken;

    qi::rule<Iterator, isotopesMixture(), qi::locals<isotopesMixture, double> > _start;
};

int main() {
    using It = std::string::const_iterator;
    ChemicalFormulaParser<It> parser;
    for (std::string const input : {
            "C",                        // --> natural carbon made of C[12] and C[13] in natural abundance
            "CH4",                      // --> methane made of natural carbon and hydrogen
            "C2H{H[1](0.8)H[2](0.2)}6", // --> ethane made of natural C and non-natural H made of 80% of hydrogen and 20% of deuterium
            "C2H{H[1](0.9)H[2](0.2)}6", // --> invalid mixture (total is 110%?)
            "U[235]",                   // --> pure uranium 235
        })
    {
        std::cout << " ============= '" << input << "' ===========\n";
        It f = input.begin(), l = input.end();
        isotopesMixture mixture;
        bool ok = qi::parse(f, l, parser, mixture);

        if (ok)
            std::cout << "Parsed successfully\n";
        else
            std::cout << "Parse failure\n";

        if (f != l)
            std::cout << "Remaining input unparsed: '" << std::string(f, l) << "'\n";
    }
}

這就是給定的,只是打印

 ============= 'C' ===========
Parsed successfully
 ============= 'CH4' ===========
Parsed successfully
 ============= 'C2H{H[1](0.8)H[2](0.2)}6' ===========
Parsed successfully
 ============= 'C2H{H[1](0.9)H[2](0.2)}6' ===========
Parsed successfully
 ============= 'U[235]' ===========
Parsed successfully

一般說明:

  1. 不需要本地人,只需使用常規占位符:

     _mixtureToken = "{" >> +(_isotopeNames >> "(" >> double_ >> ")") >> "}"; _isotopesMixtureToken = (_elementSymbols >> _mixtureToken) [ _pass=build_isotopes_mixture(_val, _1, _2) ]; _pureIsotopeToken = _isotopeNames [ _pass=build_pure_isotope(_val, _1) ]; _naturalElementToken = _elementSymbols [ _pass=build_natural_element(_val, _1) ]; _start = +( ( (_isotopesMixtureToken | _pureIsotopeToken | _naturalElementToken) >> (double_|attr(1.0)) ) [ _pass=update_element(_val, _1, _2) ] ); // .... qi::rule<Iterator, isotopesMixture()> _mixtureToken; qi::rule<Iterator, isotopesMixture()> _isotopesMixtureToken; qi::rule<Iterator, isotopesMixture()> _pureIsotopeToken; qi::rule<Iterator, isotopesMixture()> _naturalElementToken; qi::rule<Iterator, isotopesMixture()> _start; 
  2. 你會想要處理名稱/符號之間的沖突(可能只是通過優先考慮一個或另一個)

  3. 符合編譯器將需要template限定符(除非我完全錯誤地猜測了您的數據結構,在這種情況下我不知道ChemicalDatabaseManager的模板參數應該是什么意思)。

    提示,MSVC不是符合標准的編譯器

住在Coliru

期望點素描

假設“權重”需要在_mixtureToken規則中加起來達到100%,我們可以使build_isotopes_micture “不是虛擬”並添加驗證:

struct IsotopesMixtureBuilder {
    bool operator()(isotopesMixture&/* output*/, std::string const&/* elementSymbol*/, isotopesMixture const& mixture) const {
        using namespace boost::adaptors;

        // validate weights total only
        return std::abs(1.0 - boost::accumulate(mixture | map_values, 0.0)) < 0.00001;
    }
};

但是,正如您所指出的那樣,它會通過回溯來阻止事情發生。 相反,你可以/斷言/任何完整的混合物加起來為100%:

_mixtureToken         = "{" >> +(_isotopeNames >> "(" >> double_ >> ")") >> "}" > eps(validate_weight_total(_val));

有類似的東西

struct ValidateWeightTotal {
    bool operator()(isotopesMixture const& mixture) const {
        using namespace boost::adaptors;

        bool ok = std::abs(1.0 - boost::accumulate(mixture | map_values, 0.0)) < 0.00001;
        return ok;
        // or perhaps just :
        return ok? ok : throw InconsistentsWeights {};
    }

    struct InconsistentsWeights : virtual std::runtime_error {
        InconsistentsWeights() : std::runtime_error("InconsistentsWeights") {}
    };
};

住在Coliru

#include <boost/fusion/adapted/std_pair.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/range/adaptors.hpp>
#include <boost/range/numeric.hpp>
#include <map>

namespace qi = boost::spirit::qi;

struct DummyBuilder {
    using result_type = bool;

    template <typename... Ts>
    bool operator()(Ts&&...) const { return true; }
};

struct PureIsotopeBuilder     : DummyBuilder {  };
struct NaturalElementBuilder  : DummyBuilder {  };
struct UpdateElement          : DummyBuilder {  };

struct Isotope {
    std::string getName() const { return _name; }

    Isotope(std::string const& name = "unnamed", std::string const& symbol = "?") : _name(name), _symbol(symbol) { }

    template <typename T> std::string getProperty(std::string const& name) const {
        if (name == "symbol")
            return _symbol;
        throw std::domain_error("no such property (" + name + ")");
    }

  private:
    std::string _name, _symbol;
};

using MixComponent    = std::pair<Isotope, double>;
using isotopesMixture = std::list<MixComponent>;

struct IsotopesMixtureBuilder {
    bool operator()(isotopesMixture&/* output*/, std::string const&/* elementSymbol*/, isotopesMixture const& mixture) const {
        using namespace boost::adaptors;

        // validate weights total only
        return std::abs(1.0 - boost::accumulate(mixture | map_values, 0.0)) < 0.00001;
    }
};

struct ValidateWeightTotal {
    bool operator()(isotopesMixture const& mixture) const {
        using namespace boost::adaptors;

        bool ok = std::abs(1.0 - boost::accumulate(mixture | map_values, 0.0)) < 0.00001;
        return ok;
        // or perhaps just :
        return ok? ok : throw InconsistentsWeights {};
    }

    struct InconsistentsWeights : virtual std::runtime_error {
        InconsistentsWeights() : std::runtime_error("InconsistentsWeights") {}
    };
};

template <typename Isotope>
struct ChemicalDatabaseManager {
    static ChemicalDatabaseManager* Instance() {
        static ChemicalDatabaseManager s_instance;
        return &s_instance;
    }

    auto& getDatabase() { return _db; }
  private:
    std::map<int, Isotope> _db {
        { 1, { "H[1]",   "H" } },
        { 2, { "H[2]",   "H" } },
        { 3, { "Carbon", "C" } },
        { 4, { "U[235]", "U" } },
    };
};

template <typename Iterator>
struct ChemicalFormulaParser : qi::grammar<Iterator, isotopesMixture()>
{
    ChemicalFormulaParser(): ChemicalFormulaParser::base_type(_start)
    {
        using namespace qi;
        namespace phx = boost::phoenix;

        phx::function<PureIsotopeBuilder>     build_pure_isotope;     // Semantic action for handling the case of pure isotope
        phx::function<IsotopesMixtureBuilder> build_isotopes_mixture; // Semantic action for handling the case of pure isotope mixture
        phx::function<NaturalElementBuilder>  build_natural_element;  // Semantic action for handling the case of natural element
        phx::function<UpdateElement>          update_element;
        phx::function<ValidateWeightTotal>    validate_weight_total;

        // XML database that store all the isotopes of the periodical table
        ChemicalDatabaseManager<Isotope>* imgr = ChemicalDatabaseManager<Isotope>::Instance();
        const auto& isotopeDatabase=imgr->getDatabase();

        // Loop over the database to the spirit symbols for the isotopes names (e.g. H[1],C[14]) and the elements (e.g. H,C)
        for (const auto& isotope : isotopeDatabase) {
            _isotopeNames.add(isotope.second.getName(),isotope.second.getName());
            _elementSymbols.add(isotope.second.template getProperty<std::string>("symbol"), isotope.second.template getProperty<std::string>("symbol"));
        }

        _mixtureToken         = "{" >> +(_isotopeNames >> "(" >> double_ >> ")") >> "}" > eps(validate_weight_total(_val));
        _isotopesMixtureToken = (_elementSymbols >> _mixtureToken) [ _pass=build_isotopes_mixture(_val, _1, _2) ];

        _pureIsotopeToken     = _isotopeNames [ _pass=build_pure_isotope(_val, _1) ];
        _naturalElementToken  = _elementSymbols [ _pass=build_natural_element(_val, _1) ];

        _start = +( 
                ( (_isotopesMixtureToken | _pureIsotopeToken | _naturalElementToken) >>
                  (double_|attr(1.0)) ) [ _pass=update_element(_val, _1, _2) ] 
            );
    }

  private:
    //! Defines the rule for matching a prefix
    qi::symbols<char, std::string> _isotopeNames;
    qi::symbols<char, std::string> _elementSymbols;

    qi::rule<Iterator, isotopesMixture()> _mixtureToken;
    qi::rule<Iterator, isotopesMixture()> _isotopesMixtureToken;
    qi::rule<Iterator, isotopesMixture()> _pureIsotopeToken;
    qi::rule<Iterator, isotopesMixture()> _naturalElementToken;
    qi::rule<Iterator, isotopesMixture()> _start;
};

int main() {
    using It = std::string::const_iterator;
    ChemicalFormulaParser<It> parser;
    for (std::string const input : {
            "C",                        // --> natural carbon made of C[12] and C[13] in natural abundance
            "CH4",                      // --> methane made of natural carbon and hydrogen
            "C2H{H[1](0.8)H[2](0.2)}6", // --> ethane made of natural C and non-natural H made of 80% of hydrogen and 20% of deuterium
            "C2H{H[1](0.9)H[2](0.2)}6", // --> invalid mixture (total is 110%?)
            "U[235]",                   // --> pure uranium 235
        }) try 
    {
        std::cout << " ============= '" << input << "' ===========\n";
        It f = input.begin(), l = input.end();
        isotopesMixture mixture;
        bool ok = qi::parse(f, l, parser, mixture);

        if (ok)
            std::cout << "Parsed successfully\n";
        else
            std::cout << "Parse failure\n";

        if (f != l)
            std::cout << "Remaining input unparsed: '" << std::string(f, l) << "'\n";
    } catch(std::exception const& e) {
        std::cout << "Caught exception '" << e.what() << "'\n";
    }
}

打印

 ============= 'C' ===========
Parsed successfully
 ============= 'CH4' ===========
Parsed successfully
 ============= 'C2H{H[1](0.8)H[2](0.2)}6' ===========
Parsed successfully
 ============= 'C2H{H[1](0.9)H[2](0.2)}6' ===========
Caught exception 'boost::spirit::qi::expectation_failure'
 ============= 'U[235]' ===========
Parsed successfully

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM