Use of std::regex_iterator<std::string::iterator> according to CPlusPlus.com

I'm reading the documentation on std::regex_iterator<std::string::iterator> since I'm trying to learn how to use it for parsing HTML tags. The example the site gives is:

#include <iostream>
#include <string>
#include <regex>

int main ()
{
  std::string s ("this subject has a submarine as a subsequence");
  std::regex e ("\\b(sub)([^ ]*)");   // matches words beginning by "sub"

  std::regex_iterator<std::string::iterator> rit ( s.begin(), s.end(), e );
  std::regex_iterator<std::string::iterator> rend;

  while (rit!=rend) {
    std::cout << rit->str() << std::endl;
    ++rit;
  }

  return 0;
}

( http://www.cplusplus.com/reference/regex/regex_iterator/regex_iterator/ )

and I have one question about that: if rend is never initialized, then how is it being used meaningfully in the comparison rit!=rend ?

Also, is this the tool I should be using for getting attributes out of HTML tags? What I want to do is take a string like "class='class1 class2' id = 'myId' onclick ='myFunction()' >" and break it into pairs

( "class" , "class1 class2" ), ( "id" , "myId" ), ( "onclick" , "myFunction()" )

and then work with them from there. The regular expression I'm planning to use is

([A-Za-z0-9\\-]+)\\s*=\\s*(['\"])(.*?)\\2

and so I plan to iterate through expressions of that type while keeping track of whether I'm still in the tag (i.e. whether I've passed a '>' character). Is it going to be too hard to do this?

Thank you for any guidance you can offer me.

What do you mean by "if rend is never initialized"? std::regex_iterator<I> clearly has a default constructor. Since the iteration is forward-only, the end iterator just needs to be something suitable for detecting that the end has been reached. The default constructor can set up rend accordingly.

This is an idiom used in a few other places in the standard C++ library, e.g., for std::istream_iterator<T>. Ideally, the end iterator could be indicated using a different type (see, e.g., Eric Niebler's discussion on this issue; the link is to the first of four pages), but the standard currently requires that the two types match when using algorithms.
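The same idiom with std::istream_iterator<T> can be sketched as follows. The helper name sum_ints is made up for this illustration; the point is only that the default-constructed iterator has the same type as the live one, which is what lets the pair be handed to std::accumulate.

```cpp
#include <iterator>
#include <numeric>
#include <sstream>
#include <string>

// Sum the whitespace-separated integers in `text`.
// `sum_ints` is a hypothetical helper used only for this sketch.
int sum_ints(const std::string& text)
{
    std::istringstream in(text);

    std::istream_iterator<int> first(in);  // reads from the stream
    std::istream_iterator<int> last;       // default-constructed: end-of-stream

    // Both iterators have the same type, as the algorithm requires.
    return std::accumulate(first, last, 0);
}
```

Calling sum_ints("1 2 3 4") walks the stream until first compares equal to last, which happens once extraction fails at the end of the input.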

With respect to parsing HTML using regular expressions, please refer to this answer.
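For the narrow case described in the question, where the attribute text has already been isolated from the tag, a minimal sketch might look like the one below. It is not a general HTML parser: it assumes every value is quoted, it does not track the closing '>', and the names Attribute and parse_attributes are made up for this illustration. The pattern is the one from the question, written as a raw string literal and with the hyphen moved to the end of the character class so it needs no escape.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// (name, value) pair; `Attribute` is a hypothetical alias for this sketch.
using Attribute = std::pair<std::string, std::string>;

// Extract name='value' / name="value" pairs from `text`.
// `parse_attributes` is a hypothetical helper, not a library function.
std::vector<Attribute> parse_attributes(const std::string& text)
{
    // Group 1: attribute name, group 2: opening quote, group 3: value.
    // \2 requires the closing quote to match the opening one.
    static const std::regex attr(R"re(([A-Za-z0-9-]+)\s*=\s*(['"])(.*?)\2)re");

    std::vector<Attribute> out;
    const std::sregex_iterator end;  // default-constructed: end-of-sequence

    for (std::sregex_iterator it(text.begin(), text.end(), attr); it != end; ++it)
        out.emplace_back((*it)[1].str(), (*it)[3].str());

    return out;
}
```

Each dereferenced iterator is a std::smatch, so the sub-matches for the name and the value are available by index without a second regex_search call.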

rend is not uninitialized, it is default-constructed. The page you linked is clear about this:

The default constructor (1) constructs an end-of-sequence iterator.

Since default-construction appears to be the only way to obtain an end-of-sequence iterator, comparing rit to rend is the correct way to test whether rit is exhausted.
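That comparison can be sketched with std::sregex_iterator, the standard alias for std::regex_iterator<std::string::const_iterator>. The helper name collect_matches is made up for this illustration; the loop is the same as in the cplusplus.com example, only wrapped in a function.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collect every match of `pattern` in `text`.
// `collect_matches` is a hypothetical helper used only for this sketch.
std::vector<std::string> collect_matches(const std::string& text,
                                         const std::regex& pattern)
{
    std::vector<std::string> out;

    std::sregex_iterator it(text.begin(), text.end(), pattern);
    const std::sregex_iterator end;  // default-constructed: end-of-sequence

    // When no further match exists, ++it turns `it` into an
    // end-of-sequence iterator, which compares equal to `end`.
    for (; it != end; ++it)
        out.push_back(it->str());

    return out;
}
```

If the pattern does not match at all, it already equals end right after construction, so the loop body never runs and the result is empty.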
