
What is the difference between regex_token_iterator and regex_iterator?

Is there any difference between regex_token_iterator and regex_iterator?

They seem to do the same job, and I am not sure which one performs better.

There is indeed a difference. cppreference describes std::regex_iterator as follows:

std::regex_iterator is a read-only ForwardIterator that accesses the individual matches of a regular expression within the underlying character sequence.
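As a minimal sketch of that description, the function below walks over every full match with std::sregex_iterator (the std::string specialization of std::regex_iterator). The helper name all_matches is hypothetical and exists only for this illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect every full match of `pattern` in `text`.
std::vector<std::string> all_matches(const std::string& text,
                                     const std::regex& pattern) {
    std::vector<std::string> out;
    // A default-constructed iterator serves as the end-of-sequence marker.
    for (std::sregex_iterator it(text.begin(), text.end(), pattern), end;
         it != end; ++it) {
        const std::smatch& m = *it;  // dereferencing yields a std::match_results
        out.push_back(m.str());      // whole match; m[1], m[2], ... are sub-matches
    }
    return out;
}
```

Called with the text "a1 b22 c333" and the pattern "[0-9]+", this should collect "1", "22" and "333". Each element seen by the loop is a complete match_results object, so all capture groups of a match are available at once.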

and std::regex_token_iterator as:

std::regex_token_iterator is a read-only ForwardIterator that accesses the individual sub-matches of every match of a regular expression within the underlying character sequence. It can also be used to access the parts of the sequence that were not matched by the given regular expression (e.g. as a tokenizer).

So std::regex_token_iterator additionally lets you iterate over the parts of the sequence that did not match, or over just the n-th sub-expression of each match.
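To illustrate the sub-expression case, here is a small sketch that uses std::sregex_token_iterator to pull out a single capture group from every match. The helper name submatch_tokens and the key=value input format are assumptions made for the example.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect sub-match number `submatch` of every match.
std::vector<std::string> submatch_tokens(const std::string& text,
                                         const std::regex& pattern,
                                         int submatch) {
    std::vector<std::string> out;
    for (std::sregex_token_iterator it(text.begin(), text.end(), pattern, submatch), end;
         it != end; ++it) {
        out.push_back(it->str());  // dereferencing yields a std::sub_match
    }
    return out;
}
```

With the pattern "(\\w+)=(\\w+)" and the text "key1=val1;key2=val2", asking for sub-match 1 should give the keys and asking for sub-match 2 should give the values. Unlike std::regex_iterator, each element here is a single std::sub_match rather than a whole std::match_results.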

The cppreference page for std::regex_token_iterator linked above describes a typical implementation as follows:

A typical implementation of std::regex_token_iterator holds the underlying std::regex_iterator, a container (e.g. std::vector) of the requested submatch indexes, the internal counter equal to the index of the submatch, a pointer to std::sub_match, pointing at the current submatch of the current match, and a std::match_results object containing the last non-matched character sequence (used in tokenizer mode).

The book The C++ Standard Library explains this in section 14.4, Regex Token Iterators, as follows:

A regex iterator helps to iterate over matched subsequences. However, sometimes you also want to process all the contents between matched expressions. [...] In addition, you can specify a list of integral values, which represent elements of a “tokenization”:

  • -1 means that you are interested in all the subsequences between matched regular expressions (token separators).
  • 0 means that you are interested in all the matched regular expressions (token separators).
  • Any other value n means that you are interested in the matched nth subexpression inside the regular expressions.
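The three kinds of values in the list above can be tried out with a sketch like the following, which passes a list of indexes to std::sregex_token_iterator. The helper name select_tokens and the sample inputs are hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: iterate with a list of sub-match indexes.
// -1 selects the text between matches, 0 the whole match, n the n-th group.
std::vector<std::string> select_tokens(const std::string& text,
                                       const std::regex& pattern,
                                       const std::vector<int>& which) {
    std::vector<std::string> out;
    for (std::sregex_token_iterator it(text.begin(), text.end(), pattern, which), end;
         it != end; ++it) {
        out.push_back(it->str());
    }
    return out;
}
```

Treating ",\\s*" as the separator pattern on the text "a, b,c", the index list {-1} should yield the fields "a", "b", "c", while {0} should yield the separators themselves. With the pattern "(\\w+)=(\\w+)" on "x=1 y=2", the list {1, 2} should yield "x", "1", "y", "2", visiting the requested groups in order for each match.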

The book's site provides example code for sregex_token_iterator and sregex_iterator, which should also be helpful.
