
What is the difference between regex_token_iterator and regex_iterator?

Original post: 2014-10-12 01:34:16 · Tags: c++, regex

Is there any difference between regex_token_iterator and regex_iterator?

They seem to do the same job, but I am not sure which one performs better.

1 answer

There is indeed a difference. cppreference describes std::regex_iterator as follows:

std::regex_iterator is a read-only ForwardIterator that accesses the individual matches of a regular expression within the underlying character sequence.
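To make that description concrete, here is a minimal sketch of iterating over whole matches with std::sregex_iterator (the std::string specialization). The helper name all_matches is made up for this illustration and is not part of the standard library.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect every full match of `pattern` in `text`.
std::vector<std::string> all_matches(const std::string& text,
                                     const std::regex& pattern) {
    std::vector<std::string> out;
    // A default-constructed iterator serves as the end-of-sequence marker.
    std::sregex_iterator it(text.begin(), text.end(), pattern), end;
    for (; it != end; ++it) {
        const std::smatch& m = *it;  // each element is a complete match_results
        out.push_back(m.str());      // m.str() is the whole match; m[1], m[2], ... are its groups
    }
    return out;
}
```

Each dereference gives a full std::smatch, so prefix(), suffix() and every capture group of that match are available at once.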

and std::regex_token_iterator as:

std::regex_token_iterator is a read-only ForwardIterator that accesses the individual sub-matches of every match of a regular expression within the underlying character sequence. It can also be used to access the parts of the sequence that were not matched by the given regular expression (e.g. as a tokenizer).

So std::regex_token_iterator additionally lets you iterate over the parts of the sequence that were not matched, or over just the n-th sub-expression of every match.
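As a rough illustration of the sub-expression case, the sketch below asks std::sregex_token_iterator for a single capture group of every match. The helper name nth_group is an assumption made for this example only.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect capture group `n` of every match of `pattern`.
std::vector<std::string> nth_group(const std::string& text,
                                   const std::regex& pattern,
                                   int n) {
    std::vector<std::string> out;
    // The fourth constructor argument selects which sub-match is yielded.
    std::sregex_token_iterator it(text.begin(), text.end(), pattern, n), end;
    for (; it != end; ++it) {
        out.push_back(it->str());  // *it is a std::ssub_match, not a full std::smatch
    }
    return out;
}
```

With the pattern ([a-z])=([0-9]+) and the input "x=1, y=22", passing n = 1 yields x and y, and n = 2 yields 1 and 22. Unlike with regex_iterator, each element is one sub_match rather than the whole match_results.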

The cppreference page for std::regex_token_iterator also describes a typical implementation:

A typical implementation of std::regex_token_iterator holds the underlying std::regex_iterator, a container (e.g. std::vector) of the requested submatch indexes, the internal counter equal to the index of the submatch, a pointer to std::sub_match, pointing at the current submatch of the current match, and a std::match_results object containing the last non-matched character sequence (used in tokenizer mode).
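To show how the token iterator relates to the plain regex iterator, here is a hand-written approximation of tokenizer mode built only on std::sregex_iterator, using prefix() and suffix() of each match. It is a simplified sketch, not how any particular standard library implements it, and split_by_hand is a made-up name. Edge cases such as empty input or patterns that match the empty string are not handled carefully.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: split `text` on `sep` using only std::sregex_iterator.
std::vector<std::string> split_by_hand(const std::string& text,
                                       const std::regex& sep) {
    std::vector<std::string> out;
    std::sregex_iterator it(text.begin(), text.end(), sep), end;
    if (it == end) {          // no separator found: the whole input is one token
        out.push_back(text);
        return out;
    }
    std::smatch last;         // remembers the last match so its suffix can be read
    for (; it != end; ++it) {
        // prefix() covers the text between the previous match and this one
        out.push_back(it->prefix().str());
        last = *it;
    }
    if (last.suffix().length() != 0) {
        out.push_back(last.suffix().str());  // trailing text after the final separator
    }
    return out;
}
```

The extra bookkeeping here (remembering the last match in order to emit the trailing piece) is roughly what the quoted description means by the stored match_results object used in tokenizer mode.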

The book The C++ Standard Library explains this in section 14.4, Regex Token Iterators, as follows:

A regex iterator helps to iterate over matched subsequences. However, sometimes you also want to process all the contents between matched expressions. [...] In addition, you can specify a list of integral values, which represent elements of a “tokenization”:

-1 means that you are interested in all the subsequences between matched regular expressions (token separators).

0 means that you are interested in all the matched regular expressions (token separators).

Any other value n means that you are interested in the matched nth subexpression inside the regular expressions.
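The index values listed above can be tried out with a small sketch like the following. It passes a list of indices to std::sregex_token_iterator and returns whatever the iterator yields. The helper name tokenize and its fields parameter are assumptions made for this example.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: return what the token iterator yields for the
// requested index values (-1, 0, or a capture group number).
std::vector<std::string> tokenize(const std::string& text,
                                  const std::regex& pattern,
                                  const std::vector<int>& fields) {
    std::vector<std::string> out;
    // This constructor overload takes a vector of sub-match indices.
    std::sregex_token_iterator it(text.begin(), text.end(), pattern, fields), end;
    for (; it != end; ++it) {
        out.push_back(it->str());
    }
    return out;
}
```

With the input "a, b,c" and the separator pattern ,\s* the list {-1} yields the tokens a, b and c, while {0} yields the two separators themselves. With the pattern ([a-z]+)=([0-9]+) and the input "x=1;y=22", the list {1, 2} yields x, 1, y, 22: for every match the requested indices are visited in order.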

The book's site provides example code for sregex_token_iterator and sregex_iterator, which should also be helpful.