
Is there a standard C++ iterator for C strings?

Sometimes I need to pass a C string to a function using the common C++ iterator range interface [first, last). Is there a standard C++ iterator class for those cases, or a standard way of doing it without having to copy the string or call strlen()?

EDIT: I know I can use a pointer as an iterator, but I would have to know where the string ends, which would require me to call strlen().

EDIT2: While I didn't know whether such an iterator is standardized, I certainly know it is possible. In response to the sarcastic answers and comments, here is a stub (incomplete, untested):

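// Sentinel-style iterator over a NUL-terminated char array.
// A default-constructed iterator (ptr == nullptr) acts as the end marker.
class CStringIterator
{
public:
    CStringIterator(char *str=nullptr):
        ptr(str)
    {}

    bool operator==(const CStringIterator& other) const
    {
        if(ptr && other.ptr) {
            return ptr == other.ptr;   // two real positions
        }
        if(!ptr && !other.ptr) {
            return true;               // end == end
        }
        // One side is the end marker: equal if the other points at '\0'.
        const char *p = ptr ? ptr : other.ptr;
        return !*p;
    }

    /* ... operator++ and other iterator stuff */

private:
    char *ptr;
};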

EDIT3: Specifically, I am interested in a forward iterator, because I want to avoid iterating over the string twice when I know the algorithm only has to do it once.

There isn't any explicit iterator class, but regular raw pointers are valid iterators as well. The problem with C strings, though, is that they do not come with a native end iterator, which makes them unusable in range-based for loops, at least directly.

You might like to try the following template, though:

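// Requires C++17: a range-based for loop may then have end() return a
// different type from begin().
template <typename T>
class Range
{
    T* b;
public:
    class Sentinel
    {
        friend class Range;
        Sentinel() { }
        // Found by ADL for "begin != end": keep looping while the current
        // element is non-zero ('\0' for chars, nullptr for argv entries).
        friend bool operator!=(T* t, Sentinel) { return *t; }

    public:
        Sentinel(Sentinel const&) { }

    };
    Range(T* begin)
            : b(begin)
    { }
    T* begin() { return b; }
    Sentinel end() { return Sentinel(); }
};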

Usage:

for(auto c : Range<char const>("hello world"))
{
    std::cout << c << std::endl;
}

It was originally designed to iterate over the null-terminated argv of main, but it works with any pointer to a null-terminated array, which a C string is as well.

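The secret is the comparison against the sentinel, which actually performs a totally different test: whether the current pointer points at the terminating null (a '\0' character, or a null pointer in the argv case). This only works because since C++17 a range-based for loop allows begin() and end() to return different types.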
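The same iterator/sentinel idea also works outside range-based for loops, as long as the code only uses `!=`, `++` and `*`. A minimal C++17 sketch, assuming a hypothetical `zero_sentinel` type and a hand-written `count_until` helper (neither is standard). Note that pre-C++20 standard algorithms such as `std::count_if` take `first` and `last` of the same type, so they cannot accept such a sentinel directly:

```cpp
#include <cstddef>

// Hypothetical sentinel: "equal" to any position whose element is the zero
// value, i.e. '\0' for a C string or nullptr for an argv-style array.
struct zero_sentinel {};

template <typename T>
bool operator==(T* p, zero_sentinel) { return *p == T{}; }

template <typename T>
bool operator!=(T* p, zero_sentinel s) { return !(p == s); }

// Hypothetical algorithm whose end marker may have a different type from
// the iterator; it walks the sequence exactly once and never calls strlen().
template <typename It, typename Sent, typename Pred>
std::size_t count_until(It first, Sent last, Pred pred)
{
    std::size_t n = 0;
    for (; first != last; ++first)
        if (pred(*first))
            ++n;
    return n;
}
```

For example, `count_until(argv, zero_sentinel{}, pred)` would walk main's argv, whose element argv[argc] is guaranteed to be a null pointer.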

Edit: pre-C++17 variant (before C++17, begin() and end() of a range-based for loop must have the same type):

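// C++11/14: begin() and end() must return the same type, so both are
// Wrappers; the end Wrapper holds nullptr and is never dereferenced.
template <typename T>
class Range
{
    T* b;
public:
    class Wrapper
    {
        friend class Range;
        T* t;
        Wrapper(T* t) : t(t) { }
    public:
        Wrapper(Wrapper const& o) : t(o.t) { }
        Wrapper& operator++() { ++t; return *this; }
        // Ignores the right-hand side: assumes it is always end(), as in a
        // range-based for loop, and continues while *t is non-zero.
        bool operator!=(Wrapper const&) const { return *t; }
        T& operator*() const { return *t; }
    };
    Range(T* begin)
            : b(begin)
    { }
    Wrapper begin() { return Wrapper(b); }
    Wrapper end() { return Wrapper(nullptr); }
};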

Actually, yes, sort of: in C++17.

C++17 introduces std::string_view, which can be constructed from a C-style string.

std::string_view is a non-owning, random-access view over a character sequence (a proxy rather than a true container), and it fully supports iterators.

Note that although constructing a string_view from a const char* will theoretically call std::strlen, the compiler is allowed to (and GCC certainly does) elide the call when it knows the length of the string at compile time.
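A small sketch of both cases (the helper name `count_char` is made up for illustration). For a string literal, the length can be a compile-time constant, because `std::string_view`'s `const char*` constructor is `constexpr` in C++17. For a runtime pointer the length scan still happens, but only once, after which `begin()`/`end()` work with any standard algorithm:

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>

// Length computed at compile time: no strlen() runs when the program executes.
constexpr std::string_view greeting = "hello world";
static_assert(greeting.size() == 11, "length is a compile-time constant");

// Hypothetical helper: for a runtime C string, constructing the view scans
// for the terminator once; afterwards the view is an ordinary iterator range.
std::size_t count_char(const char* s, char c)
{
    std::string_view view(s);
    return static_cast<std::size_t>(std::count(view.begin(), view.end(), c));
}
```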

Example:

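#include <cstddef>      // std::size_t
#include <string_view>
#include <iostream>

// Minimal [first, first + size) range over a pointer, e.g. argv.
template<class Pointer>
struct pointer_span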
{
    using iterator = Pointer;

    pointer_span(iterator first, std::size_t size)
    : begin_(first)
    , end_(first + size)
    {
    }

    iterator begin() const { return begin_; }
    iterator end() const { return end_; }

    iterator begin_, end_;
};

int main(int argc, char** argv)
{
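    // C++17 class template argument deduction: pointer_span<char**>
    for(auto&& ztr : pointer_span(argv, argc))
    {
        const char* sep = "";
        // string_view finds each argument's length once, then exposes
        // ordinary begin()/end() iterators.
        for (auto ch : std::string_view(ztr))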
        {
            std::cout << sep << ch;
            sep = " ";
        }
        std::cout << std::endl;
    }
}


Is there a standard C++ iterator for C strings?

Yes. A pointer is an iterator for an array. C strings are (null-terminated) arrays of char. Therefore char* is an iterator for a C string.

... using the common C++ iterator range interface [first, last)

Just like with all other iterators, to have a range, you need to have an end iterator.

If you know, or can assume, that an array contains exactly the string and nothing more, then you can get the iterator range in constant time using std::begin(arr) (std::begin is redundant for C arrays, which decay to a pointer anyway, but nice for symmetry) and std::end(arr) - 1. Otherwise you can use pointer arithmetic with offsets within the array.

A little care must be taken to account for the null terminator. Remember that the full range of the array contains the null terminator of the string. If you want the iterator range to represent the string without the terminator, subtract one from the end iterator of the array, which explains the subtraction in the previous paragraph.

If you don't have an array, but only a pointer (the begin iterator), you can get the end iterator by advancing the beginning by the length of the string. This advancement is a constant-time operation, because pointers are random access iterators. If you don't know the length, you can call std::strlen to find it out (which isn't a constant-time operation).


For example, std::sort accepts a range of iterators. You can sort a C string like this:

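#include <algorithm> // std::sort
#include <iterator>  // std::begin, std::end
char str[] = "Hello World!";
// std::end(str) points past the '\0'; subtract 1 so the terminator stays last.
std::sort(std::begin(str), std::end(str) - 1);
for (char c : "test") { /* c is 't', 'e', 's', 't', then '\0' */ } // range-for works too, but includes the NUL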

If you don't know the length of the string:

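#include <cstring> // std::strlen
char *str = get_me_some_string(); // placeholder for any source of a C string
// strlen() finds the end in O(n) before sorting.
std::sort(str, str + std::strlen(str));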

Specifically, I am interested in a forward iterator

A pointer is a random access iterator. All random access iterators are also forward iterators. A pointer meets all of the requirements of the forward iterator concept.
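The category relationship can be checked in code. A short C++11 sketch: `std::iterator_traits` reports `random_access_iterator_tag` for pointers, and because the standard tags derive from each other, a function overloaded on `forward_iterator_tag` is selected for a pointer. The names `is_multipass` and `is_multipass_iterator` are hypothetical, chosen for illustration:

```cpp
#include <iterator>
#include <type_traits>

// std::iterator_traits is specialized for pointers: a char pointer reports
// the random access category...
static_assert(std::is_same<std::iterator_traits<const char*>::iterator_category,
                           std::random_access_iterator_tag>::value,
              "pointers are random access iterators");

// ...and random_access_iterator_tag derives (indirectly) from forward_iterator_tag.
static_assert(std::is_base_of<std::forward_iterator_tag,
                              std::random_access_iterator_tag>::value,
              "random access refines forward");

// Illustrative tag dispatch: the forward overload wins for pointers because
// the conversion to the closer base class is the better match.
inline bool is_multipass(std::input_iterator_tag) { return false; }
inline bool is_multipass(std::forward_iterator_tag) { return true; }

template <typename It>
bool is_multipass_iterator()
{
    return is_multipass(typename std::iterator_traits<It>::iterator_category{});
}
```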

It is possible to write such an iterator; something like this should work:

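#include <cstddef>   // std::ptrdiff_t
#include <iterator>  // std::forward_iterator_tag
#include <utility>   // std::swap

// Forward iterator over a NUL-terminated string. A default-constructed
// iterator (p == nullptr) is the end marker, so no strlen() is needed.
// Decrement is omitted: --end would have nowhere to go.
struct csforward_iterator {
    // Member types spelled out (std::iterator is deprecated since C++17).
    using iterator_category = std::forward_iterator_tag;
    using value_type        = char;
    using difference_type   = std::ptrdiff_t;
    using pointer           = const char*;
    using reference         = const char&;

    csforward_iterator( pointer ptr = nullptr ) : p( ptr ) {}

    csforward_iterator& operator++()  { ++p; return *this; }
    csforward_iterator operator++(int) { auto t = *this; ++p; return t; }

    // Equal if same position, or if one side is the end marker and the
    // other points at the terminating '\0'.
    bool operator==( csforward_iterator o ) const {
        return p == o.p or ( p ? not ( o.p or *p ) : not *o.p );
    }
    bool operator!=( csforward_iterator o ) const { return not operator==( o ); }

    void swap( csforward_iterator &o ) { std::swap( p, o.p ); }

    reference operator*() const { return *p; }
    pointer operator->() const { return p; }
private:
    pointer p;
};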


Unfortunately, though, no standard one is provided; if there were, it would probably be a template over the character type (like std::string, which is std::basic_string<char>).
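Following that suggestion, a minimal sketch templated over the character type (the names `basic_cstring_iterator`, `cstring_iterator` and `wcstring_iterator` are hypothetical, not standard types). As with the iterator above, a default-constructed iterator is the end marker, so a standard algorithm can process the whole string in a single pass without strlen():

```cpp
#include <algorithm>  // std::count / std::find, used with it below
#include <cstddef>
#include <iterator>

// Hypothetical forward iterator over a NUL-terminated array of CharT.
// A default-constructed iterator (p == nullptr) stands for "end".
template <typename CharT>
class basic_cstring_iterator {
public:
    using iterator_category = std::forward_iterator_tag;
    using value_type        = CharT;
    using difference_type   = std::ptrdiff_t;
    using pointer           = const CharT*;
    using reference         = const CharT&;

    basic_cstring_iterator(const CharT* s = nullptr) : p(s) {}

    reference operator*() const { return *p; }
    pointer operator->() const { return p; }

    basic_cstring_iterator& operator++() { ++p; return *this; }
    basic_cstring_iterator operator++(int) { auto tmp = *this; ++p; return tmp; }

    // "End-like" if null (the end marker) or pointing at the terminator.
    bool at_end() const { return p == nullptr || *p == CharT(); }

    friend bool operator==(const basic_cstring_iterator& a, const basic_cstring_iterator& b)
    {
        if (a.at_end() || b.at_end())
            return a.at_end() && b.at_end();  // end compares equal only to end
        return a.p == b.p;                    // otherwise compare positions
    }
    friend bool operator!=(const basic_cstring_iterator& a, const basic_cstring_iterator& b)
    {
        return !(a == b);
    }

private:
    const CharT* p;
};

using cstring_iterator  = basic_cstring_iterator<char>;
using wcstring_iterator = basic_cstring_iterator<wchar_t>;
```

For instance, `std::count(cstring_iterator(s), cstring_iterator(), 'o')` visits each character of `s` exactly once.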

I'm afraid not. In the end you need a pointer to the end of the string, and for that you need to call strlen.

If you have a string literal, you can get the end iterator without using std::strlen. If you have only a char*, you'll have to write your own iterator class or rely on std::strlen to get the end iterator.

Demonstrative code for string literals:

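#include <cstddef>   // std::size_t
#include <iostream>
#include <utility>

// Deduces N from the array type, so the end is known without strlen().
template <typename T, std::size_t N>
std::pair<T*, T*> array_iterators(T (&a)[N]) { return std::make_pair(&a[0], &a[0]+N); }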

int main()
{
   auto iterators = array_iterators("This is a string.");

   // The second of the iterators points one character past the terminating
   // null character. To iterate over the characters of the string, we need to 
   // stop at the terminating null character.

   for ( auto it = iterators.first; it != iterators.second-1; ++it )
   {
      std::cout << *it << std::endl;
   }
}

For ultimate safety and flexibility, you end up wrapping the iterator, and it has to carry some state.

Issues include:

  • random access: can be addressed in a wrapped pointer by limiting its overloads to block random access, or by making it call strlen() when needed
  • multiple iterators: when they are compared with each other rather than with end
  • decrementing end: which you could again "fix" by limiting the overloads
  • begin() and end() need to be the same type in C++11 and for some API calls
  • a non-const iterator could add or remove content

Note that it is "not the iterator's problem" if it is randomly moved outside the range of the container; nothing in a string_view iterator, for example, stops it from being advanced past string_view::end(), even though doing so is undefined behavior. It is also normal that such an invalid iterator can no longer be incremented back to end().

The most painful of these conditions is that end can be decremented, subtracted from, and dereferenced (usually you can't dereference end, but for a string it is the null character). This means the end object needs a flag marking it as the end, plus the address of the start, so that it can find the actual end using strlen() if any of these operations occurs.
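A minimal sketch of that design, assuming a hypothetical `lazy_end_iterator` class with made-up `begin_of`/`end_of` factory functions. A null `pos` plays the role of the end flag, `start` is kept so the real end can be located, and `std::strlen()` runs only when the end iterator is decremented or dereferenced. Random access is deliberately not offered:

```cpp
#include <cstddef>
#include <cstring>
#include <iterator>

// Hypothetical bidirectional iterator over a C string with a lazily located end.
class lazy_end_iterator {
public:
    using iterator_category = std::bidirectional_iterator_tag;
    using value_type        = char;
    using difference_type   = std::ptrdiff_t;
    using pointer           = const char*;
    using reference         = const char&;

    lazy_end_iterator() : start(nullptr), pos(nullptr) {}

    static lazy_end_iterator begin_of(const char* s) { return lazy_end_iterator(s, s); }
    static lazy_end_iterator end_of(const char* s)   { return lazy_end_iterator(s, nullptr); }

    // Dereferencing end yields the terminating '\0' (costs a strlen()).
    reference operator*() const { return *located(); }

    // Must not be called on an unresolved end iterator.
    lazy_end_iterator& operator++() { ++pos; return *this; }
    lazy_end_iterator operator++(int) { auto t = *this; ++*this; return t; }

    lazy_end_iterator& operator--()
    {
        if (!pos)                              // end not located yet:
            pos = start + std::strlen(start);  // pay for one strlen() now
        --pos;
        return *this;
    }
    lazy_end_iterator operator--(int) { auto t = *this; --*this; return t; }

    friend bool operator==(const lazy_end_iterator& a, const lazy_end_iterator& b)
    {
        if (!a.pos && !b.pos) return true;     // end == end
        if (!a.pos) return *b.pos == '\0';     // end vs. a position
        if (!b.pos) return *a.pos == '\0';
        return a.pos == b.pos;
    }
    friend bool operator!=(const lazy_end_iterator& a, const lazy_end_iterator& b)
    {
        return !(a == b);
    }

private:
    lazy_end_iterator(const char* s, const char* p) : start(s), pos(p) {}
    const char* located() const { return pos ? pos : start + std::strlen(start); }

    const char* start;  // kept so the real end can be found on demand
    const char* pos;    // nullptr acts as the "I am the end" flag
};
```

Forward-only use, the common case, never calls strlen(); only `--end` or `*end` pays for a scan.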

Is there a standard C++ iterator class for those cases, or a standard way of doing it without having to copy the string

Iterators are a generalization of pointers. Specifically, they're designed so that pointers are valid iterators.

Note the pointer specializations of std::iterator_traits.
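Thanks to those specializations, generic code that asks `std::iterator_traits` for `value_type` and `difference_type` accepts a raw pointer unchanged. A small sketch (the helper `count_equal` is hypothetical and essentially reimplements `std::count` to make the traits visible):

```cpp
#include <iterator>

// Hypothetical generic helper that relies only on std::iterator_traits.
// For It = const char*, the pointer specialization yields value_type = char
// and difference_type = std::ptrdiff_t; no wrapper class is needed.
template <typename It>
typename std::iterator_traits<It>::difference_type
count_equal(It first, It last, typename std::iterator_traits<It>::value_type value)
{
    typename std::iterator_traits<It>::difference_type n = 0;
    for (; first != last; ++first)
        if (*first == value)
            ++n;
    return n;
}
```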

I know I can use a pointer as an iterator, but I would have to know where the string ends

Unless you have some other way to know where the string ends, calling strlen is the best you can do. If there were a magic iterator wrapper, it would also have to call strlen.

Sorry, but an iterator is normally obtained from an iterable instance, and char * is a basic type, not a class. How do you expect something like .begin() or .end() to be achieved?

By the way, if you need to iterate over a char *p knowing it is NUL-terminated, you can just do the following:

for( char *p = your_string; *p; ++p ) {
    ...
}

But the thing is that you cannot use iterators as they are defined in C++, because char * is a basic type with no constructor, destructor, or associated methods.
