
utf8::next() of utfcpp - tries to iterate past the end of the string

I'm using utfcpp to work with UTF-8 encoded strings stored in std::string objects.

I want to iterate over the code points, and utf8::next()

uint32_t next(octet_iterator& it, octet_iterator end);

seems to be the way to do this. Here's a test program to illustrate its use:

std::string u8("Hello UTF-8 \u2610\u2193\u2190\u0394 World!\n");
std::cout << u8 << std::endl;
uint32_t cp = 0;
std::string::iterator b = u8.begin();
std::string::iterator e = u8.end();
while (cp = utf8::next(b,e))
    printf("%u, ", static_cast<unsigned>(cp));

This extracts all the characters correctly. However, right after printing 10 (the ASCII newline character), the program throws a NOT_ENOUGH_ROOM exception, which indicates that "it gets equal to end during the extraction of a code point":

Hello UTF-8 ☐↓←Δ World!
72, 101, 108, 108, 111, 32, 85, 84, 70, 45, 56, 32, 9744, 8595, 8592, 916, 32, 87, 111, 114, 108, 100, 33, 10,
terminate called after throwing an instance of 'utf8::not_enough_room'
what():  Not enough space

Apparently, passing the end iterator is not enough to keep utf8::next() from trying to read past the end of the string.

I'm also confused by the utf8::unchecked::next() function, which does not even take an end iterator. How does it know where to stop? Is catching the exception the normal way to detect the end of the string? I'm obviously missing something.

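I think you are responsible for checking whether the iterator is equal to end() before calling next().
Your loop condition cp = utf8::next(b,e) only becomes false when next() returns 0, i.e. when it decodes a U+0000 character, which your string does not contain. So after the final '\n' the loop calls next() once more with b == e. next() uses the end iterator to detect that situation and report it by throwing, not to end your loop quietly.
This should work without an exception being thrown: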

[...]
uint32_t cp = 0;
std::string::iterator b = u8.begin();
std::string::iterator e = u8.end();
while ( b != e ) {
    cp = utf8::next(b,e);
    printf("%u, ", static_cast<unsigned>(cp));
}

Generally, using exceptions for control flow is considered an anti-pattern.
