How to efficiently get a `string_view` for a substring of `std::string`

Using http://en.cppreference.com/w/cpp/string/basic_string_view as a reference, I see no way to do this more elegantly:

std::string s = "hello world!";
std::string_view v = s;
v = v.substr(6, 5); // "world"

Worse, the naive approach is a pitfall that leaves v dangling, referring to a temporary that is destroyed at the end of the statement:

std::string s = "hello world!";
std::string_view v(s.substr(6, 5)); // OOPS!
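
To see why the second form dangles, it may help to look at the return types involved. The sketch below checks them at compile time. The helpers `views_into` and `world_of` are hypothetical names introduced only for this illustration; they are not part of the standard library.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <string_view>
#include <type_traits>
#include <utility>

// std::string::substr returns a new std::string by value, so in
// std::string_view v(s.substr(6, 5)) the view refers to a temporary.
static_assert(std::is_same_v<
    decltype(std::declval<const std::string&>().substr(6, 5)), std::string>);

// std::string_view::substr returns another non-owning view instead.
static_assert(std::is_same_v<
    decltype(std::declval<const std::string_view&>().substr(6, 5)),
    std::string_view>);

// Hypothetical helper: true if `v` lies entirely inside the buffer owned by `s`.
inline bool views_into(std::string_view v, const std::string& s) {
    std::less_equal<const char*> le;  // total order even for unrelated pointers
    return le(s.data(), v.data()) &&
           le(v.data() + v.size(), s.data() + s.size());
}

// Safe variant from the question: convert to a view first, then take the
// sub-view. Throws std::out_of_range if `s` has fewer than 6 characters.
inline std::string_view world_of(const std::string& s) {
    return std::string_view(s).substr(6, 5);
}

inline void check_world() {
    const std::string s = "hello world!";
    const std::string_view v = world_of(s);
    assert(v == "world");
    assert(views_into(v, s));  // v points into s, which is still alive
}
```

The view stays valid only for as long as `s` is alive and its buffer is not reallocated.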

I seem to remember that there might be an addition to the standard library that returns a substring as a view:

auto v(s.substr_view(6, 5));

I can think of the following workarounds:

std::string_view(s).substr(6, 5);
std::string_view(s.data()+6, 5);
// or even "worse" (remove_prefix and remove_suffix return void, so they cannot be chained):
std::string_view w(s);
w.remove_prefix(6);
w.remove_suffix(1);

Frankly, I don't think any of these are very nice. Right now the best thing I can think of is using an alias to make things less verbose.

using sv = std::string_view;
sv(s).substr(6, 5);
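
The workarounds are not quite interchangeable when the arguments are out of range. As a rough sketch: `std::string_view::substr` clamps the count and throws `std::out_of_range` for an offset past the end, while the pointer-and-length constructor does no checking. The function names `substr_throws` and `compare_workarounds` are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>

// Returns true if std::string_view::substr rejects `pos` by throwing.
inline bool substr_throws(std::string_view v, std::size_t pos) {
    try {
        (void)v.substr(pos);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}

inline void compare_workarounds() {
    const std::string s = "hello world!";  // 12 characters

    // Both spellings give the same view for in-range arguments.
    assert(std::string_view(s).substr(6, 5) ==
           std::string_view(s.data() + 6, 5));

    // substr clamps an over-long count to the end of the string...
    assert(std::string_view(s).substr(6, 100) == "world!");

    // ...and throws when the offset is past the end; pos == size() is allowed.
    assert(substr_throws(s, 13));
    assert(!substr_throws(s, 12));

    // std::string_view(s.data() + 13, 5) would compile too, but performs no
    // check at all, so it is deliberately not executed here.
}
```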

There's the free-function route, but unless you also provide overloads for std::string it's a snake pit.

#include <string>
#include <string_view>

std::string_view sub_string(
  std::string_view s, 
  std::size_t p, 
  std::size_t n = std::string_view::npos)
{
  return s.substr(p, n);
}

int main()
{
  using namespace std::literals;

  auto source = "foobar"s;

  // this is fine and elegant...
  auto bar = sub_string(source, 3);

  // but uh-oh...
  bar = sub_string("foobar"s, 3);
}
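
One mitigation that is sometimes suggested, and which is not part of this answer, is to delete the overload that would accept a temporary `std::string`, so the dangerous call fails to compile. The sketch below assumes that approach. The `const char*` overload and the `can_sub_string` trait are additions made only for this illustration.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <type_traits>
#include <utility>

inline std::string_view sub_string(std::string_view s, std::size_t p,
                                   std::size_t n = std::string_view::npos) {
    return s.substr(p, n);
}

// Assumption, not in the original answer: reject temporary strings outright.
std::string_view sub_string(const std::string&&, std::size_t,
                            std::size_t = std::string_view::npos) = delete;

// Keeps string literals unambiguous now that a std::string overload exists.
inline std::string_view sub_string(const char* s, std::size_t p,
                                   std::size_t n = std::string_view::npos) {
    return std::string_view(s).substr(p, n);
}

// Detects at compile time whether sub_string(T, std::size_t) is callable.
template <class T, class = void>
struct can_sub_string : std::false_type {};

template <class T>
struct can_sub_string<
    T, std::void_t<decltype(sub_string(std::declval<T>(), std::size_t{0}))>>
    : std::true_type {};

static_assert(can_sub_string<std::string&>::value);   // named string: fine
static_assert(can_sub_string<const char*>::value);    // literal: fine
static_assert(!can_sub_string<std::string>::value);   // temporary: rejected
```

This only catches the temporary-argument case. A view returned for a named string can still outlive that string.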

IMHO the whole design of string_view is a horror show which will take us back to a world of segfaults and angry customers.

Update:

Even adding overloads for std::string is a horror show. See if you can spot the subtle segfault time bomb...

#include <string>
#include <string_view>

std::string_view sub_string(std::string_view s, 
  std::size_t p, 
  std::size_t n = std::string_view::npos)
{
  return s.substr(p, n);
}

std::string sub_string(std::string&& s, 
  std::size_t p, 
  std::size_t n = std::string::npos)
{
  return s.substr(p, n);
}

std::string sub_string(std::string const& s, 
  std::size_t p, 
  std::size_t n = std::string::npos)
{
  return s.substr(p, n);
}

int main()
{
  using namespace std::literals;

  auto source = "foobar"s;
  auto bar = sub_string(std::string_view(source), 3);

  // but uh-oh...
  bar = sub_string("foobar"s, 3);
}

The compiler found nothing to warn about here. I am certain that a code review would not either.
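
For readers who did not spot it, the following sketch spells out the types involved using compile-time checks. It repeats the three overloads from the listing. `owned_bar` is a hypothetical helper showing one way to keep the result alive.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <type_traits>
#include <utility>

inline std::string_view sub_string(std::string_view s, std::size_t p,
                                   std::size_t n = std::string_view::npos) {
    return s.substr(p, n);
}
inline std::string sub_string(std::string&& s, std::size_t p,
                              std::size_t n = std::string::npos) {
    return s.substr(p, n);
}
inline std::string sub_string(std::string const& s, std::size_t p,
                              std::size_t n = std::string::npos) {
    return s.substr(p, n);
}

// The first call passes a string_view, so `auto bar` is a std::string_view.
static_assert(std::is_same_v<
    decltype(sub_string(std::declval<std::string_view>(), 3)),
    std::string_view>);

// The second call passes a temporary std::string, selects the && overload,
// and gets a std::string back by value.
static_assert(std::is_same_v<
    decltype(sub_string(std::declval<std::string>(), 3)), std::string>);

// That temporary std::string converts implicitly to the view type of `bar`,
// and is destroyed at the end of the assignment statement.
static_assert(std::is_convertible_v<std::string, std::string_view>);

// Hypothetical safe variant: name the owning type instead of reusing `bar`.
inline std::string owned_bar() {
    using namespace std::literals;
    std::string bar = sub_string("foobar"s, 3);
    return bar;
}
```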

I've said it before and I'll say it again, in case anyone on the C++ committee is watching: allowing implicit conversions from std::string to std::string_view is a terrible error which will only serve to bring C++ into disrepute.

Update:

I raised this (to me) rather alarming property of string_view on the cpporg message board, and my concerns were met with indifference.

The consensus of advice from this group is that std::string_view must never be returned from a function, which means that my first offering above is bad form.

There is of course no compiler help to catch the cases where this happens by accident (for example through template expansion).

As a result, std::string_view should be used with the utmost care, because from a memory management point of view it is equivalent to a copyable pointer pointing into the state of another object, which may no longer exist. However, it looks and behaves in all other respects like a value type.
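
A small sketch of the "copyable pointer" behaviour, using only well-defined operations: the string is edited in place while it is still alive, and a copy of the view observes the change. The function name `observe_through_copy` is invented for this example.

```cpp
#include <string>
#include <string_view>

inline std::string observe_through_copy() {
    std::string owner = "hello world!";
    std::string_view view = std::string_view(owner).substr(6, 5);
    std::string_view copy = view;  // copies a pointer and a length, no characters

    owner[6] = 'W';                // edit in place; the buffer is not reallocated

    return std::string(copy);      // snapshot of what the copy sees now
}
```

A true value type would still hold "world" after the edit. The copy of the view yields "World" instead, because it never owned the characters.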

Thus code like this:因此代码如下:

auto s = get_something().get_suffix();

is safe when get_suffix() returns a std::string (either by value or by reference), but leaves s dangling, so that any later use of it is UB, if get_suffix() is ever refactored to return a std::string_view.

Which in my humble view means that any user code that stores returned strings using auto will break if the libraries they are calling are ever refactored to return std::string_view in place of std::string const&.

So from now on, at least for me, "almost always auto" will have to become "almost always auto, except when it's strings".
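
As a sketch of what "except when it's strings" could look like in practice: naming std::string at the call site keeps the code correct before and after such a refactoring. `OldThing`, `NewThing`, and the helper functions are hypothetical stand-ins for `get_something()` and its return type.

```cpp
#include <string>
#include <string_view>
#include <type_traits>

struct OldThing {  // before the refactoring
    std::string suffix = "bar";
    const std::string& get_suffix() const { return suffix; }
};
struct NewThing {  // after the refactoring
    std::string suffix = "bar";
    std::string_view get_suffix() const { return suffix; }
};
inline OldThing get_old_thing() { return OldThing{}; }
inline NewThing get_new_thing() { return NewThing{}; }

// `auto` drops the reference and copies: the result owns its characters.
static_assert(std::is_same_v<
    std::decay_t<decltype(get_old_thing().get_suffix())>, std::string>);

// After the refactoring `auto` deduces a view whose owner dies at the end
// of the statement.
static_assert(std::is_same_v<
    std::decay_t<decltype(get_new_thing().get_suffix())>, std::string_view>);

// Direct-initializing a std::string copies the characters while the
// temporary is still alive, whichever type get_suffix() returns.
inline std::string suffix_before() {
    std::string s(get_old_thing().get_suffix());
    return s;
}
inline std::string suffix_after() {
    std::string s(get_new_thing().get_suffix());
    return s;
}
```

Direct-initialization is used because the std::string constructor taking a string_view is explicit, so `std::string s = get_new_thing().get_suffix();` would not compile.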

You can use the conversion operator from std::string to std::string_view:

std::string s = "hello world!";
std::string_view v = std::string_view(s).substr(6, 5);

This is how you can efficiently create a sub-string string_view.

#include <algorithm>   // std::min
#include <cstddef>     // std::size_t
#include <iostream>
#include <limits>      // std::numeric_limits
#include <string>
#include <string_view>

inline std::string_view substr_view(const std::string& source, std::size_t offset = 0,
                std::string_view::size_type count = 
                std::numeric_limits<std::string_view::size_type>::max()) {
    // an offset at or past the end yields an empty view instead of throwing
    if (offset < source.size()) 
        return std::string_view(source.data() + offset, 
                        std::min(source.size() - offset, count));
    return {};
}

int main() {
  using namespace std::literals; // enables the "..."s std::string literal

  // fine: the temporary std::string built from "abcd" lives until the end of this statement
  std::cout << substr_view("abcd",3,11) << "\n";

  std::string s {"0123456789"};
  std::cout << substr_view(s,3,2) << "\n";

  // be cautious about lifetime, as illustrated at https://en.cppreference.com/w/cpp/string/basic_string_view
  std::string_view bad = substr_view("0123456789"s, 3, 2); // "bad" holds a dangling pointer
  std::cout << bad << "\n"; // possible access violation

  return 0;
}

I realize that the question is about C++17, but it's worth noting that C++20 introduced a string_view constructor that accepts two iterators to char (or whatever the base type is), which allows writing

std::string_view v{ s.begin() +6, s.begin()+6 +5 };

Not sure if there is a cleaner syntax, but it's not difficult to

#define RANGE(_container,_start,_length) (_container).begin() + (_start), (_container).begin() + (_start) + (_length)

for a final

std::string_view v{ RANGE(s,6,5) };

PS: I called RANGE's first parameter _container instead of _string for a reason: the macro can be used with any Container (or any class supporting at least begin() and end()), even as part of a function call like

auto pisPosition= std::find( RANGE(myDoubleVector,11,23), std::numbers::pi );

PPS: When possible, prefer C++20's actual ranges library to this poor person's solution.
