C++ splitting a String feedback

I am new to C++ and I still have trouble understanding when I should use pointers, references, and std::move. I have written a short function to split strings using a delimiter.

std::vector<std::string> mylib::split(std::string string, char delimiter) {
    std::vector<std::string> result = std::vector<std::string>();

    std::string cache = std::string();
    cache.reserve(string.size());

    for (char c : string) {
        if (c == delimiter) {
            result.push_back(std::string(cache));
            cache.clear();
        } else {
            cache += c;
        }
    }
    cache.shrink_to_fit();
    result.push_back(cache);
    return result;
}

I have a few questions about this function. Should I use

std::vector<std::string> mylib::split(std::string string, char delimiter) {

or

std::vector<std::string> mylib::split(std::string &string, char delimiter) {

and should it be

result.push_back(std::string(cache));

or

result.push_back(std::move(std::string(cache)));

And do I have to care about the destruction of any of the used objects, or can I use this function just like that? Also, if there are any other ways to improve this function, I would be happy to hear your ideas.

The best would be

std::vector<std::string> mylib::split(const std::string &string, char delimiter) {

as you don't copy more than needed and you guarantee to your caller that you won't modify their string. It also makes the intent of the API much clearer.

result.push_back(std::move(std::string(cache)));

IMO (and not everybody will agree), you should not worry just yet about std::move'ing the string. Yes, you could, because cache is not read again after either push_back (it is either cleared or goes out of scope). You should only start caring when performance becomes an issue. And since you are copying chars one by one, I doubt the biggest performance improvement will come from move semantics.

Dropping the initializers and copying each token, as discussed:

std::vector<std::string> split(const std::string& string, char delimiter) 
{
    std::vector<std::string> result;
    size_t pos = 0;

    for (size_t scan = 0; scan < string.size(); ++scan) 
    {
        if (string[scan] == delimiter) 
        {
             result.push_back(string.substr(pos, scan - pos));
             pos = scan + 1;
        }
    }
    result.push_back(string.substr(pos, string.size() - pos));
    return result;
}
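As a variation on the loop above, the scan for the delimiter can be left to std::string::find. This is only a sketch: the name split_find is made up for illustration, and it is meant to behave like the version above (empty tokens are kept).

```cpp
#include <string>
#include <vector>

// split_find is a hypothetical name, not part of the original answer.
std::vector<std::string> split_find(const std::string& text, char delimiter)
{
    std::vector<std::string> result;
    std::string::size_type pos = 0;
    std::string::size_type hit;

    // find returns npos once there is no further delimiter
    while ((hit = text.find(delimiter, pos)) != std::string::npos)
    {
        result.push_back(text.substr(pos, hit - pos));
        pos = hit + 1;
    }
    result.push_back(text.substr(pos)); // the tail after the last delimiter
    return result;
}
```

With this sketch, split_find("a,,b", ',') yields three tokens: "a", an empty string, and "b".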

The rule of thumb is:

  1. Use & when you want to signal that your function can modify the argument and that the changes should be visible outside. Passing an argument via & also does not create a copy.

  2. Use const when you want to indicate that the function is not going to modify the object. A const parameter passed by value is still copied, though.

  3. Use const & to combine both situations above: the object will not be modified by the function and will not be copied either (which matters when copying is expensive, as it can be for strings).

So for you the best solution is to use const std::string& value (please change the name of the variable). You don't modify the string, and it may be too big to copy cheaply.
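To make the three rules concrete, here is a small sketch. The function names shout, append_mark and count_char are invented for this illustration only.

```cpp
#include <cstddef>
#include <string>

// By value: the function works on its own copy of the argument.
std::string shout(std::string text)
{
    text += '!';
    return text;
}

// By non-const reference: the caller's object is modified, no copy is made.
void append_mark(std::string& text)
{
    text += '!';
}

// By const reference: no copy is made and the function cannot modify the argument.
std::size_t count_char(const std::string& text, char wanted)
{
    std::size_t count = 0;
    for (char c : text)
    {
        if (c == wanted) ++count;
    }
    return count;
}
```

Calling shout(s) leaves s untouched, while append_mark(s) changes it. count_char is the shape that fits split: it only reads its argument.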


As for std::move: what it does is (basically) turn a non-temporary object into a temporary. More precisely, it casts its argument to an rvalue reference. So, as you can see, using std::move on a temporary (your case) is pointless.

Why do we do that? In order to allow the C++ compiler to apply aggressive optimizations. Consider this code:

std::string text = "abcd";
result.push_back(text);

C++ doesn't know that text is not going to be used anymore, so it has to copy it. But with this:

std::string text = "abcd";
result.push_back(std::move(text));

you tell the C++ compiler: "hey, I'm not going to use the text variable anymore, so you don't have to copy it, just move its internals to the vector". All you have to know is that for strings, copying (linear complexity) is more expensive than moving (always constant time).
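The two snippets can be put side by side in one function. This is a sketch with made-up names (collect, kept, donated). The one thing it adds is what happens afterwards: a moved-from std::string is left in a valid but unspecified state, so the sketch assigns a new value before reading it again.

```cpp
#include <string>
#include <utility>
#include <vector>

// collect is a hypothetical helper used only for this illustration.
std::vector<std::string> collect()
{
    std::vector<std::string> result;

    std::string kept = "copied text";
    result.push_back(kept);               // copy: kept still holds its value

    std::string donated = "moved text";
    result.push_back(std::move(donated)); // move: donated is now valid but unspecified

    donated = "reused";                   // assigning a new value is always fine
    result.push_back(donated);
    return result;
}
```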

Warning - an opinion incoming: I find the std::move name really confusing. It doesn't actually move anything. It's just a static cast. Why not call it std::cast_to_temp or something?
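The "just a static cast" point can be checked directly. In the sketch below (take is a made-up function name), the static_assert confirms that std::move on an lvalue only produces a std::string&&. The actual transfer happens in the move constructor that is called afterwards.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// std::move applied to an lvalue std::string has type std::string&&;
// nothing has been moved at that point.
static_assert(std::is_same<decltype(std::move(std::declval<std::string&>())),
                           std::string&&>::value,
              "std::move yields an rvalue reference");

// take is a hypothetical helper for this illustration.
std::string take(std::string& source)
{
    // Same effect as: std::string taken = std::move(source);
    // the move constructor invoked here does the real work.
    std::string taken = static_cast<std::string&&>(source);
    return taken;
}
```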

Anyway, this result.push_back(std::move(std::string(cache))); is wrong. Pointless. You don't avoid a copy and std::move does nothing. But this result.push_back(std::move(cache)); does make sense. Still, careful analysis is needed: is cache really not needed afterwards? It looks like it isn't (although I didn't dive deeply into your code).
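Applied to the function from the question, that could look like the sketch below. The name split_moved is invented here, and the const reference parameter is taken from the other advice in this thread. Calling clear() after the move puts cache back into a known, empty state before it is reused.

```cpp
#include <string>
#include <utility>
#include <vector>

// split_moved is a hypothetical name for this variant of the question's function.
std::vector<std::string> split_moved(const std::string& text, char delimiter)
{
    std::vector<std::string> result;
    std::string cache;

    for (char c : text)
    {
        if (c == delimiter)
        {
            result.push_back(std::move(cache)); // hand the buffer over instead of copying it
            cache.clear();                      // moved-from string: reset to a known empty state
        }
        else
        {
            cache += c;
        }
    }
    result.push_back(std::move(cache)); // cache is not read again after this point
    return result;
}
```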


Finally, you only care about destruction when you construct manually, i.e. for each new you need a delete. You don't have new, so you don't need delete *.

* That's not always true: sometimes you deal with nasty code that does an implicit, invisible new for you but actually forces you to do the delete. Yeah, sometimes it is hard. But AFAIK this doesn't happen in the standard (or any other self-respecting) library. It is a very bad practice.
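A small sketch of the new/delete rule, using a made-up Tracked type that counts live instances. The std::unique_ptr scope is an addition that the answer does not mention: it is the standard way to have the delete done for you.

```cpp
#include <memory>

// Tracked is a hypothetical helper type that counts how many instances are alive.
struct Tracked
{
    static int alive;
    Tracked() { ++alive; }
    ~Tracked() { --alive; }
};
int Tracked::alive = 0;

int alive_after_scopes()
{
    {
        Tracked automatic;                           // destroyed at the closing brace, no delete needed
    }
    {
        Tracked* manual = new Tracked;               // every new needs a matching delete
        delete manual;
    }
    {
        std::unique_ptr<Tracked> owned(new Tracked); // the smart pointer's destructor calls delete
    }
    return Tracked::alive;                           // nothing is left alive
}
```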

Final note: of course this is C++. In reality everything is much more complicated, there are exceptions to each rule, and so on. But don't worry about the details at the moment; it is OK to learn gradually.

Pass by value or by reference:

This will create a copy of string:

std::vector<std::string> mylib::split(std::string string, char delimiter)

This will pass a reference to string:

std::vector<std::string> mylib::split(std::string &string, char delimiter) 

In the above cases you would prefer to pass a reference, because you return a std::vector and only read parts of string to push them into the vector. Since you only read it, it would be even better to make it const:

std::vector<std::string> mylib::split(const std::string &string, char delimiter) 

Then you are 100% sure that the variable you gave to the split function remains unchanged. Imagine the following:

std::string string = "some,values";

If you pass string to split by value:

std::vector<std::string> mylib::split(std::string string, char delimiter) {
    string = "something else";
    ...
}

After calling split, you read the string variable:

std::cout << string << std::endl;

This will print "some,values".

If you pass by reference, however:

std::vector<std::string> mylib::split(std::string &string, char delimiter) {
    string = "something else";
}

It will print "something else"; basically, you're modifying the real string.

If you make it const, the compiler will not allow you to overwrite string in the split function. So unless your variable needs to be changed in the function, pass a const reference to it.

Moving or copying:

This will create a copy of cache:

result.push_back(std::string(cache));

This will move the contents of cache:

result.push_back(std::move(cache));

If you know that creating a copy usually costs more than moving things around, then you understand that moving will be more efficient, i.e. faster. But then again, adding move calls for a string sounds like premature optimization. Unless you are dealing with a lot of data, I don't see a reason to move a string instead of copying it, because it makes the code less readable and the performance gain would probably be minimal.

Pointers vs references

Basically, you can think of a pointer the way you think of a reference: it refers to a piece of memory by its address. The syntax is different, and a pointer can be null and can be re-pointed at another object, while a reference cannot be null and stays bound to the object it was initialized with. Either one can refer to an object on the stack or on the heap.

std::string string = "some,values";

std::vector<std::string> mylib::split(std::string *string, char delimiter) {
    *string = "something else";
    ...
}

std::string *pointer = &string;     // take the address of the string
mylib::split(pointer, ',');
std::cout << *pointer << std::endl; // will print "something else"
std::cout << pointer << std::endl;  // will print the address stored in the pointer

Notice that the * in the signature of split says that you pass a pointer, while the * before string in '*string = "something else"' means that the pointer is dereferenced and the value is written to the location it points to. The same goes for the print: we read the value and print it by dereferencing the pointer.
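The difference in how the two are used can be sketched with a pair of tiny functions. The names overwrite_via_pointer and overwrite_via_reference are invented for this example. The pointer version has to deal with null, the reference version does not.

```cpp
#include <string>

// Pointer parameter: may be null, so check before dereferencing.
bool overwrite_via_pointer(std::string* target)
{
    if (target == nullptr) return false;
    *target = "something else"; // dereference to reach the object
    return true;
}

// Reference parameter: always bound to an object and used like the object itself.
void overwrite_via_reference(std::string& target)
{
    target = "something else";
}
```

The caller passes &text to the first function and plain text to the second. In both cases the caller's string is changed.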

I hope that clears up some doubts you have.

You should read more about C++ pass by reference vs. pass by value. But to keep it simple:

  • Use std::vector<std::string> mylib::split(std::string string, char delimiter) { when you do not want the function to change the variable you pass to it. This means you pass the string object by value, and the function works on its own copy of that string.
  • std::vector<std::string> mylib::split(std::string &string, char delimiter) { means you are passing the string object by reference. So when you change the string inside the function, you change the original string itself, regardless of where you declared it. Passing by reference is also more performance friendly, since the object does not have to be copied.

And do I have to care about the destruction of any of the used objects or could I use this function just like that?

No, you do not have to worry about the destruction of any objects, since you only use STL types and not user-defined objects. Moreover, it should be like this: result.push_back(std::string(cache)). Don't use std::move when you push an object into the container.
