
C++ substring from string

I'm pretty new to C++ and I need to create a MyString class with a method that creates a new MyString object from a substring of another one. However, the chosen substring changes between the moment the object is created and the moment I print it with my method.

Here is my code:

#include <iostream>
#include <cstring>

using namespace std;

class MyString {
public:
    char* str;

    MyString(char* str2create){
        str = str2create;
    }

    MyString Substr(int index2start, int length) {
        char substr[length];
        int i = 0;
        while(i < length) {
            substr[i] = str[index2start + i];
            i++;
        }
        cout<<substr<<endl; // prints normal string
        return MyString(substr);
    }

    void Print() {
        cout<<str<<endl;
    }
};

int main() {
    char str[] = {"hi, I'm a string"};
    MyString myStr = MyString(str);
    myStr.Print();

    MyString myStr1 = myStr.Substr(10, 7);
    cout<<myStr1.str<<endl;
    cout<<"here is the substring I've done:"<<endl;
    myStr1.Print();

    return 0;
}

And here is the output:

hi, I'm a string

string

stri

here is the substring I've done:

♦

Your function Substr indirectly returns the address of the local variable substr by storing a pointer to it in the returned MyString object. It's invalid to dereference a pointer to a local variable once that variable has gone out of scope.

I suggest you decide whether your class wraps an external string or owns its own string data. In the latter case you will need to copy the input string to a member buffer.
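The two designs can be sketched with standard library types. This is only an illustration, assuming C++17 is available; the names ViewString and OwnedString are made up for this example and are not part of the question's code.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical non-owning wrapper: it refers to characters stored elsewhere,
// so the source buffer must outlive every ViewString that refers to it.
struct ViewString {
    std::string_view text;

    ViewString Substr(std::size_t start, std::size_t length) const {
        // string_view::substr returns another view; nothing is copied.
        return ViewString{text.substr(start, length)};
    }
};

// Hypothetical owning wrapper: it copies the characters into its own buffer,
// so it stays valid no matter what happens to the source afterwards.
struct OwnedString {
    std::string text;

    OwnedString Substr(std::size_t start, std::size_t length) const {
        // string::substr returns a new string holding its own copy.
        return OwnedString{text.substr(start, length)};
    }
};
```

With the view, a later change to the source array shows through in every substring taken from it; with the owning version, each substring keeps the characters it copied.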

I have to walk through this to explain properly what's going wrong, so bear with me.

int main() {
    char str[] = {"hi, I'm a string"};

Allocates a temporary (local, automatic) array of 17 characters (16 characters plus the terminating null), places the characters "hi, I'm a string" in it, and ends it with a null. Temporary means what it sounds like. When the function ends, str is gone. Anything pointing at str is now pointing at garbage. It may shamble on for a while and give some semblance of life before it is reused and overwritten, but really it's a zombie and can only be trusted to kill your program and eat its brains.

    MyString myStr = MyString(str);

Creates myStr, another temporary variable. Calls the constructor with the array of characters. So let's take a look at the constructor:

MyString(char* str2create){
    str = str2create;
}

Takes a pointer to a character; in this case it receives a pointer to the first element of main's str. This pointer is assigned to MyString's str. There is no copying of "hi, I'm a string". Both main's str and MyString's str point to the same place in memory. This is a dangerous condition because alterations to one will affect the other. If one str goes away, so does the other. If one str is overwritten, so too is the other.

What the constructor should do is:
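The sharing can be observed directly. The following is a minimal sketch that reuses the question's non-copying constructor; the function name aliasingDemo and the local name buffer are made up for the illustration.

```cpp
#include <cassert>
#include <cstring>

// Same non-copying constructor as in the question, reduced to the essentials.
class MyString {
public:
    char* str;

    MyString(char* str2create) {
        str = str2create;            // stores the pointer only, copies nothing
    }
};

// Hypothetical demo: the object and the caller's array are the same memory.
void aliasingDemo() {
    char buffer[] = "hi, I'm a string";
    MyString myStr(buffer);

    assert(myStr.str == buffer);     // same address, no copy was made

    buffer[0] = 'H';                 // change the caller's array...
    assert(myStr.str[0] == 'H');     // ...and the object sees the change

    myStr.str[1] = 'I';              // change through the object...
    assert(std::strcmp(buffer, "HI, I'm a string") == 0); // ...and the array changes
}
```

Calling aliasingDemo() from main should pass every assert, which is exactly the problem: neither side can change or release the characters without affecting the other.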

MyString(char* str2create){
    size_t len = strlen(str2create); // length of the source, not counting the null
    str = new char[len+1]; // create appropriately sized buffer to hold string
                           // +1 to hold the null
    strcpy(str, str2create); // copy source string to MyString
}

A few caveats: this is really, really easy to break. Pass in a str2create that is never null-terminated, for example, and strlen will go spinning off into memory that doesn't belong to the string, and the results will be unpredictable.

For now we'll assume no one is being particularly malicious and will only enter good values, but this has been shown to be a really bad assumption in the real world.

This also forces a requirement for a destructor to release the memory used by str:

virtual ~MyString(){
    delete[] str;
}

It also adds a requirement for copy and move constructors and copy and move assignment operators to avoid violating the Rule of Three/Five.

Back to OP's code...
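As a rough sketch of what those special member functions could look like for an owning MyString: the version below is one possible arrangement, not the only one. It assumes a private str member, adds a hypothetical c_str() accessor that the question's class does not have, and uses a single by-value assignment operator (copy-and-swap) to cover both copy and move assignment.

```cpp
#include <cstring>
#include <utility>

class MyString {
public:
    // Copies the caller's characters into a buffer this object owns.
    explicit MyString(const char* source)
        : str(new char[std::strlen(source) + 1]) {   // +1 for the null
        std::strcpy(str, source);
    }

    // Copy constructor: deep copy, so the two objects never share a buffer.
    MyString(const MyString& other) : MyString(other.c_str()) {}

    // Move constructor: take over the buffer and leave other empty.
    MyString(MyString&& other) noexcept : str(other.str) {
        other.str = nullptr;
    }

    // Copy and move assignment in one: the parameter is copy- or
    // move-constructed by the caller, then its buffer is swapped in.
    MyString& operator=(MyString other) noexcept {
        std::swap(str, other.str);
        return *this;                // other's destructor frees the old buffer
    }

    ~MyString() {
        delete[] str;                // delete[] on nullptr is a no-op
    }

    // Hypothetical accessor; a moved-from object reads as an empty string.
    const char* c_str() const { return str ? str : ""; }

private:
    char* str;
};
```

Without the copy constructor and assignment operator, two MyString objects could end up holding the same pointer, and both destructors would then call delete[] on it.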

str and myStr point at the same place in memory, but this isn't bad yet. Because this program is a trivial one, it never becomes a problem. myStr and str both expire at the same point and neither is modified again.

myStr.Print();

Will print correctly because nothing has changed in str or myStr.

    MyString myStr1 = myStr.Substr(10, 7);

Requires us to look at MyString::Substr to see what happens.

MyString Substr(int index2start, int length) {
    char substr[length];

Creates a temporary character array of size length. First off, this is non-standard C++: a variable-length array is a compiler extension. It won't compile under a lot of compilers, so just don't do this in the first place. Second, it's temporary. When the function ends, this array is garbage. Don't keep any pointers to substr because it won't be around long enough to use them. Third, no space was reserved for the terminating null. This string is a buffer overrun waiting to happen.

    int i = 0;
    while(i < length) {
        substr[i] = str[index2start + i];
        i++;
    }

OK, that's pretty good. Copy from source to destination. What it is missing is the null termination, so that users of the char array know where it ends.

    cout<<substr<<endl; // prints normal string

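And that buffer overrun waiting to happen? This is where it bites. Nothing in Substr adds a terminating null, so cout has no reliable way to know where substr ends. You got unlucky because it looks like it worked rather than crashing and letting you know that it didn't. In this particular call, Substr(10, 7) copies seven characters starting at the 's', and the seventh happens to be the source string's own terminating null; with a shorter length, cout would keep reading past the end of the array.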

    return MyString(substr);

And this creates a new MyString that points to substr, right before substr hits the end of the function and dies. This new MyString points to garbage almost instantly.

}

What Substr should do (this relies on the copying constructor shown above and needs #include <memory> for std::unique_ptr):

MyString Substr(int index2start, int length)
{
    std::unique_ptr<char[]> substr(new char[length + 1]);
    // unique_ptr is probably paranoid overkill, but if something does go 
    // wrong, the array's destruction is virtually guaranteed
    int i = 0;
    while (i < length)
    {
        substr[i] = str[index2start + i];
        i++;
    }
    substr[length] = '\0';// null terminate
    cout<<substr.get()<<endl; // get() gets the array out of the unique_ptr
    return MyString(substr.get()); // google "copy elision" for more information 
                                   // on this line.
}
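The Substr above still trusts index2start and length to stay inside the string. One way to guard against that is to clamp the range before copying. The helper below is a sketch only: safeSubstr is a made-up name, and it returns a std::string rather than a MyString so that the example stays self-contained.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: copy up to `length` characters of `source`, starting
// at `start`, but never read at or past the terminating null.
std::string safeSubstr(const char* source, std::size_t start, std::size_t length) {
    const std::size_t sourceLength = std::strlen(source);
    if (start >= sourceLength) {
        return std::string();                  // nothing available to copy
    }
    const std::size_t available = sourceLength - start;
    const std::size_t count = std::min(length, available);
    // This std::string constructor copies exactly `count` characters and
    // manages its own terminating null.
    return std::string(source + start, count);
}
```

With this helper, a request such as safeSubstr("hi, I'm a string", 10, 100) is cut down to the six characters that actually exist instead of reading beyond the source.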

Back in OP's code: with the return to the main function, the memory that was substr starts to be reused and overwritten.

cout<<myStr1.str<<endl;

Prints myStr1.str, and already we can see that some of it has been reused and destroyed.

cout<<"here is the substring I've done:"<<endl;
myStr1.Print();

More death, more destruction, less string.

Things to not do in the future:

Sharing pointers where data should have been copied.

Pointers to temporary data.

Not null-terminating strings.
