
Why do reference types inside structs behave like value types?

I am a beginner in C# programming. I am currently studying strings, structs, value types and reference types. According to the accepted answers here and here, strings are reference types whose pointers are stored on the stack while their actual contents are stored on the heap. Also, as claimed here, structs are value types. Now I am trying to practice with structs and strings using a small example:

using System;

struct Person
{
    public string name;
}

class Program
{
    static void Main(string[] args)
    {
        Person person_1 = new Person();
        person_1.name = "Person 1";

        Person person_2 = person_1;
        person_2.name = "Person 2";

        Console.WriteLine(person_1.name);
        Console.WriteLine(person_2.name);
    }
}

The above code snippet outputs

Person 1
Person 2

This confuses me. If strings are reference types and structs are value types, then person_1.name and person_2.name should point to the same region of the heap, shouldn't they?

strings are reference types that have pointers stored on stack while their actual contents stored on heap

No no no. First off, stop thinking about stack and heap. This is almost always the wrong way to think in C#. C# manages storage lifetime for you.

Second, though references may be implemented as pointers, references are not logically pointers. References are references. C# has both references and pointers. Don't mix them up. There is no pointer to string in C#, ever. There are references to string.

Third, a reference to a string could be stored on the stack but it could also be stored on the heap. When you have an array of references to string, the array contents are on the heap.

Now let's come to your actual question.

    Person person_1 = new Person();
    person_1.name = "Person 1";
    Person person_2 = person_1; // This is the interesting line
    person_2.name = "Person 2";

Let's illustrate what the code does logically. Your Person struct is nothing more than a string reference, so your program is the same as:

string person_1_name = null; // That's what new does on a struct
person_1_name = "Person 1";
string person_2_name = person_1_name; // Now they refer to the same string
person_2_name = "Person 2"; // And now they refer to different strings

When you say person_2 = person_1, that does not mean that the variable person_1 is now an alias for the variable person_2. (There is a way to do that in C#, but this is not it.) It means "copy the contents of person_1 to person_2". The reference to the string is the value that is copied.

If that's not clear, try drawing boxes for variables and arrows for references; when the struct is copied, a copy of the arrow is made, not a copy of the box.
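
To make the difference between copying and aliasing concrete, here is a minimal sketch. It assumes C# 7.0 or later (for ref locals); the AliasDemo class and its method names are made up for illustration and are not part of the question.

```csharp
using System;

struct Person
{
    public string name;
}

static class AliasDemo
{
    // Plain assignment copies the struct, so the original is unaffected.
    public static string CopyThenRename()
    {
        Person person_1 = new Person();
        person_1.name = "Person 1";
        Person person_2 = person_1;          // copy of the value
        person_2.name = "Person 2";
        return person_1.name;                // still "Person 1"
    }

    // A ref local makes person_2 another name for the same variable.
    public static string AliasThenRename()
    {
        Person person_1 = new Person();
        person_1.name = "Person 1";
        ref Person person_2 = ref person_1;  // alias, not a copy
        person_2.name = "Person 2";
        return person_1.name;                // now "Person 2"
    }

    static void Main()
    {
        Console.WriteLine(CopyThenRename());
        Console.WriteLine(AliasThenRename());
    }
}
```

Only the second method changes person_1, because only there do both names denote the same storage.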

The best way to understand this is to fully understand what variables are; variables are, simply put, placeholders that hold values.

So what exactly is this value? In a reference type, the value stored in the variable is the reference (the address, so to speak) to a given object. In a value type, the value is the object itself.

When you do AnyType y = x; what really happens is that a copy of the value stored in x is made and is then stored in y.

So if x is a reference type, both x and y will point to the same object because they will both hold identical copies of the same reference. If x is a value type then both x and y will hold two identical but distinct objects.
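
The rule above can be sketched with one value type and one reference type. This is only an illustration; PointValue, PointRef and CopyDemo are hypothetical names introduced here.

```csharp
using System;

struct PointValue { public int X; }   // value type
class PointRef { public int X; }      // reference type

static class CopyDemo
{
    // y receives its own copy of the object, so x is untouched.
    public static int CopyStruct()
    {
        PointValue x = new PointValue { X = 1 };
        PointValue y = x;
        y.X = 2;
        return x.X;   // 1
    }

    // y receives a copy of the reference, so x and y share one object.
    public static int CopyClass()
    {
        PointRef x = new PointRef { X = 1 };
        PointRef y = x;
        y.X = 2;
        return x.X;   // 2
    }

    static void Main()
    {
        Console.WriteLine(CopyStruct());
        Console.WriteLine(CopyClass());
    }
}
```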

Once you understand this, it should start to make sense why your code behaves the way it does. Let's study it step by step:

Person person_1 = new Person();

OK, we are creating a new instance of a value type. According to what I explained previously, the value stored in person_1 is the newly created object itself. Where this value is stored (heap or stack) is an implementation detail; it's not relevant at all to how your code behaves.

person_1.name = "Person 1";

Now we are setting the variable name, which happens to be a field of person_1. Again, according to previous explanations, the value of name is a reference pointing to somewhere in memory where the string "Person 1" is stored. Again, where the value or the string is stored is irrelevant.

Person person_2 = person_1;

OK, this is the interesting part. What happens here? Well, a copy of the value stored in person_1 is made and stored in person_2. Because the value happens to be an instance of a value type, a new copy of said instance is created and stored in person_2. This new copy has its own field name, and the value stored in this variable is, again, a copy of the value stored in person_1.name (a reference to "Person 1").

person_2.name = "Person 2";

Now we are simply reassigning the variable person_2.name. This means we are storing a new reference that points to a new string somewhere in memory. Do note that person_2.name originally held a copy of the value stored in person_1.name, so whatever you do to person_2.name has no effect on whatever value is stored in person_1.name because you are simply changing... yeah, exactly, a copy. And that's why your code behaves the way it does.

As an exercise, try to reason out in a similar way how your code would behave if Person were a reference type.

Each struct instance has its own fields. person_1.name is an independent variable from person_2.name. These are not static fields.

person_2 = person_1 copies the struct by value.

The fact that string is immutable is not required to explain this behavior.

Here's the same case with a class instead to demonstrate the difference:

class C { public string S; }

C c1 = new C();
C c2 = c1; //copy reference, share object
c1.S = "x"; //it appears that c2.S has been set simultaneously because it's the same object

Here, c1.S and c2.S refer to the same variable. If you make this a struct then they become different variables (as in your code). c2 = c1 then turns into a copy of the struct value where it previously was a copy of an object reference.
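
To check the claim that immutability is not needed to explain the behavior, here is a small sketch that swaps the string for a mutable StringBuilder. The Note struct and MutableFieldDemo class are invented for this illustration.

```csharp
using System;
using System.Text;

struct Note
{
    public StringBuilder text;   // a mutable reference type
}

static class MutableFieldDemo
{
    // Reassigning the copy's field leaves the original's field alone,
    // even though StringBuilder is mutable.
    public static string ReassignField()
    {
        Note a = new Note();
        a.text = new StringBuilder("first");
        Note b = a;                             // copies the reference held in a.text
        b.text = new StringBuilder("second");   // b.text now refers to another object
        return a.text.ToString();               // "first"
    }

    // Mutating the shared object through the copy is visible through both.
    public static string MutateSharedObject()
    {
        Note a = new Note();
        a.text = new StringBuilder("first");
        Note b = a;
        b.text.Append(" and more");
        return a.text.ToString();               // "first and more"
    }

    static void Main()
    {
        Console.WriteLine(ReassignField());
        Console.WriteLine(MutateSharedObject());
    }
}
```

Reassignment behaves exactly as in the question; only mutation of the shared object shows up in both structs.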

Think of strings as arrays of characters. The code below is similar to yours, but with arrays.

public struct Lottery
{
    public int[] numbers;
}

public static class Program
{
    public static void Main()
    {
        var A = new Lottery();
        A.numbers = new[] { 1,2,3,4,5 };
        // struct A is in the stack, and it contains one reference to an array in RAM

        var B = A;
        // struct B also is in the stack, and it contains a copy of A.numbers reference
        B.numbers[0] = 10;
        // A.numbers[0] == 10, since both A.numbers and B.numbers point to same memory
        // You can't do this with strings because they are immutable

        B.numbers = new int[] { 6,7,8,9,10 };
        // B.numbers now points to a new location in RAM
        B.numbers[0] = 60;
        // A.numbers[0] == 10, B.numbers[0] == 60
        // The two structures A and B *are completely separate* now.
    }
}

So if you have a structure that contains references (strings, arrays or classes) and you want to implement ICloneable, make sure you also clone the contents of the references.

public class Person : ICloneable
{
    public string Name { get; set; }

    public Person Clone()
    {
        return new Person() { Name=this.Name }; // string copy
    }
    object ICloneable.Clone() { return Clone(); } // interface calls specific function
}
public struct Project : ICloneable
{
    public Person Leader { get; set; }
    public string Name { get; set; }
    public int[] Steps { get; set; }

    public Project Clone()
    {
        return new Project()
        {
            Leader=this.Leader?.Clone(),        // calls Clone for copy (null stays null)
            Name=this.Name,                     // copies the reference; safe because strings are immutable
            Steps=this.Steps?.Clone() as int[]  // shallow copy of array (enough for int elements)
        };
    }
    object ICloneable.Clone() { return Clone(); } // interface calls specific function
}
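
As a rough sketch of why the clone has to copy the referenced contents, the following compares plain assignment with an explicit Clone on a struct that holds an array. It reuses the Lottery shape from above, with a Clone method added here as an assumption; CloneDemo is a hypothetical helper.

```csharp
using System;

struct Lottery : ICloneable
{
    public int[] numbers;

    // Copies the array itself, not just the reference to it.
    public Lottery Clone()
    {
        return new Lottery { numbers = (int[])numbers.Clone() };
    }
    object ICloneable.Clone() { return Clone(); }
}

static class CloneDemo
{
    // Assignment copies only the array reference, so the array is shared.
    public static int AssignThenModify()
    {
        var a = new Lottery { numbers = new[] { 1, 2, 3 } };
        var b = a;
        b.numbers[0] = 10;
        return a.numbers[0];   // 10
    }

    // Clone gives b its own array, so a is unaffected.
    public static int CloneThenModify()
    {
        var a = new Lottery { numbers = new[] { 1, 2, 3 } };
        var b = a.Clone();
        b.numbers[0] = 10;
        return a.numbers[0];   // 1
    }

    static void Main()
    {
        Console.WriteLine(AssignThenModify());
        Console.WriteLine(CloneThenModify());
    }
}
```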

I think a lot of the answers here miss the point of the original question, mainly because the example was not really good. Some answers point to the immutability of strings as the correct cause of this behaviour, but in the question of the op that indeed would not have made a difference.

A better example to illustrate some confusion I have seen in my dev teams over strings would have been:

class SomeClass
{
    public int SomeNumber;
}

struct Person
{
    public string name;
    public SomeClass someClass;
}

class Program
{
    static void Main(string[] args)
    {
        Person person_1 = new Person();
        person_1.someClass = new SomeClass()
        {
            SomeNumber = 4,
        };
        person_1.name = "Person 1";

        Person person_2 = person_1;
        person_2.name += " changed";
        person_2.someClass.SomeNumber += 1;

        Console.WriteLine(person_1.name);
        Console.WriteLine(person_2.name);
        Console.WriteLine(person_1.someClass.SomeNumber);
        Console.WriteLine(person_2.someClass.SomeNumber);
    }
}

In this example the output would be

Person 1
Person 1 changed
5
5

The OP's question was: if both objects and strings are reference types, why do they behave differently when copied around? In this example, the correct answer is indeed that strings are immutable.

Person person_2 = person_1; // at this point the properties of person_2 both point to the same memory location as those of person 1. this is because person_1 is copied by value to person_2, the references are the values being copied, not what they point to (no deep copy)

person_2.name += " changed"; // strings are immutable, so the first string is not changed, instead a new memory location is allocated, the characters are stored and a new reference to that location is stored in the second struct

person_2.someClass.SomeNumber += 1; // nothing here changes the reference of someClass, thus both structs reflect this new value
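
The effect of += on a string can be observed directly by comparing references. This is a minimal sketch; StringDemo and its method names are invented for illustration.

```csharp
using System;

static class StringDemo
{
    // After a plain assignment both variables hold the same reference.
    public static bool SameAfterAssignment()
    {
        string a = "Person 1";
        string b = a;
        return object.ReferenceEquals(a, b);   // true
    }

    // += builds a new string and stores its reference in b; a is untouched.
    public static bool SameAfterAppend()
    {
        string a = "Person 1";
        string b = a;
        b += " changed";
        return object.ReferenceEquals(a, b);   // false
    }

    public static string OriginalAfterAppend()
    {
        string a = "Person 1";
        string b = a;
        b += " changed";
        return a;                              // "Person 1"
    }

    static void Main()
    {
        Console.WriteLine(SameAfterAssignment());
        Console.WriteLine(SameAfterAppend());
        Console.WriteLine(OriginalAfterAppend());
    }
}
```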

I hope this clears up some confusion for people still wondering about this.

I would highlight the fact that with person_2.name = "Person 2" we are actually creating a new string object in memory that contains the value "Person 2", and we are assigning the reference to this object. You can imagine it as follows:

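class StringClass
{
    string value; // let's imagine this is a "value type" string, so it's like int

    public StringClass(string value)
    {
        this.value = value;
    }

    // lets Console.WriteLine print the wrapped text instead of the type name
    public override string ToString() { return value; }
}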

With person_2.name = "Person 2" you are actually doing something like person_2.name = new StringClass("Person 2"), while name holds just a value that represents an address in memory.

Now if I rewrite your code:

struct Person
{
    public StringClass name;
}

class Program
{
    static void Main(string[] args)
    {
        Person person_1 = new Person();
        person_1.name = new StringClass("Person 1"); // imagine the reference value of name is "m1", which points somewhere in memory where "Person 1" is saved

        Person person_2 = person_1; // person_2.name holds the same reference, that is "m1", which was copied from person_1.name
        person_2.name = new StringClass("Person 2"); // person_2.name now holds a new reference "m2" to a new StringClass object in memory; person_1.name still has the value "m1"

        person_1.name = person_2.name; // this copies the new reference "m2" back to the original struct

        Console.WriteLine(person_1.name);
        Console.WriteLine(person_2.name);
    }
}

Now the output of the snippet:

Person 2
Person 2

To make a change through person_2 show up in person_1.name, as you originally expected in your snippet with a struct, you would need to use ref: https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/ref
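
One way this could look with a method parameter is sketched below. RenameCopy, RenameInPlace and RefParameterDemo are hypothetical names chosen for the illustration.

```csharp
using System;

struct Person
{
    public string name;
}

static class RefParameterDemo
{
    // p is a copy of the caller's struct; the change is lost on return.
    static void RenameCopy(Person p, string newName)
    {
        p.name = newName;
    }

    // p refers to the caller's variable; the change is visible to the caller.
    static void RenameInPlace(ref Person p, string newName)
    {
        p.name = newName;
    }

    public static string AfterRenameCopy()
    {
        Person person_1 = new Person { name = "Person 1" };
        RenameCopy(person_1, "Person 2");
        return person_1.name;   // "Person 1"
    }

    public static string AfterRenameInPlace()
    {
        Person person_1 = new Person { name = "Person 1" };
        RenameInPlace(ref person_1, "Person 2");
        return person_1.name;   // "Person 2"
    }

    static void Main()
    {
        Console.WriteLine(AfterRenameCopy());
        Console.WriteLine(AfterRenameInPlace());
    }
}
```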
