
Arrays in different languages - store references, or raw objects?

I am trying to wrap my head around what the raw memory looks like in different languages when using an array.

Consider the following Java code:

String a = "hi";
String b = "there";
String c = "everyone";
String[] array = {a, b, c};

Obviously the array is holding references, not objects; that is, there is a contiguous array in memory of three references, each of which points to some other location in memory where the object sits. So the objects themselves aren't necessarily sitting in three contiguous buckets; rather, the references are.

Now consider this:

String[] array = {"hi", "there", "everyone"};

I'd imagine in this situation the Strings exist somewhere with all the other constants in memory, and the array holds references to those constants? So, again, in raw memory the array doesn't look like ['h', 'i', '\0', 't', 'h', 'e', 'r', 'e', ... (etc.)] (using C-style termination just for convenience). Rather, it's more like ['a83a3edf', 'a38decd', ... (etc.)], where each element is a memory location (reference).

My conclusion from this thought process is that in Java, you can never ever imagine arrays as buckets of contiguous objects in memory, but rather as contiguous references. I can't think of any way to guarantee objects will always be stored contiguously in Java.

Now consider C:

char *a = "hi";
char *b = "there";
char *c = "everyone";
char *array[] = {a, b, c};

The code above is functionally equivalent to the Java above -- that is, the array holds references (pointers) to some other memory location. Like Java, the objects being pointed to aren't necessarily contiguous.
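To make the contrast concrete, here is a small C sketch (ptr_array, char_grid and the helper functions are hypothetical names chosen for illustration). An array of pointers only ever holds addresses, whereas a two-dimensional char array stores the characters themselves inline, which is the ['h', 'i', '\0', 't', ...] layout imagined above. The literals are declared const char * because modifying a string literal is undefined behaviour in C.

```c
#include <stddef.h>

/* Array of pointers: holds three addresses, like the Java String[] above.
 * The characters live wherever the compiler placed the string literals. */
static const char *ptr_array[] = {"hi", "there", "everyone"};

/* 2D char array: the characters are stored inline, row after row. Each
 * row is 9 bytes ("everyone" plus its '\0'); shorter rows are zero-padded. */
static const char char_grid[3][9] = {"hi", "there", "everyone"};

/* 3 pointers' worth of bytes, regardless of how long the strings are. */
size_t ptr_array_size(void) { return sizeof ptr_array; }

/* 3 * 9 = 27 bytes of actual character data. */
size_t char_grid_size(void) { return sizeof char_grid; }

/* Row 1 starts exactly 9 bytes after the start of the grid. */
ptrdiff_t grid_row_stride(void) {
    return (const unsigned char *)&char_grid[1] -
           (const unsigned char *)char_grid;
}

/* Read the grid byte by byte: 'h','i','\0',...,'t','h','e','r','e',... */
char grid_byte(size_t offset) {
    const unsigned char *bytes = (const unsigned char *)char_grid;
    return (char)bytes[offset];
}
```

Each row of char_grid is padded to 9 bytes because the longest string, "everyone", needs 8 characters plus the terminator, so byte 9 is the 't' of "there" and byte 18 is the 'e' of "everyone".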

HOWEVER, in the following C code:

struct my_struct array[5];  // allocates 5 * sizeof(struct my_struct) bytes! NOT room
                            // for 5 references/pointers, but room for 5 my_structs.

The structs in array ARE contiguously located in raw memory.
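One way to check this is to compare the addresses of neighbouring elements. The sketch below assumes a hypothetical definition of struct my_struct, since the question never shows one; whatever the members are, the distance between array[i] and array[i + 1] is sizeof(struct my_struct), padding included, and the whole array is exactly five structs' worth of bytes.

```c
#include <stddef.h>

/* Hypothetical members: the question never defines my_struct. */
struct my_struct {
    int id;
    double value;
    char tag[4];
};

static struct my_struct array[5];

/* Whole array: exactly 5 * sizeof(struct my_struct) bytes, no pointers. */
size_t array_bytes(void) { return sizeof array; }

/* Bytes between consecutive elements: equals the struct size (padding
 * included), i.e. the structs sit back to back. Requires i < 5. */
ptrdiff_t element_stride(size_t i) {
    return (const char *)&array[i + 1] - (const char *)&array[i];
}
```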

Now for my concrete questions:

  1. Was I correct in my assumption that in Java, arrays must ALWAYS hold references, as the programmer only ever has access to references in Java? What about for raw data types? Will it work differently then? Will an array of ints in Java look just like one in C in raw memory (besides the Object class cruft Java will add)?

  2. In Java, is there no way for the programmer to guarantee contiguous memory allocation of objects? It might happen by chance, or with high probability, but the programmer cannot GUARANTEE it will be so?

  3. In C, programmers CAN create raw arrays of objects (structs) contiguously in memory, as I have shown above, correct?

  4. How do other languages deal with this? I'm guessing Python works like Java?

The motivation for this question is that I want a solid understanding of what is happening at the raw memory level with arrays in these languages. Mostly for programmer-interview questions. I said in a previous interview that an array (not in any language, just in general) holds objects contiguously in memory like buckets. It was only after I said this that I realized that's not quite how it works in a language like Java. So I want to be 100% clear on it.

Thanks. Let me know if anything needs clarification.

you can never ever imagine arrays as buckets of contiguous objects in memory, but rather as contiguous references.

In theory you are right; in practice, the JVM doesn't randomise where it places objects. It allocates memory sequentially, and during a GC it copies objects in order of discovery (or reverse order).
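"Allocates memory sequentially" refers to bump-pointer allocation: the allocator keeps a pointer to the next free byte and simply advances it on each request. The C sketch below is a toy model under that assumption (a fixed static arena, 8-byte rounding, and the hypothetical names bump_alloc and bump_reset), not the JVM's actual allocator; it only shows why objects allocated one after another tend to end up next to each other.

```c
#include <stddef.h>

#define ARENA_SIZE 1024

/* Toy arena standing in for a region of the heap (hypothetical). */
static _Alignas(16) unsigned char arena[ARENA_SIZE];
static size_t arena_top = 0;  /* offset of the next free byte */

/* Return the next `size` bytes, rounded up to a multiple of 8, or NULL
 * when the arena is full. Consecutive calls return adjacent blocks. */
void *bump_alloc(size_t size) {
    if (size > ARENA_SIZE) {
        return NULL;
    }
    size_t rounded = (size + 7u) & ~(size_t)7u;
    if (rounded > ARENA_SIZE - arena_top) {
        return NULL;
    }
    void *p = &arena[arena_top];
    arena_top += rounded;  /* "bump" the pointer past the new block */
    return p;
}

/* Discard every allocation at once by moving the pointer back to 0. */
void bump_reset(void) { arena_top = 0; }
```

Because allocation is just an addition, three objects allocated in a row land at offsets 0, 16 and 40 (for requests of 16, 24 and 5 bytes), with no gaps beyond the rounding.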

Was I correct in my assumption that in Java, arrays must ALWAYS hold references, as the programmer only ever has access to references in Java?

Yes, unless you have an array of primitives, of course.

What about for raw data types? Will it work differently then?

Primitives and references in an array are both stored contiguously in memory. Layout-wise, they are basically the same.

Will an array of ints in Java look just like one in C in raw memory (besides the Object class cruft Java will add)?

Yes.

In Java, is there no way for the programmer to guarantee contiguous memory allocation of objects?

Not unless you use off-heap memory. Generally, though, this isn't as much of a problem as you might think, because most of the time the objects will be contiguous in memory.

It might happen by chance, or with high probability, but the programmer can not GUARANTEE it will be so?

Correct. Usually, once you look at the worst 0.1% latencies or beyond, you have bigger problems than object layout.

In C, programmers CAN create raw arrays of objects (structs) contiguously in memory, as I have shown above, correct?

Yes. You can do it in Java as well, but you have to use off-heap memory. There are a number of libraries that support this, such as Javolution, Chronicle, and SBE.
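Roughly speaking, that approach lays fixed-size records out in a flat block of memory and reads or writes each field at a computed offset. The C sketch below shows the idea with a hypothetical 16-byte record (an 8-byte id followed by an 8-byte price); it is not the API of Javolution, Chronicle or SBE, and it assumes double is 8 bytes, as on mainstream platforms.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(double) == 8, "this sketch assumes an 8-byte double");

/* Hypothetical fixed layout: id at offset 0, price at offset 8. */
#define RECORD_SIZE 16
#define ID_OFFSET 0
#define PRICE_OFFSET 8
#define MAX_RECORDS 4

/* All records live back to back in one flat buffer. */
static unsigned char buffer[RECORD_SIZE * MAX_RECORDS];

/* Write record i in place (caller keeps i < MAX_RECORDS). memcpy avoids
 * alignment and aliasing problems when treating raw bytes as fields. */
void put_record(size_t i, int64_t id, double price) {
    unsigned char *base = buffer + i * RECORD_SIZE;
    memcpy(base + ID_OFFSET, &id, sizeof id);
    memcpy(base + PRICE_OFFSET, &price, sizeof price);
}

int64_t get_id(size_t i) {
    int64_t id;
    memcpy(&id, buffer + i * RECORD_SIZE + ID_OFFSET, sizeof id);
    return id;
}

double get_price(size_t i) {
    double price;
    memcpy(&price, buffer + i * RECORD_SIZE + PRICE_OFFSET, sizeof price);
    return price;
}
```

In Java, a similar flat layout can be built on a direct ByteBuffer (ByteBuffer.allocateDirect) using its absolute-index accessors such as putDouble(int, double) and getLong(int), computing each index as i * RECORD_SIZE plus the field offset.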

Low-level languages like C make you deal with memory layout, and with whether you have a pointer to somewhere else or a value right here. Make sure you handle stack vs. heap allocation correctly, and don't forget to free() every pointer you malloc().
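As a minimal illustration of that advice, the sketch below (hypothetical struct point and sum_points) allocates an array of structs on the heap as one contiguous block and releases it with a single free(). calloc is used instead of malloc(n * sizeof ...) because it zero-initialises the memory and common implementations check the size multiplication for overflow.

```c
#include <stdlib.h>

/* Hypothetical element type. */
struct point {
    int x;
    int y;
};

/* Allocate n points as ONE contiguous heap block, fill them in, and return
 * the sum of x + y over all elements, or -1 if allocation failed. The
 * block is freed before returning. */
long sum_points(size_t n) {
    struct point *pts = calloc(n, sizeof *pts);  /* n structs, back to back */
    if (pts == NULL) {
        return -1;
    }
    for (size_t i = 0; i < n; i++) {
        pts[i].x = (int)i;
        pts[i].y = (int)(2 * i);
    }
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += pts[i].x + pts[i].y;
    }
    free(pts);  /* exactly one free() for the one calloc() */
    return total;
}
```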

Higher-level languages like Java, Python, and JavaScript hide that low-level memory layout. All objects are on the heap, and you hold a reference to them. While a reference is similar to a pointer, it is opaque and not directly associated with a given memory location. As such, all data structures contain references to objects.
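For comparison, the sketch below models that "everything is a reference" layout in C: a contiguous array of pointers, with each element allocated separately. It is only a model that assumes a reference behaves like a pointer; a real reference is opaque, and a JVM need not store it as a raw machine address (HotSpot, for instance, can use compressed references). All names are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical element type. */
struct point {
    int x;
    int y;
};

/* Build a contiguous array of n pointers, each pointing to its own
 * separately allocated struct. Returns NULL on any allocation failure. */
struct point **make_point_refs(size_t n) {
    struct point **refs = calloc(n, sizeof *refs);  /* contiguous pointers */
    if (refs == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        refs[i] = malloc(sizeof *refs[i]);  /* each object allocated alone */
        if (refs[i] == NULL) {
            for (size_t j = 0; j < i; j++) {
                free(refs[j]);
            }
            free(refs);
            return NULL;
        }
        refs[i]->x = (int)i;
        refs[i]->y = (int)i;
    }
    return refs;
}

/* Free every object, then the pointer array itself. */
void free_point_refs(struct point **refs, size_t n) {
    if (refs == NULL) {
        return;
    }
    for (size_t i = 0; i < n; i++) {
        free(refs[i]);
    }
    free(refs);
}
```

The pointers in refs are adjacent, but nothing guarantees that *refs[0] and *refs[1] are; that is exactly the extra level of indirection discussed above.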

Regarding 1): In Java, arrays are objects, and objects (arrays included) are stored on the heap. Since the heap might not be contiguous, arrays might not be contiguous either.

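Regarding 4): In Python, you can create a contiguous array if you use SciPy (specifically NumPy's ndarray, which by default stores its elements in one contiguous buffer).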

I can't speak in any detail to Java, although my understanding is that given the following code

int arr[] = new int[N];

the local (stack) variable arr contains a reference to an array object on the heap, giving us a layout something like this:

          +---+
     arr: |   |---+
          +---+   |
           ...    |
          +---+   |
      cp: |   |<--+  class pointer 
          +---+ 
     flg: |   |      flags
          +---+
     lck: |   |      locks
          +---+
      sz: |   |      size
          +---+
  arr[0]: |   |
          +---+
  arr[1]: |   |
          +---+
           ...
          +---+
arr[N-1]: |   |
          +---+

For an array of primitive types, the values are stored directly in arr[0], arr[1], etc. For an array of class types, each element of the array stores a reference to an instance of that class, so there's another level of indirection. The references themselves are stored contiguously, but the instances they point to are not (or at least, aren't guaranteed to be).
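The diagram can be approximated in C with a header struct followed by a flexible array member. The field names mirror the diagram's cp/flg/lck/sz labels and are purely illustrative; a real JVM's header layout differs, and varies between JVMs and settings. The point is that the header and the elements form one allocation, with the elements stored back to back after the header.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model of a Java int[] object, NOT a real JVM header. */
struct int_array_object {
    void    *class_ptr;  /* cp:  which class this object belongs to */
    uint32_t flags;      /* flg */
    uint32_t lock;       /* lck */
    int32_t  length;     /* sz:  number of elements */
    int32_t  data[];     /* arr[0] .. arr[N-1], stored inline */
};

/* Allocate header and elements as one block; elements start at 0, as in
 * Java. Returns NULL for a negative length or on allocation failure. */
struct int_array_object *new_int_array(int32_t n) {
    if (n < 0) {
        return NULL;
    }
    struct int_array_object *a =
        calloc(1, sizeof *a + (size_t)n * sizeof a->data[0]);
    if (a != NULL) {
        a->length = n;
    }
    return a;
}
```

The local variable arr in the diagram corresponds to a struct int_array_object * held by the caller; for an array of objects, data would instead hold pointers (references) to separately allocated instances.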

C and C++ arrays are a lot less complicated. Given the following code:

 int arr[N];

you get the following:

          +---+
  arr[0]: |   |
          +---+ 
  arr[1]: |   |
          +---+ 
           ...
          +---+
arr[N-1]: |   |
          +---+

There's no indirection or metadata involved with a C array. No storage is set aside for a separate object arr that points to the first element; where a pointer is needed, the expression arr is simply converted to a pointer to arr[0]. If the array has automatic storage duration (meaning it was declared within a block and not static), then the memory for the array elements is allocated the same way as for any other local variable.
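Both points can be checked directly: the address of the array equals the address of its first element, and sizeof reports exactly N elements' worth of bytes, with nothing extra. A minimal sketch, where N = 8 and the function name are arbitrary choices:

```c
#include <stddef.h>

#define N 8

/* Returns 1 if a local (automatic) array of N ints starts exactly at
 * arr[0] and occupies exactly N * sizeof(int) bytes: no header, no
 * hidden pointer. */
int array_has_no_header(void) {
    int arr[N] = {0};
    int same_start = (void *)&arr == (void *)&arr[0];
    int exact_size = sizeof arr == N * sizeof(int);
    return same_start && exact_size;
}
```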

For any type T, T arr[N] will set aside N contiguous elements to store values of type T. If T is an obnoxious struct type, then T arr[N] stores N contiguous instances of that obnoxious struct type.
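T can even be an array type. In the sketch below (hypothetical names), T is int[4], so T arr[3] becomes int grid[3][4]: three contiguous rows, i.e. 12 contiguous ints in row-major order.

```c
#include <stddef.h>

/* T = int[4], so this is "T grid[3]": 3 rows of 4 ints, back to back. */
static int grid[3][4];

/* Fill the grid so that each element holds its row-major index. */
void fill_grid(void) {
    for (int r = 0; r < 3; r++) {
        for (int c = 0; c < 4; c++) {
            grid[r][c] = r * 4 + c;
        }
    }
}

/* Offset in bytes of grid[r][c] from the start of the whole array. */
size_t byte_offset(int r, int c) {
    return (size_t)((const unsigned char *)&grid[r][c] -
                    (const unsigned char *)grid);
}
```

The offset of grid[r][c] is always (r * 4 + c) * sizeof(int), which is only possible because every row, and every int within a row, is stored contiguously.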
