
C Array versus Named Variables

If this is a pointer to the first element of the array:

int foo[5];

Then isn't it more efficient to label each of the individual members:

int foo0, foo1, foo2, foo3, foo4;

The first one looks neater, but is it using indirection because the variable foo is actually a pointer?

Edit: I'm still waiting for a real answer. I'm trying to understand how it works. I'm not asking for advice on best practices. Thank you.

It is a generic design question.

Simple variables and arrays have different uses. If you intend to iterate over a range of variables of the same type, those variables should be an array.

Performance should not be a consideration here, because this kind of indexing is common and is handled quite efficiently.

If this is a pointer to the first element of the array:

Well it isn't, it is an array. Arrays are not pointers. Pointers are not arrays.

An array, when used in most expressions or when declared as a function parameter, "decays" into a pointer to the first item of the array. This means the array name can often be used as if it were a pointer. That doesn't make an array a pointer.
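
A minimal sketch of that decay, assuming nothing beyond standard C. The function names third_element and decay_demo are made up for this illustration; decay_demo returns 1 only if every identity holds.

```c
#include <stddef.h>

/* A parameter written as "int a[]" is adjusted to "int *a":
   the callee receives a pointer, not a copy of the array. */
int third_element(const int a[])
{
    return a[2];                    /* same as *(a + 2) */
}

/* Hypothetical helper: returns 1 if all the identities below hold. */
int decay_demo(void)
{
    int foo[5] = { 10, 20, 30, 40, 50 };
    int *p = foo;                   /* foo decays to &foo[0] here */

    return p == &foo[0]
        && foo[2] == *(foo + 2)     /* indexing is defined via pointer arithmetic */
        && third_element(foo) == 30 /* foo decays again in the call */
        && sizeof foo == 5 * sizeof(int);  /* but not under sizeof */
}
```

The last comparison is the part that shows foo is still an array: under sizeof there is no decay, so the result is the size of all five elements.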

If the mailman receives a letter with the address of your house, they know which house to deliver the letter to. That doesn't mean that the text on the letter is a house, or that your house is now text written on a letter.

Then isn't it more efficient...

No, because arrays are guaranteed to be allocated in a contiguous memory area. Individual variables have no such guarantee. Furthermore, the efficiency of code like this is exactly the kind of thing that beginners shouldn't even bother pondering. Manual code optimization is an advanced topic; it takes a lot of experience and system knowledge.

Is a function that processes an array of variable size going to be slower than one taking exactly 5 parameters of the same type? Well, maybe, maybe not... it depends on the ABI and calling convention, how parameters are passed, how many CPU registers can be used, what happens to end up in the data cache (if there is one), and so on.

But a function taking a variable-size array is more powerful, easier to maintain, and easier to read. So for those reasons it is the correct solution in the majority of use cases.

Furthermore, individual variables like in your example are bound to create lots of code repetition, which is harder to maintain and a potential source of bugs.

Rather than:

for(size_t i=0; i<n; i++)
  if(arr[i] == something)
    do_stuff(arr[i]);

You'll end up with:

if(foo0 == something)
  do_stuff(foo0);
if(foo1 == something)
  do_stuff(foo1);
...

That quickly escalates into massive code repetition, especially bad if the code is far more complex than these simple examples. Also, it means that the size of your executable increases.

In this declaration

int foo[5];

you declared an object of the array type int[5]. The object foo is not a pointer to the first element of the array. To see the difference, consider the following demonstration program.

#include <stdio.h>

int main(void) 
{
    int foo[5];
    int *foo_ptr = &foo[0];
    
    printf( "sizeof( foo ) = %zu\n", sizeof( foo ) );
    printf( "sizeof( foo_ptr ) = %zu\n", sizeof( foo_ptr ) );

    return 0;
}

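The program output might look like this (the exact values depend on the platform; here int is 4 bytes and a pointer is 8 bytes)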

sizeof( foo ) = 20
sizeof( foo_ptr ) = 8

Or consider another program

#include <stdio.h>

int main(void) 
{
    int foo[5];
    int *foo_ptr = &foo[0];
    
    int ( *p1 )[5] = &foo;
    int **p2 = &foo_ptr;
    
    printf( "sizeof( *p1 ) = %zu\n", sizeof( *p1 ) );
    printf( "sizeof( *p2 ) = %zu\n", sizeof( *p2 ) );

    return 0;
}

Again the program output is

sizeof( *p1 ) = 20
sizeof( *p2 ) = 8

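The pointers p1 and p2 even have different types: p1 has the type int ( * )[5] (pointer to an array of 5 ints), while p2 has the type int ** (pointer to a pointer to int).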

Moreover, pointers are assignable, while arrays are not modifiable lvalues.

That is, you may not write, for example,

int foo[5] = { 1, 2, 3, 4, 5 };
int bar[5] = { 5, 4, 3, 2, 1 };

foo = bar;

But you may write

int foo[5] = { 1, 2, 3, 4, 5 };
int bar[5] = { 5, 4, 3, 2, 1 };

int *foo_ptr = foo;
int *bar_ptr = bar;

foo_ptr = bar_ptr;

It is true, however, that an array designator is, with rare exceptions, converted to a pointer to its first element.

For example in this declaration

int *foo_ptr = foo;

the array designator used as an initializer is converted to a pointer to its first element.

From the C Standard (6.3.2.1 Lvalues, arrays, and function designators)

3 Except when it is the operand of the sizeof operator or the unary & operator, or is a string literal used to initialize an array, an expression that has type ''array of type'' is converted to an expression with type ''pointer to type'' that points to the initial element of the array object and is not an lvalue. If the array object has register storage class, the behavior is undefined.
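
The three exceptions named in that paragraph can be observed directly. The following is only a sketch; the function names are invented for the illustration, and the sizes are written in terms of sizeof(int) rather than fixed numbers because they depend on the platform.

```c
#include <stddef.h>

/* Exception 1: operand of sizeof. No conversion, so this is the
   size of the whole array, not the size of a pointer. */
size_t array_bytes(void)
{
    int foo[5];
    return sizeof foo;
}

/* Exception 2: operand of unary &. The result has type int (*)[5],
   so adding 1 steps over the whole array. */
size_t step_of_address_of_array(void)
{
    int foo[5];
    int (*whole)[5] = &foo;
    return (size_t)((char *)(whole + 1) - (char *)whole);
}

/* Exception 3: a string literal used to initialize an array.
   The literal's characters, including the terminating '\0',
   are copied into s. */
size_t string_array_bytes(void)
{
    char s[] = "abc";
    return sizeof s;
}
```

On any conforming implementation the first two functions return 5 * sizeof(int) and the third returns 4.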

To label individual elements as you are suggesting

int foo0, foo1, foo2, foo3, foo4;

does not make much sense.

For example, an array can contain a great many elements: a thousand or even more.

Would you write a declaration like this?

int foo0, foo1, foo2, foo3, foo4, /* other variable declarations... */ foo1000;

Another problem: how do you pass all the variables to a function?

You will need to list every one of them. Moreover, when you pass an array to a function, all of its elements are in effect passed by reference: the function works on the original elements through a pointer to the first one. Passing the "labeled elements" by value means the function works on copies. To let it modify the originals, you would have to apply the address-of operator to each "labeled element" and pass the resulting pointers.
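
A small sketch of that difference, cut down to two named variables to keep it short. The three function names are hypothetical and exist only for this illustration.

```c
#include <stddef.h>

/* Array argument: the function reaches the caller's elements
   through a pointer to the first one. */
void double_all(int *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] *= 2;
}

/* Named variables passed by value: only local copies change. */
void double_copies(int x0, int x1)
{
    x0 *= 2;
    x1 *= 2;
    (void)x0;   /* silence "set but not used" warnings */
    (void)x1;
}

/* Named variables passed by address: one pointer per variable. */
void double_named(int *x0, int *x1)
{
    *x0 *= 2;
    *x1 *= 2;
}
```

Calling double_all(arr, 3) on int arr[3] = { 1, 2, 3 } leaves arr holding 2, 4, 6. Calling double_copies(foo0, foo1) leaves foo0 and foo1 unchanged, and the parameter list of double_named grows by one pointer for every variable added.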

Another problem is that the C Standard does not specify in which order local variables are placed in memory. For example, the compiler may place the variables starting from foo0 or from foo1000.

One more problem: how do you write a loop that processes the variables? You may not use a pointer to traverse them, because pointer arithmetic may only be used to traverse objects that belong to the same array.

Or how to allocate dynamically all the variables and access them as an array?

If you write, for example,

int *p0 = malloc( sizeof( int ) );
...
int *p1000 = malloc( sizeof( int ) );

then there is no guarantee that the allocated objects will be stored in adjacent regions of memory.
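
By contrast, a single allocation sized for all the elements does give one contiguous block that can be indexed and traversed like an array. A minimal sketch, with make_sequence and sum_by_pointer as made-up helper names:

```c
#include <stdlib.h>

/* Allocates n ints in ONE block and fills them with 0, 1, ..., n-1.
   Returns NULL if the allocation fails; the caller must free() it. */
int *make_sequence(size_t n)
{
    int *p = malloc(n * sizeof *p);
    if (p == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        p[i] = (int)i;
    return p;
}

/* Traversing with a pointer is valid here because all n elements
   belong to the same allocated array. */
int sum_by_pointer(const int *first, size_t n)
{
    int sum = 0;
    for (const int *q = first; q != first + n; q++)
        sum += *q;
    return sum;
}
```

With n = 1001 this covers the same foo0 through foo1000 range in two lines at the call site: one call to make_sequence and one to free.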

The first one looks neater, but is it using indirection because the variable foo is actually a pointer?

Yes and no.

The C standard describes C using an abstract computer. It describes the behavior of p[2] as if the array p were converted to a pointer to its first element, then 2 were added to produce a pointer to the third element (index 2), and then that element were accessed.

However, the compiler is not required to slavishly reproduce the exact behaviors of that abstract machine. It is allowed to optimize the code it generates, and good compilers will if you enable any level of optimization. If you declare int foo[5] inside a function and use foo[2] in the function, the compiler will know where foo is, and it will know where foo[2] is, and it will access foo[2] the same way it would access foo2 if you had declared int foo0, foo1, foo2, foo3, foo4; instead. Most likely it will be at an offset from a stack pointer, stack frame pointer, or similar base register. It will not actually do pointer calculations at run time when you use foo[2] in the code.

With foo[2], the compiler will look up foo in its internal database, do the pointer calculation at compile time, and then generate direct code to access foo[2]. With foo2, the compiler will look up foo2 in its internal database and then generate direct code to access foo2. As long as you access the array with a constant index, the end result will be the same.
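
One way to check this on your own toolchain is to compile two equivalent functions and compare the generated assembly, for instance with the -S option of GCC or Clang at -O2. The pair below is only a sketch for that experiment; the names combine_array and combine_named are invented, and what the assembly looks like depends on your compiler and target.

```c
/* Array version: every index is a constant known at compile time. */
int combine_array(int a, int b, int c)
{
    int foo[3];
    foo[0] = a;
    foo[1] = b;
    foo[2] = c;
    return foo[2] - foo[0] + foo[1];
}

/* Named-variable version of the same computation. */
int combine_named(int a, int b, int c)
{
    int foo0 = a, foo1 = b, foo2 = c;
    return foo2 - foo0 + foo1;
}
```

Both functions return the same value for the same arguments. Whether they compile to the same instructions is something to verify rather than assume, but with optimization enabled that is the expected outcome.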

Given this, we can answer your question:

Then isn't it more efficient to label each of the individual members:…

No, because the run-time code is the same. And, at compile time, creating one array named foo and keeping track of that in the internal database may be less work for the compiler than creating five variables and tracking them all.

foo[3] should generate exactly the same code as foo3, since the compiler knows the address at compile time in either case. Generally, the expression foo[i] takes a bit longer because code must run to compute the address from the location of foo and the value of i. Use the array when you need to loop through the values; that's what it's for. Use foo0, foo1, etc. when you don't.
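
The run-time index case is also where named variables stop being a real alternative: there is no way to compute the name foo2 from the value 2, so the selection has to be spelled out by hand. A sketch with hypothetical function names, using three variables for brevity:

```c
#include <stddef.h>

/* Array: the address is computed from foo and i at run time. */
int get_from_array(const int *foo, size_t i)
{
    return foo[i];
}

/* Named variables: each possible index needs its own branch.
   In this sketch any i of 2 or more selects foo2. */
int get_from_named(int foo0, int foo1, int foo2, size_t i)
{
    switch (i) {
    case 0:  return foo0;
    case 1:  return foo1;
    default: return foo2;
    }
}
```

For indices 0 to 2 the two functions agree, but the second one has to be edited every time a variable is added.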

Why arrays are better

  1. You can iterate.
  2. You can index.
  3. You can pass a reference to other functions (imagine you have 10000 named variables and need to pass them as parameters).

Some thoughts:

  1. Don't try to solve a problem before it arises, like performance issues in your case.

  2. Don't try to be smarter than the compiler, it's a waste of time.

  3. If, for readability, you need to name your variables foo0, foo1, foo2 rather than foo, bar, baz, you are better off declaring an array foo[3]. Otherwise, foo, bar, baz are more readable than foo0, foo1, foo2.
