简体   繁体   中英

Why was mixing declarations and code forbidden up until C99?

I have recently become a teaching assistant for a university course which primarily teaches C. The course standardized on C90, mostly due to widespread compiler support. One of the very confusing concepts to C newbies with previous Java experience is the rule that variable declarations and code may not be intermingled within a block (compound statement).

This limitation was finally lifted with C99, but I wonder: does anybody know why it was there in the first place? Does it simplify variable scope analysis? Does it allow the programmer to specify at which points of program execution the stack should grow for new variables?

I assume the language designers wouldn't have added such a limitation if it had absolutely no purpose at all.

In the very beginning of C the available memory and CPU resources were really scarce. So it had to compile really fast with minimal memory requirements.

Therefore the C language has been designed to require only a very simple compiler which compiles fast. This in turn lead to " single-pass compiler " concept: The compiler reads the source-file and translates everything into assembler code as soon as possible - usually while reading the source file. For example: When the compiler reads the definition of a global variable the appropriate code is emitted immediately.

This trait is visible in C up until today:

  • C requires "forward declarations" of all and everything. A multi-pass compiler could look forward and deduce the declarations of variables of functions in the same file by itself.
  • This in turn makes the *.h files necessary.
  • When compiling a function, the layout of the stack frame must be computed as soon as possible - otherwise the compiler had to do several passes over the function body.

Nowadays no serious C compiler is still "single pass", because many important optimizations cannot be done within one pass. A little bit more can be found in Wikipedia .

The standard body lingered for quite some time to relax that "single-pass" point in regard to the function body. I assume, that other things were more important.

It was that way because it had always been done that way, it made writing compilers a little easier, and nobody had really thought of doing it any other way. In time people realised that it was more important to favour making life easier for language users rather than compiler writers.

I assume the language designers wouldn't have added such a limitation if it had absolutely no purpose at all.

Don't assume that the language designers set out to restrict the language. Often restrictions like this arise by chance and circumstance.

I guess it should be easier for a non-optimising compiler to produce efficient code this way:

int a;
int b;
int c;
...

Although 3 separate variables are declared, the stack pointer can be incremented at once without optimising strategies such as reordering, etc.

Compare this to:

int a;
foo();
int b;
bar();
int c;

To increment the stack pointer just once, this requires a kind of optimisation, although not a very advanced one.

Moreover, as a stylistic issue, the first approach encourages a more disciplined way of coding (no wonder that Pascal too enforces this) by being able to see all the local variables at one place and eventually inspect them together as a whole. This provides a clearer separation between code and data.

Requiring that variables declarations appear at the start of a compound statement did not impair the expressiveness of C89. Anything that one could legitimately do using a mid-block declaration could be done just as well by adding an open-brace before the declaration and doubling up the closing brace of the enclosing block. While such a requirement may sometimes have cluttered source code with extra opening and closing braces, such braces would not have been just noise--they would have marked the beginning and end of variables' scopes.

Consider the following two code examples:

{
  do_something_1();
  {
    int foo;
    foo = something1();
    if (foo) do_something_1(foo);
  }
  {
    int bar;
    bar = something2();
    if (bar) do_something_2(bar);
  }
  {
    int boz;
    boz = something3();
    if (boz) do_something_3(boz);
  }
}

and

{
  do_something_1();

  int foo;
  foo = something1();
  if (foo) do_something_1(foo);

  int bar;
  bar = something2();
  if (bar) do_something_2(bar);

  int boz;
  boz = something3();
  if (boz) do_something_3(boz);
}

From a run-time perspective, most modern compilers probably wouldn't care about whether foo is syntactically in scope during the execution of do_something3() , since it could determine that any value it held before that statement would not be used after. On the other hand, encouraging programmers to write declarations in a way which would generate sub-optimal code in the absence of an optimizing compiler is hardly an appealing concept.

Further, while handling the simpler cases involving intermixed variable declarations would not be difficult (even a 1970's compiler could have done it, if the authors wanted to allow such constructs), things become more complicated if the block which contains intermixed declarations also contains any goto or case labels. The creators of C probably thought allowing intermixing of variable declarations and other statements would complicate the standards too much to be worth the benefit.

Back in the days of C youth, when Dennis Ritchie worked on it, computers ( PDP-11 for example) have very limited memory (eg 64K words), and the compiler had to be small, so it had to optimize very few things and very simply. And at that time (I coded in C on Sun-4 / 110 in the 1986-89 era), declaring register variables was really useful for the compiler.

Today's compilers are much more complex . For example, a recent version of GCC (4.6) has more 5 or 10 million lines of source code (depending upon how you measure it), and does a big lot of optimizations which did not existed when the first C compilers appeared.

And today's processors are also very different (you cannot suppose that today's machines are just like machines from the 1980s, but thousands of times faster and with thousands times more RAM and disk). Today, the memory hierarchy is very important: cache misses are what the processor does the most (waiting for data from RAM). But in the 1980s access to memory was almost as fast (or as slow, by current standards) than execution of a single machine instruction. This is completely false today: to read your RAM module, your processor may have to wait for several hundreds of nanoseconds, while for data in L1 cache, it can execute more that one instruction each nanosecond.

So don't think of C as a language very close to the hardware: this was true in the 1980s, but it is false today.

Oh, but you could (in a way) mix declarations and code, but declaring new variables was limited to the start of a block. For example, the following is valid C89 code:

void f()
{
  int a;
  do_something();
  {
    int b = do_something_else();
  }
}

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM