3.2. Block Structure and Scope


Like so many features of modern programming languages, blocks were introduced by Algol. A block is a delimited sequence of statements. Programming languages delimit blocks in many ways. For example, Pascal delimits blocks with begin and end, and Python uses indentation. C++ and Java delimit each block with an opening and closing brace: {....}. C++ and Java are block-structured languages, which means programs written in these languages consist of blocks, which form function bodies, block or compound statements, classes, namespaces, etc. Understanding blocks is fundamental to understanding the behavior of C++ and Java programs.

Python Programmers

The Python programming language uses indentation to delimit blocks. However, C++ and Java use the curly braces, "{" and "}", as the block delimiters. If you are transitioning from Python to C++, please focus on the use of the braces in the following discussion.

Indentation

All of the examples presented so far have included some small amount of indentation. But the introduction of blocks and structured control statements makes using consistent indentation critical. Indentation makes programs much easier for people to read and understand, but it doesn't affect how the compiler works. The fundamental idea is to reflect the code's logical structure with its physical layout. Specifically, we use indentation to graphically denote the statements that belong to a block or a control statement. Typically, we indent them one level to the right of the containing block or control structure. How much should you indent? Again, like so many other aspects of programming style, this choice is a matter of personal taste. The indentation should be great enough that it is easy for the eye to distinguish the indentation. Two spaces are the generally accepted minimum, but four spaces are very common and are the default for many editors. I like to use a tab character that expands to eight spaces.

Manually maintaining the indentation throughout a program is a tedious and error-prone task. Fortunately, modern program editors, including the one incorporated into Visual Studio, will automatically take care of most program indentation. But there is still one aspect of the indentation that we must address. Rather than inserting individual spaces, many programmers, including myself, like to use the tab key. Modern editors will allow you to set tab stops at various depths. For example, Visual Studio sets tab stops at four spaces, which means that pressing the tab key once increases the indentation by four spaces. My favorite editor, gvim, sets tab stops at 8 spaces, which produces the deep indentation that I have used for more than three decades. All modern editors will allow you to adjust the tab stop settings to whatever depth you prefer.

How much you choose to indent is less important than indenting consistently. If you choose to indent by inserting individual spaces, then always use spaces - don't switch between spaces and a tab character even if the tab stops correspond to the number of spaces used previously. You might wonder, "What difference does it make if I mix spaces and tabs?" It may not make any difference if your code is always viewed with the same editor and if the editor is always configured the same way. But if you mix spaces and tabs and then view the code with a different editor or with a different tab-stop setting, then the indentation can become ragged and much harder to read. Code may not look perfect if we switch editors or change tab stops, but it will look much better and be easier to read if spaces and tabs are not mixed.

Blocks

Computer scientists use a metalanguage called BNF to describe a programming language's grammar. Although a detailed study of grammars is beyond the scope of an introductory programming course, we'll look at a small part of the C++ grammar to help us understand the central role that blocks play in many programming languages.

statement:
	compound-statement
	selection-statement
	iteration-statement

compound-statement:
	{ statement-seq_opt }

statement-seq:
	statement
	statement-seq statement

selection-statement:
	if ( condition ) statement
	if ( condition ) statement else statement
	switch ( condition ) statement

iteration-statement:
	while ( condition ) statement
	do statement while ( expression ) ;
	for ( for-init-statement condition_opt ; expression_opt ) statement
Part of the C++ grammar. Five partial C++ replacement rules adapted from Ellis & Stroustrup (1990, pp. 396-397). A replacement rule consists of one non-terminal (not indented) and a set of possible replacements (indented). Each possible replacement consists of non-terminals and terminals (keywords and punctuation such as if, while, {, and ;). Terminals must appear verbatim in a program, and the complete grammar includes possible replacements that have only terminals. Non-terminals marked with the "opt" subscript (written here as the suffix _opt) are optional.

Every C++ compiler has a sub-component called a parser that validates a program's syntax by comparing it with the language's grammar. Parsers can work either top-down or bottom-up. When the parser works in the top-down direction, it begins with a single non-terminal. It attempts to match the grammar to the program by successively applying replacement rules until every non-terminal has been replaced by terminals that match the program. When the parser works in the bottom-up direction, it attempts to collect the program symbols to form non-terminals, successively applying the replacement rules until the whole program has been reduced to a single non-terminal. The compiler flags a syntax error whenever part of a program does not match the grammar; otherwise, the parser "accepts" the program.

This long discussion has two simple consequences for us. First, a simple statement can be replaced by a compound statement. From the second replacement rule, we see that a compound statement is just a block. Second, complex control statements (e.g., if and switch branches, and while, do, and for loops) are just statements for all of their power and complexity. We'll explore these consequences throughout the chapter.

A block (also known as a code block or a block statement) is an organizational structure formed by an opening { and a closing }. A block is equivalent to a compound statement, so from the second replacement rule above, we can see that a block can be empty, which is not very useful. Although blocks can be empty or consist of a single statement, they are most often used to group multiple statements together. The following figures illustrate various aspects of blocks.

#include <iostream>
using namespace std;

int main()
{
	{}
	cout << "Hello, World!" << endl;

	{
		cout << "This is a block" << endl;
	}

	return 0;
}
Uncommon blocks. This example program is syntactically correct and compiles without error. A block forms the body of the main function, which is a very common kind of block, but the other blocks are neither useful nor common. Empty blocks generally serve no purpose and should be avoided. (Nevertheless, we'll see a use for empty blocks in a subsequent chapter.) A block can appear at any arbitrary location within a program, and while the code is syntactically correct, it's very uncommon to create a block without a purpose.

The replacement rules presented in Figure 1 suggest that a selection (an if or switch) or iteration (a loop) statement can replace a simple statement. The rules also suggest that a block or compound statement can replace a simple statement in any of these more complex statements. The nesting and grouping that the replacement rules allow (i.e., C++ allows) mean we can create powerful and complex statements in a program. Before we can explore more complex control statements, there are a few more concepts related to blocks that we must understand. Focus your attention on the braces and semicolons in the next figure.

"{" and "}" Optional

if (. . .)
	cout << counter << endl;

Or

if (. . .)
{
	cout << counter << endl;
}

"{" and "}" Required

if (. . .)
{
	cout << "Hello, World!" << endl;
	cout << counter << endl;
}
Statements and blocks. Simple statements end with a semicolon, but compound or block statements do not (notice there are no semicolons following the braces). From the Figure 1 rules, we see that selection and iteration statements may include one simple statement without needing braces. But some programmers feel that braces make the code clearer, and you may use them if you wish. Braces are required to make a compound statement when two or more sequential statements are nested inside a single control statement.

Scope

Together, the second and third rules of Figure 1 suggest that blocks may be nested. Furthermore, each block represents a different scope. Scope can apply to many different parts of a program, but it's easiest to understand when applied to variables.

A variable's scope is the part of a program where the variable is visible and usable. When a variable is defined inside of a block, its scope is limited to that block; that is, a variable defined inside a block is not visible or accessible outside of the defining block. A new block can be created just about anywhere in a program, even inside another block. If blocks can be nested, then scopes can also be nested. For the compiler to generate machine code that uses a variable, it must first locate where the variable is defined. When searching for a variable's definition, the compiler begins with the current scope and then searches outward in the surrounding scopes until it locates the variable.

if (. . . )					// -- outermost scope
{
	int	counter;			// defines counter

	if (. . .)				// -- intermediate scope
	{
		int	x;			// defines x
		int	y;			// defines y

		if (. . .)			// -- innermost scope
		{
			counter = x + y;	// uses counter, x, and y
				. . .
		}
	}
}
Three nested blocks define three nested scopes. When generating machine code for the most deeply nested if statement, the compiler searches for the variables counter, x, and y. Unable to find them in the innermost scope, the compiler widens its search to the next level, which is the middle if statement or, in this example, the intermediate scope, where it finds and uses x and y. The compiler must search the next scope out, the outer if statement, or the outermost scope, to find counter.

You can visualize the searching process as a worm eating its way out of an onion. Each layer of the onion represents a scope. The worm begins in the center of the onion (never mind how it got into the center) and eats its way from the center of the onion outward, from one layer or scope to the next.

Although a program can nest blocks arbitrarily deep (and therefore have an arbitrary number of levels of scope), C++ has three main or named scopes:

Local scope
Also known as function scope, local scope applies to variables defined inside a function. These obviously include the variables defined in the function body (i.e., the outermost block of the function), but they also include the variables defined in the function's argument list. Functions are covered in greater detail in chapter 6.
Class scope
Class scope applies to the member variables specified in a class (introduced in chapter 9).
Global scope
Global scope applies to variables that are not defined inside a class or a function. Java, being an object-oriented language that places all code inside classes, does not permit global variables.

Global Variables

Variables defined in global scope can cause, and have caused, untold programming errors and have contributed to unnecessarily complex software systems. They make debugging, validation, maintenance, and extension difficult and error-prone. They effectively put an upper limit on the size that software can attain. Global variables should be avoided whenever possible and used only with compelling justification.

Scope And Variable Behavior

In the case of automatic variables, the variable's scope has an impact on when the memory for the variable is allocated and when the variable is initialized. It was claimed previously that the memory was allocated when the variable was defined. That claim was a bit of an oversimplification: the definition arranges for the program to allocate memory when execution enters the variable's scope. That is the essence of an automatic variable: the program automatically allocates memory when the variable comes into scope and automatically deallocates memory when the variable goes out of scope. Furthermore, if the definition includes an initializer, the program automatically re-initializes the variable each time it comes into scope.

Local Scope

{
	int	counter = 10;
	.
	.
	.
}

Global Scope

int	counter = 10;

int main()
{
	.
	.
	.
}

                          Local Scope                                Global Scope
Memory Allocation         Whenever the variable comes into scope     Once, when the program loads
Memory Deallocation       Whenever the variable goes out of scope    When the program terminates
Variable Initialization   Whenever the variable comes into scope     Once, when the program loads
Local scope versus global scope. Local scope variables are essentially recreated each time that they come into scope. That means that memory is reallocated and that any initialization statement is re-executed. On the other hand, global variables only "come into scope" once, when the program is loaded into memory, and remain in scope until the program terminates. Memory for a global variable is only allocated once and any initialization statement only executes once.

Uniqueness Rule

Variable names must be unique within each scope, which means that defining multiple variables with the same name in the same scope is not allowed. However, it is possible, but potentially quite confusing, to reuse a variable name in nested scopes, in which case the inner definition hides the outer one for the rest of the inner block:

if ( . . . )
{
	int	counter;			// definition 1

	if ( . . . )
	{
		int	counter;		// definition 2
		    . . . .
		// do something with counter: uses definition 2 counter
	}
}


Ellis, M. A. & Stroustrup, B. (1990). The Annotated C++ Reference Manual. Reading, MA: Addison-Wesley Publishing Company.