3.2. Block Structure and Scope


Like so many features of modern programming languages, blocks were introduced by Algol. A block is a delimited sequence of statements. Programming languages delimit blocks in many ways. For example, Pascal delimits blocks with begin and end, and Python uses indentation. C++, Java, C#, and many other programming languages delimit blocks with an opening and closing brace: {....}. These are block structured languages, meaning programs written in these languages consist of blocks, which form function bodies, block or compound statements, classes, namespaces, etc. Understanding blocks is fundamental to understanding C++'s behavior.

Python Programmers

The Python programming language uses indentation to delimit blocks. However, C++ and Java use the curly braces "{" and "}" as the block delimiter. If you are transitioning from Python to C++, please focus on how C++ uses the braces in the following discussion.

Indentation

The Hello World example presented three accepted programming styles illustrating where programmers place the braces defining a function's body - a block. Significantly, all the examples indented the body's statements. Control statements and their attendant blocks make consistent indentation critical. Indentation does not form any part of the C++ syntax, and the compiler ignores it. Nevertheless, indentation is valuable because it makes code easier for people to read and understand. The fundamental idea is to reflect the code's logical structure with its physical layout.

(a)
for (int i = 0; i < 10; i++)
    cout << "Hello, World" << endl;
cout << "Goodbye, World" << endl;

(b)
for (int i = 0; i < 10; i++)
cout << "Hello, World" << endl;
cout << "Goodbye, World" << endl;

(c)
for (int i = 0; i < 10; i++)
    cout << "Hello, World" << endl;
    cout << "Goodbye, World" << endl;

Indentation reflects statement logic. Programmers typically write subordinate statements - those "belonging" to the control - indented to the right of the controlling statement. The statement-terminating semicolons implement the program's structure but are easily overlooked. Indentation visually suggests the program's structure. The three code fragments are logically equivalent, but most programmers consider (a) easier to understand.

(a) Best-practice indentation reflects the program's logical structure, helping readers, in this example, see that the program prints Hello, World once for each loop iteration and Goodbye, World only once.
(b) Without indentation, it's more difficult for readers to discern which cout statements are subject to the control (for) and which are not.
(c) Incorrect indentation makes it seem like the control drives both cout statements, but only the first one is part of the loop.
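The following sketch makes the behavior of fragment (c) concrete. It sends the output to a string stream rather than cout so the lines can be counted; the function names misleadingFragment and countLines are hypothetical helpers invented for this illustration, not part of the original example.

```cpp
#include <iostream>
#include <sstream>
#include <string>
using namespace std;

// Hypothetical helper: counts how many lines of text are exactly equal to line.
int countLines(const string& text, const string& line)
{
    istringstream in(text);
    string current;
    int count = 0;

    while (getline(in, current))
        if (current == line)
            count++;

    return count;
}

// Fragment (c), writing to a string stream instead of cout.
// The indentation suggests that both output statements belong to the loop,
// but the compiler ignores indentation: only the first one is the loop body.
string misleadingFragment()
{
    ostringstream out;

    for (int i = 0; i < 10; i++)
        out << "Hello, World" << endl;
        out << "Goodbye, World" << endl;    // NOT part of the loop

    return out.str();
}
```

Calling countLines on the string returned by misleadingFragment finds "Hello, World" ten times and "Goodbye, World" once, the same result as fragments (a) and (b). Some compilers can flag this kind of layout; GCC, for example, has a -Wmisleading-indentation warning option.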

How and how much programmers indent is a matter of personal style, less important than indenting consistently. Always indent the same number of spaces, and don't intermix spaces and tab characters, even if the tab stops correspond to the number of spaces you use. You might wonder, "What difference does it make if I mix spaces and tabs?" It may not make any difference if everyone always views the code with the same editor and if the editor is always configured the same way. But if you mix spaces and tabs and view the code with a different editor or tab-stop settings, the indentation can become ragged and much harder to read. The code may not look perfect if readers switch editors or change tab stops, but it will look much better and be easier to read if you don't mix spaces and tabs.

Blocks

Computer scientists often define the grammar of a programming language like C++ with a metalanguage such as Backus-Naur Form (BNF). Although a detailed study of grammars is beyond the scope of an introductory programming course, we'll look at a small part of the C++ grammar to help us understand the central role of blocks and some unexpected language features.

(i)	statement:
		expression-statement
		compound-statement
		selection-statement
		iteration-statement
(ii)	expression-statement:
		expression_opt ;
(iii)	compound-statement:
		{ statement-seq_opt }
(iv)	statement-seq:
		statement
		statement-seq statement
(v)	selection-statement:
		if ( condition ) statement
		if ( condition ) statement else statement
		switch ( condition ) statement
(vi)	iteration-statement:
		while ( condition ) statement
		do statement while ( expression ) ;
		for ( ... ) statement

Partial C++ grammar. Six partial C++ replacement rules adapted from Ellis & Stroustrup (1990, pp. 396-397). A replacement rule consists of one non-terminal (not indented) and one or more possible replacements or substitutions (indented). Each replacement consists of non-terminals and terminals (in red). We can replace a non-terminal with one of its possible replacements, but terminals must appear verbatim in a program. The complete grammar includes some replacements consisting of only terminals, allowing the replacement process to end. Non-terminals marked with the "opt" subscript are optional, and the ellipses represent detail omitted for simplicity.

(a) Some of C++'s statement rules are expressed in BNF. The rules describe ways to form valid C++ statements.
(b) A graphical representation of rules i through iv, illustrating the first two statement replacements.

We can make three significant observations about the statement grammar:

C++ makes statements and expressions more interchangeable than many other languages (rules i & ii). Some "expression" replacements return to a "statement." Nevertheless, the text generally maintains the distinction between statements and expressions presented in Chapter 1 for clarity.
Branches and loops may have single statements (rules v & vi).
Programs may replace single statements with compound statements (rule iii).
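The observations above can be sketched in a few lines of code. The two functions below, firstSpace and countSpaces, are hypothetical helpers invented for this illustration; the comments tie each statement back to the rule that permits it.

```cpp
#include <string>
using namespace std;

// Hypothetical helper: returns the index of the first space in text,
// or the length of text if it contains no space.
int firstSpace(const string& text)
{
    int length = static_cast<int>(text.length());
    int i = 0;

    // Rule vi requires a statement as the loop body. Rule ii makes the
    // expression optional, so a lone semicolon (a null statement) is enough.
    for (i = 0; i < length && text[i] != ' '; i++)
        ;

    return i;
}

// Hypothetical helper: counts the spaces in text.
int countSpaces(const string& text)
{
    int length = static_cast<int>(text.length());
    int spaces = 0;

    // Rule iii lets a compound statement stand where rule vi expects a
    // single statement, and rule v lets an if statement nest inside it.
    for (int i = 0; i < length; i++)
    {
        if (text[i] == ' ')
            spaces++;       // an expression followed by a semicolon (rule ii)
    }

    return spaces;
}
```

For example, firstSpace("Hello, World") returns 6, and countSpaces("a b c") returns 2.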

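#include <iostream>
using namespace std;

int main()
{			// the function body is a block
	{}		// an empty block: legal, but it does nothing
	cout << "Hello, World!" << endl;

	{		// a block can appear wherever a statement can
		cout << "This is a block" << endl;
	}

	return 0;
}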

Uncommon blocks. This example program is syntactically correct and compiles without error. A block forms the body of the main function. Forming function bodies is a frequent use of blocks, but the other blocks are neither useful nor common. Empty blocks generally serve no purpose and should be avoided. (Nevertheless, we'll see a use for empty blocks in a subsequent chapter.) A block can appear at any arbitrary location within a program, and while the code is syntactically correct, it's very uncommon to create a block without a purpose.

The replacement rules presented in Figure 1 suggest that a selection (an if or switch) or iteration (a loop) statement can replace a simple statement. The rules also suggest that a block or compound statement can replace a simple statement. The nesting and grouping that the replacement rules (and therefore C++) allow mean we can create powerful and complex statements in a program. Before exploring more complex control statements, we must understand a few more block-related concepts. Focus your attention on the braces and semicolons in the next figure.

"{" and "}" Optional

if (. . .)
	cout << counter << endl;

Or

if (. . .)
{
	cout << counter << endl;
}

"{" and "}" Required

if (. . .)
{
	cout << "Hello, World!" << endl;
	cout << counter << endl;
}

Statements and blocks. Simple statements end with a semicolon, but compound or block statements do not (notice there are no semicolons following the braces). The Figure 2 rules show that selection and iteration statements may include a simple statement without requiring braces. But some programmers feel that braces clarify the code, and you may use them if you wish. Braces are required to make a compound statement when two or more statements are nested inside a single control statement.
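To see why the braces matter, consider this sketch of what happens when they are left out. The function names and the condition counter > 0 are assumptions made for illustration (the figure elides the condition), and the output goes to a string stream so the two results can be compared.

```cpp
#include <sstream>
#include <string>
using namespace std;

// Hypothetical example: two statements follow the if, but there are no braces.
// Only the first statement is subject to the if.
string withoutBraces(int counter)
{
    ostringstream out;

    if (counter > 0)
        out << "Hello, World!" << endl;
        out << counter << endl;         // always runs, despite the indentation

    return out.str();
}

// Hypothetical example: the braces group both statements into one
// compound statement, so the if controls both of them.
string withBraces(int counter)
{
    ostringstream out;

    if (counter > 0)
    {
        out << "Hello, World!" << endl;
        out << counter << endl;
    }

    return out.str();
}
```

When counter is 3, both functions produce the same two lines. When counter is 0, withBraces produces nothing, but withoutBraces still produces a line containing 0 because its second output statement is not part of the if.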

Block Scope

Together, the second and third rules of Figure 1 suggest that blocks may be nested. Furthermore, each block represents a different scope. Scope can apply to many program parts, but it's easiest to understand when applied to variables.

A variable's scope is the part of a program where the variable is visible and usable. When a variable is defined inside of a block, its scope is limited to that block; that is, a variable defined inside a block is not visible or accessible outside of the defining block. A new block can be created anywhere in a program, even inside another block. If blocks can be nested, then scopes can also be nested. For the compiler to generate machine code that uses a variable, it must first locate where it is defined. When searching for a variable's definition, the compiler begins with the current scope and then searches outward in the surrounding scopes until it locates the variable.

if (. . . )					// -- outermost scope
{
	int	counter;			// defines counter

	if (. . .)				// -- intermediate scope
	{
		int	x;			// defines x
		int	y;			// defines y

		if (. . .)			// -- innermost scope
		{
			counter = x + y;	// uses counter, x, and y
				. . .
		}
	}
}

Block scope example. When generating machine code for the most deeply nested if statement, the compiler searches for the variables counter, x, and y. Unable to find them in the innermost scope, the compiler widens its search to the next level, which is the middle if-statement or, in this example, the intermediate scope, where it finds and uses x and y. The compiler must search the next higher scope, the outer if-statement, or the outermost scope, to find counter.

You can visualize the searching process as a worm eating its way out of an onion. Each layer of the onion represents a scope. The worm begins in the center of the onion (never mind how it got into the center) and eats its way from the center of the onion outward, from one layer or scope to the next.
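The search can be sketched with a compilable version of the block scope example. The function nestedScopes, its parameters, and the true conditions are assumptions standing in for the details the figure leaves out.

```cpp
// Hypothetical function wrapping the three nested scopes from the figure.
int nestedScopes(int a, int b)
{
    int result = 0;

    if (true)                       // -- outermost scope
    {
        int counter = 0;            // defines counter

        if (true)                   // -- intermediate scope
        {
            int x = a;              // defines x
            int y = b;              // defines y

            if (true)               // -- innermost scope
            {
                // None of these names is defined in this block, so the
                // compiler searches outward: x and y are found one level
                // out, and counter is found two levels out.
                counter = x + y;
            }
        }                           // x and y go out of scope here

        // x = 0;                   // compile-time error if uncommented:
                                    // x is not visible in this scope
        result = counter;
    }                               // counter goes out of scope here

    return result;
}
```

The search only moves outward. The commented-out assignment to x fails to compile because x is defined in an inner scope, and the compiler never searches inward.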

Local scope: Also known as function scope, local scope applies to variables defined inside a function. Local-scope variables include those defined in the function body (i.e., the outermost block of the function) and those defined in the function's parameter list. Chapter 6 covers functions in greater detail.
Class scope: These are the member variables specified in a class (introduced in Chapter 9).
Global scope: Variables not defined inside a class or in a function. Java, being a pure object-oriented language, does not permit global variables.

C++ named scopes. Although a program can nest blocks arbitrarily deep (and therefore have an arbitrary number of scope levels), C++ has three main or named scopes. The names facilitate discussions about scope.
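As a rough sketch, the three named scopes might appear together as follows. The names total, Counter, count, add, amount, and updated are all hypothetical, and the class syntax is covered properly in Chapter 9.

```cpp
int total = 0;                  // global scope: defined outside any class or function

class Counter
{
    public:
        int count = 0;          // class scope: a member variable

        void add(int amount)    // amount has local scope (parameter list)
        {
            int updated = count + amount;   // updated has local scope (function body)

            count = updated;    // the member variable is visible in member functions
            total += amount;    // the global variable is visible everywhere below its definition
        }
};
```

After creating a Counter and calling add(2) and then add(3), both count and total hold 5, but amount and updated no longer exist once each call returns.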

Global Variables

Global variables have caused untold programming errors and contributed to unnecessarily complex software systems. They make debugging, validation, maintenance, and extension challenging and error-prone. They effectively put an upper limit on the size that software can attain. Global variables should be avoided whenever possible and used only with compelling justification. Because C lacks class scope, global variables are sometimes necessary in C programs, but their need in general C++ programs is rare.

Scope And Variable Behavior

In the case of automatic variables, the variable's scope affects when the program allocates and deallocates its memory, and when it initializes the variable. The program allocates the variable's memory and initializes it when the variable comes into scope, and deallocates the memory when the variable goes out of scope. Previously, I claimed that the program allocated memory when a variable was defined. That claim was an oversimplification: the definition arranges for the program to allocate memory when execution enters the variable's scope. That is the essence of an automatic variable: the program automatically allocates memory when the variable comes into scope and automatically deallocates memory when the variable goes out of scope. Furthermore, the program automatically initializes the variable whenever it comes into scope.

	Local Scope	Global Scope
	{ int counter = 10; . . . }	int counter = 10; int main() { . . . }
Memory Allocation	Whenever the variable comes into scope	Once, when the program loads
Memory Deallocation	Whenever the variable goes out of scope	When the program terminates
Variable Initialization	Whenever the variable comes into scope	Once, when the program loads

Local scope versus global scope. In a sense, programs create local scope variables each time they come into scope. That means that programs can reallocate the memory when the variable goes out of scope, losing any saved data. So, the program must reinitialize the variable whenever it comes into scope. On the other hand, global variables only "come into scope" once when the operating system loads the program into memory, and they remain in scope until the program terminates. Programs allocate and initialize memory for global variables once, retaining the saved values throughout program execution.
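The difference summarized above can be observed with a small sketch. The names globalCounter, bumpLocal, and bumpGlobal are hypothetical; each function adds one to a counter that starts at 10 and returns the result.

```cpp
int globalCounter = 10;         // allocated and initialized once, when the program loads

// Hypothetical function using a local (automatic) variable.
int bumpLocal()
{
    int counter = 10;           // allocated and initialized on every call
    counter++;
    return counter;             // always 11: the previous call's value is gone
}

// Hypothetical function using the global variable.
int bumpGlobal()
{
    globalCounter++;            // the saved value persists between calls
    return globalCounter;       // 11 on the first call, 12 on the second, ...
}
```

Calling bumpLocal repeatedly returns 11 every time because counter is recreated and reinitialized on each call. Calling bumpGlobal repeatedly returns 11, then 12, then 13, because globalCounter keeps its value for the life of the program.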

Uniqueness Rule

Variable names must be unique within each scope, meaning defining multiple variables with the same name in the same scope is a compile-time error. However, it is possible, but potentially quite confusing, to reuse a variable name in nested scopes:

if ( . . . )
{
	int	counter;			// definition 1

	if ( . . . )
	{
		int	counter;		// definition 2
		    . . . .
		do something with counter	// uses definition 2 counter
	}

	do something with counter		// uses definition 1 counter
}
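A compilable sketch of the same situation follows. The function shadowDemo, the true conditions, and the initial values are assumptions made for illustration; the output goes to a string stream so the result is easy to inspect.

```cpp
#include <sstream>
#include <string>
using namespace std;

// Hypothetical function reusing the name counter in nested scopes.
string shadowDemo()
{
    ostringstream out;

    if (true)
    {
        int counter = 1;                            // definition 1
        // int counter = 5;                         // compile-time error if uncommented:
                                                    // same name, same scope

        if (true)
        {
            int counter = 2;                        // definition 2 hides definition 1
            counter += 40;                          // changes definition 2 only
            out << "inner " << counter << endl;     // uses definition 2
        }

        out << "outer " << counter << endl;         // uses definition 1, unchanged
    }

    return out.str();
}
```

The returned string contains "inner 42" followed by "outer 1": changing the inner counter leaves the outer one untouched. Because this behavior is easy to misread, some compilers offer an optional warning for it, such as -Wshadow in GCC and Clang.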

Ellis, M. A. & Stroustrup, B. (1990). The Annotated C++ Reference Manual. Reading, MA: Addison-Wesley Publishing Company.