1.6. Terminology

Every discipline has its own specialized terminology that allows practitioners to communicate quickly and effectively. Learning computer science terminology will help you communicate with your colleagues quickly and accurately. I use these terms throughout the textbook, and your instructor may use them in exam questions and programming assignments. If you are comfortable using the terms, you will find it easier to understand subsequent text sections and less frustrating to complete your projects.

Here are a few terms to add to your vocabulary:

Statement

C++, like Java, is an imperative programming language. Programs written in imperative languages consist of a sequence of statements; each statement is an instruction that tells the computer to do something. Both C++ and Java terminate statements with a semicolon.

It's possible to place multiple statements on a single line because each statement ends with a semicolon. However, experienced programmers generally place each statement on a line by itself, which makes the code easier to read. Sometimes, a statement may be too long to fit on a single line (modern text editors support horizontal scrolling, but consider what the code will look like if printed on paper), so it's common practice to break these long statements into multiple lines.

Correct, but not preferred	Best Practice
statement 1; statement 2; statement 3;	statement 1;
	statement 2;
	statement 3;

Statements. One statement = one high-level computer instruction, terminated by a semicolon. Multiple statements may appear on one line, or a single statement may span multiple lines, but in most cases, the best practice is to place each statement on a line by itself.

The overarching rule is to make the code easy for people to read and understand. In some rare cases, the first example in Figure 1 may improve readability and is preferred. Break long statements at whitespace and indent subsequent lines until the statement ends. A long statement that spans multiple lines is easier to read than one that forces the reader to scroll the display area.

Expression

An expression is a fragment of code that produces a temporary value. The computer must evaluate the expression to obtain the value; that is, the computer carries out the calculations or operations in the code fragment. Expressions are part of larger statements; the computer discards the expression value when the statement ends (i.e., expressions have statement scope). Some examples of expressions:

5

A constant. The expression value is the value of the constant.

counter

A variable. The expression value is the value currently saved in the memory named by the variable. The compiler generates code to load the saved value, evaluating the variable.

sqrt(5)

A function call. A call to any function whose return type is not void forms an expression that produces a value. The expression value is the value that the function returns. "sqrt" is a function that calculates and returns the square root of its argument.

-n

counter + 5

angle < 180

Expressions formed with operators. C++ forms expressions recursively, meaning that programmers create large expressions by combining small sub-expressions. Evaluating these example expressions does not change the value of any variable appearing in them. So, when the computer evaluates -n, the expression value is the value stored in n with the sign reversed, but the value stored in n is unchanged.

sqrt(pow(a, 2) + pow(b, 2))

Expressions with many sub-expressions. Complex expressions are evaluated from the inside out (think of a worm eating its way out, layer by layer, from the center of an onion) - that is, function calls have a higher precedence than the arithmetic operators. The "pow" function raises a number to a power: pow(b, e) = b^e. So, if we ignore the evaluation of the individual variables for simplicity, the computer evaluates the example expression in steps (C++ does not specify which of the first two steps runs first):

pow(a,2)
pow(b,2)
pow(a,2) + pow(b,2)
sqrt(pow(a,2) + pow(b,2))

Expression examples. Expressions span a spectrum from simple to arbitrarily complex. An expression is only a part of a statement, so it cannot stand alone (i.e., an expression appearing outside of a statement will not compile). Programs evaluate an expression to produce a value that only exists while the program runs the statement - the program loses the value when the statement ends.

Declaration

Most Java textbooks state that a variable must be declared before it can be used. In contrast, C++ textbooks state that a variable must be defined before using it. So, what's the difference between declaring and defining a variable? In Java, there is no difference (and some Java textbooks will use the term "defined" in place of "declared"), but there is a difference in C++.

A declaration "introduces" a programmer-created name or identifier (called a "symbol" in the language of computer science) to the compiler. The compiler stores the symbol in a temporary data structure called the symbol table. When the symbol is a variable, the compiler also records the variable's data type and where it appears in the program. When the symbol is a function, the compiler records its return type and the number and type of its parameters. The compiler uses the symbol table while translating C++ code to machine code.

C++ makes a distinction between "declaration" and "definition," and Java does not, because of differences in the compiler implementations. Java compilers use a "two-pass" strategy, meaning the compiler reads each program file twice. The compiler builds the symbol table during the first pass and generates (virtual) machine code during the second pass. In contrast, the C++ compiler uses a "one-pass" strategy - it builds the symbol table and generates machine code in the same pass. The one-pass approach requires that every symbol be defined before the compiler uses it for code generation. When a programmer tries to do something with a symbol, the compiler must "know" what the symbol represents. Some C++ statements serve as both a definition and a declaration, which makes the distinction between the terms harder to understand. We'll see some examples of these statements as we explore what it means to "define" something below.

Functions can also be declared, which is important when programs span multiple files. Function declarations, called function prototypes or just prototypes, are explored in more detail in Chapter 6.

Index	Symbol	Address	Feature	Type	Arguments
0	Counter		Variable	int
1	sqrt		Function	double	2
2	x		Variable	double
3	pow		Function	double	4, 5
4	x		Variable	double
5	y		Variable	double

An abstract symbol table (1). An illustration of what a symbol table might look like and some of the information that it might contain. Declarations do not create the named feature, so the address remains empty. Function arguments are referred to by a corresponding row number or table index.

A declaration introduces a named programming element or symbol to the compiler, which places it in its symbol table. The compiler uses the information contained in the symbol table to generate machine code.

Definition

Unlike declarations, variable and function definitions allocate or set aside memory to hold the data named by the variable or the machine code for a function. If a definition statement is the first time the compiler "sees" the symbol, then the statement serves as both a declaration and a definition. For example:

int counter;

A variable definition. A variable definition must include two elements:

A data type
A unique variable name or identifier

The example is a valid variable declaration in Java and is a valid variable definition in C++; the statement may also serve as a declaration in C++ if this is the first time that the compiler "sees" the name "counter."

The following figure illustrates two possible ways to define a sequence of variables: A programmer may define multiple variables of the same data type as a comma-separated list in a single statement or as a sequence of separate statements.

Correct, but not preferred	Best Practice
double x, y, z;	double x;
	double y;
	double z;

Defining multiple variables. Although it is possible to define multiple variables in a single statement, most programmers believe that defining each variable in a separate statement enhances program clarity and readability and is the preferred syntax. To avoid confusion later, note that it is possible to define a variable without initializing it, but it is impossible to initialize a variable in a declaration.

If the programmer knows the initial value of the variable or can calculate it at the time of the variable definition, then the programmer can also initialize the variable in the same statement that defines it:

double x = 0;
double y = 3.1419;
double z = x + y;

Combined variable definition and initialization. To "initialize" a variable means storing its first or initial value. C++ allows programmers to define and initialize a variable in the same statement. Although combined in a single statement, the statement has two distinct operations: a variable definition and a variable initialization. When defining and initializing multiple variables, define and initialize each variable in a separate statement.

Index	Symbol	Address	Feature	Type	Arguments
0	Counter	0x12345678	Variable	int
1	sqrt	0xAF56C730	Function	double	2
2	x	0x0C83E0C5	Variable	double
3	pow	0x0B31A84D	Function	double	4, 5
4	x	0x561AC820	Variable	double
5	y	0x873291FE	Variable	double

An abstract symbol table (2). An illustration of what a symbol table might look like and some of the information that it might contain. Definitions create variables and functions that occupy memory while a program runs. One of the essential bits of information stored in the symbol table is the address where the variable or function is stored in memory, which allows the compiler to map symbols to memory addresses. Addresses are traditionally represented in hexadecimal, highlighted in the illustration.

A definition causes the compiler to generate code that allocates or uses memory to hold the defined programming element.

Definitions and Declarations In Multi-File Programs

We can write small programs in a single file. In this case, there's little need for variable declarations, but even a small program may still require function declarations, as we will see in Chapter 6. It's a common practice to spread large, complex programs over multiple files. If the program defines an element (a variable, function, class, or structure) in one file, a declaration is necessary to use the element in other program files. Even so, variable declarations are rare and only needed when all of the following are true:

The variable is defined in global scope (scope is the location in a program where a named element is visible or accessible, so a global variable is defined outside of a class or function, and the name is visible in any file that defines or declares it)
The program spans multiple files
The variable is defined in one file but used in another

Definition / file1.cpp	Declaration / file2.cpp
int counter; // variable defined in global scope	extern int counter; // variable declaration

Variable declarations. For a program to use a variable defined in one file, every other file that uses the variable must declare it with the extern keyword.

Note

Global variables are error-prone and make program verification and maintenance difficult, so programmers generally avoid them. Global variables are still occasionally necessary in C programs. In rare cases, they are still necessary in C++ programs, but typical C++ applications do not need them.

To better understand the purpose of the declaration and the extern keyword in the example, recall that the compiler component (the second stage in the full compiler system) processes each source code (i.e., .cpp) file one at a time. So, the compiler would "see" the definition of "counter" when it processes file1. However, without the extern statement, the name "counter" is otherwise unspecified and unknown when the compiler encounters it while processing file2.

The extern statement creates a pure declaration, which places "counter" in the compiler's symbol table but does not cause it to allocate any memory. Programs may declare a variable any number of times (if the declarations are the same) but must define it exactly once. It is one of the linker's (Figure 4) tasks to connect all of a variable's declarations to its one definition. Finally, note that a definition may have an initialization: int counter = 100; but that a declaration may not.