10.8. Association


Review

Like composition and aggregation, association is a constructive relationship. As with aggregation, C++ implements association with pointers. These similarities make association a general and flexible relationship attractive to analysts because they can use it wherever they use aggregation or (with some compromise) composition. However, the same generality that makes it attractive to analysts also makes it more difficult to program. The relationship's flexibility is a product of its semantics, and its difficulty is a consequence of implementing those semantics.

Two classes, contractor and project, connected by association. The UML association connector is a simple line - both ends are plain and undecorated. — **Building objects related by association**. "A plain association between two classes represents a structural relationship between peers, meaning that both classes are conceptually at the same level, no one more important than the other"¹. Neither peer is superior nor inferior to the other or "special" in any way, making association the only symmetric and bidirectional relationship. Like composition and aggregation, association is a *has a* relationship, but unlike them, it reads equally well in both directions.

The UML reflects the relationship's symmetry with a symmetric connector symbol - both ends are plain and undecorated. We can read the relationship in either direction: "A `contractor` *has a* `project`" or "a `project` *has a* `contractor`."

Programmers implement association with *two* pointer member variables, one in each related class. Maintaining two pointers - initializing both and updating both whenever the relationship changes - is one reason association is more difficult to program. The second problem, in languages like C++, is the need for a forward declaration: `class project;`.

Objects bound by association do not overlap - neither nested nor embedded in the other. Each object has a pointer that points to the other.

UML Classes	Association Classes	Abstract Representation

class project;	// forward declaration

class contractor
{
	private:
		project* theProject;
};

class project
{
	private:
		contractor* theContractor;
};

(a)	(b)	(c)

Although we can use association in place of aggregation and composition, I believe that they better model some application domain (i.e., "real world") situations than association, and I recommend reserving it for situations that require its symmetry and bidirectional capabilities. Situations requiring objects to send messages in both directions justify the added burdens, major and minor, association places on programmers.

Association Reading Direction

When we create a whole-part relationship with composition or aggregation, it's clear which class is the whole and which is the part. This distinction is often sufficient, and we can name the relationship when it isn't. However, the classes in an association relationship are peers, and how they interact is often less clear. Consequently, it's more common for class designers to name or label associations than other relationships. However, association's bidirectionality can confuse how we read the label.

Two classes, 'contractor' on the left and 'project' on the right, connected by association. The association connector symbol is labeled 'works on.' An arrowhead points to the right, from 'contractor' to 'project,' indicating the reading direction is to the right. — **Clarifying the association reading direction**. The association relationship is bidirectional, but the label may only read well in one direction when named.

In the first example, the label reads best left to right - as we read it in English and other European languages - from `contractor` to `project`. The association naming syntax allows us to include an arrowhead pointing to the right, indicating the preferred reading direction.

The second example presents the same relationship but reverses the class order. The association name or label still reads best from `contractor` to `project`, but that direction is now right to left - corresponding to some mid- and far-eastern languages. The arrowhead now points to the left, indicating the best reading direction for this orientation.

The same relationship, but with the class order reversed: 'project' is on the left and 'contractor' is on the right. The arrowhead points to the left, but still from 'contractor' to 'project.' — **Clarifying the association reading direction**. The association relationship is bidirectional, but the label may only read well in one direction when named.


Forward Declarations

The C++ compiler component is the second or middle stage in the C++ compiler system. It reads and translates each preprocessed source code file individually, once, from beginning to end. Consequently, the only program information available to the compiler component comes from the #included header files and the single source code file. However, association is bidirectional, implying that the peers "know about" the other - each class specification references the other class. Since we can't specify both classes first, we need another mechanism to solve the cross-reference problem. That mechanism is a forward declaration.

contractor.h

class project;	// forward declaration

class contractor
{
	private:
		project* theProject;
	. . .
};

project.h

class contractor;	// forward declaration

class project
{
	private:
		contractor* theContractor;
	. . .
};

Solving association's cross-reference problem with forward declarations. A forward declaration is a statement (so it ends with a semicolon) consisting of the class keyword and an identifier, the name of a class. A forward declaration is a "promise" that a programmer will provide more detail about the class in the future. The "promise" allows the compiler to continue processing a file but limits what programmers can put in a class specification.

In this example, when the compiler processes the contractor class specification, it puts information in the object file that the linker or loader uses to connect variable theProject to the project class. Similarly, when it processes the project class specification, it adds information joining theContractor to the contractor class. Association is the only relationship that requires a forward declaration because it is the only bidirectional class relationship.

Limits to forward declarations

Forward declarations only work with pointer variables. For reasons explained below, they do not work with inheritance or composition, or with any other non-pointer member variable. Fortunately, correctly structured programs do not need forward declarations for these relationships or for aggregation. Forward declarations are necessary to deal with association's bidirectionality, but they cannot circumvent all the problems bidirectionality causes.

Association: Limitations On Forward Declarations and Inline Functions

Association, in conjunction with the C++ compiler system, restricts what programmers can put in a class specification more than the other class relationships². Whenever we define a variable in a program, the compiler uses its type to determine its size (i.e., how much memory to allocate to store it). Objects are variables, so the compiler sums an object's member variables' sizes to determine the object's overall size. The summing process depends on the compiler "seeing" the full class specification.

Inheritance and composition entail embedding one object in another (see Instantiating a subclass and Building a whole-part, respectively). The compiler can complete the embedding because the required class organization allows it to "see" and process the superclass specification before the subclass and the part class specifications before the whole class. But this organization isn't possible when we implement association.

Forward declarations solve the class ordering problem because we implement association with pointer variables, and a pointer's size is independent of the size of the data it references (i.e., points to). Forward declarations are unnecessary and unhelpful when programming either inheritance or composition; we could use them with aggregation, but to no advantage. Association also restricts the functions programmers can inline in a class specification.

Two classes, Peer1 and Peer2, connected by association:
Peer1
--
p2 : Peer2*
--
+ bar() : void

Peer2
--
p1 : Peer1*
--
+ foo() : void — **Association restricts some inline functions**. A peer function sending a message to the opposite peer cannot be inlined with the `inline` keyword or by putting the function body in the class specification. In this example, the `Peer1` function `bar` (highlighted in coral), sends the `foo` message to `Peer2`. Although `bar` is a small function, programmers can only prototype it in the class specification (highlighted in blue), not inline it. (UML class diagrams typically don't include the member variables implementing a relationship, but the example includes them to clarify the `bar` function.)

Peer1.h

class Peer2;

class Peer1
{
	private:
		Peer2* p2;
	public:
		void bar();
};

Peer1.cpp

#include "Peer1.h"
#include "Peer2.h"

void Peer1::bar()
{
	p2->foo();
}

Peer2.h

class Peer1;

class Peer2
{
	private:
		Peer1* p1;
	public:
		void foo() { ... }
};

Function prototypes allow the compiler to validate function calls:

- The name of the called function is correct
- The number and type of the arguments in the call conform to the number and type of parameters in the definition
- The function return type conforms to how the program uses the function call

A prototype also helps the compiler complete any needed type promotions. In contrast, the compiler must generate code whenever it processes a function definition, including inline functions. Code generation requires a complete function definition and, in the case of a member function, a complete class specification. A forward declaration enters a class name in the compiler's symbol table but lacks details about the class or its functions, so the compiler cannot generate code for an inline function that calls the opposite peer's functions. However, the compiler component compiles functions written in separate source code files after "seeing" the complete class specifications. After the compiler component translates all the source code files to machine code, the linker binds the function calls to the functions' machine code.

Message-passing code becomes increasingly fragile (difficult to use and maintain) with each added inline function. So, a helpful rule of thumb is to minimize inline functions in classes related by association.

Association Summary: Filling In The Table

Association is a constructive relationship conveniently characterized by the phrase "has a," but reading well in both directions. It has many property values in common with composition and aggregation, but it is bidirectional - UML's only bidirectional relationship. The following figure summarizes association's property values; use it to check and complete your entries in one of the blank Class Relationship Tables located at the end of the chapter.

Class Roles. The relationship forms a peer-to-peer relationship - both classes are peers.

Semantics. We can read association in either direction as a *has a* relationship.
- a contractor *has a* project
- a project *has a* contractor

Directionality. Association is a bidirectional or two-way relationship. Bidirectionality means that
1. The operations may take place in both directions.
2. Either object can send a message to the other.
3. Each object "knows" about the other.
4. Navigating from either object to the other is possible.
Binding Strength. The binding between the two objects is weak or loose because they are independent - they do not overlap or share memory - and are only connected with pointers. The strength of the binding implies the final two characteristics:
- Lifetime. The two objects have an independent lifetime, which means that the two objects are created and destroyed at different times. It also means that the relationship between the two objects is changeable - the program can create or break it whenever it is convenient.
- Sharing. The binding between the two objects is weak or loose enough that neither object has an exclusive relationship with the other, allowing other program objects to share instances of the associated classes.

Implementation.

class Peer2;	// forward declaration

class Peer1
{
	Peer2* p2;
};

class Peer1;	// forward declaration

class Peer2
{
	Peer1* p1;
};

Association property values. Like aggregation, C++ implements association with pointer members, but pointers in both classes.

¹ Booch, G., Rumbaugh, J., & Jacobson, I. (2005). The unified modeling language user guide (2nd ed.). Upper Saddle River, NJ: Addison-Wesley.

² Java is not as complex as C++ and doesn't experience these limitations. C++ supports stack and heap objects, allows fundamental-type data throughout a program, and utilizes a one-pass compiler followed by a separate linker or loader process. In contrast, Java only supports heap objects, limits where programmers use fundamental-type data, and utilizes a two-pass compiler with dynamic class loaders. The Java compiler builds its symbol table during the first pass and generates (virtual) machine code, called byte code, during the second pass. These differences simplify the Java compiler and the organization of Java programs while also creating situations where the Java compiler needlessly recompiles files where the C++ compiler does not. The unnecessary compilations are irrelevant for small programs with small files but can significantly increase the development time for large programs.