1.4. Machine Code

Digital computers don't understand the languages that humans speak. Instead, they use a binary language called machine code or machine language. Machine code consists of a sequence of simple computer instructions. Each instruction consists of one or more integers, but we can conveniently view them as a string of binary digits or bits (i.e., 1's and 0's). Different kinds of computers typically "speak" or "understand" different machine languages. For example, one computer may represent the ADD operation as 10011111 while another might represent the same operation as 000110. The size of machine code instructions also varies from one computer to another: some processors use fixed-size instructions (32 bits is a common size), while others use instructions whose length varies from one instruction to the next.
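To make this concrete, here is a minimal Python sketch that encodes the same short sequence of operations for two made-up machines. Only the two ADD bit patterns come from the paragraph above; the machine names, the LOAD and STORE patterns, and the `encode` helper are hypothetical and chosen purely for illustration.

```python
# Two hypothetical machines with different machine languages.
# Only the ADD patterns come from the text; the rest are invented.
MACHINE_A = {"LOAD": "10010001", "ADD": "10011111", "STORE": "10010010"}
MACHINE_B = {"LOAD": "000001", "ADD": "000110", "STORE": "000010"}


def encode(program, opcodes):
    """Translate a list of operation names into one string of bits."""
    return "".join(opcodes[op] for op in program)


program = ["LOAD", "ADD", "STORE"]

code_a = encode(program, MACHINE_A)
code_b = encode(program, MACHINE_B)

print(code_a)  # 24 bits: three 8-bit instructions
print(code_b)  # 18 bits: three 6-bit instructions

# The same bits can also be viewed as an integer.
print(int(MACHINE_A["ADD"], 2))  # 159
```

The two bit strings describe the same three operations, yet neither machine could make sense of the other's version.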

Furthermore, when a program runs, the operating system (e.g., Windows, Linux, or macOS) acts as a host environment that provides services to the program. These services include essential support such as keyboard, screen, and hard drive access. Unfortunately, how the program accesses those services differs from one operating system to the next. As a result of the differences between machine languages and operating system requirements, programs written in machine language are more focused on the system running the program than on how the program solves a problem. These differences also mean that it isn't possible to move machine code between different kinds of computers without providing a translation service, usually in the form of a virtual machine.
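One small, visible instance of these differences is how each operating system names files on a drive. The sketch below uses Python's standard `ntpath` and `posixpath` modules, which implement the Windows and the Linux/macOS path rules respectively and can be imported on any system. The folder and file names are hypothetical.

```python
import ntpath      # Windows path rules (importable on any system)
import posixpath   # Linux and macOS path rules (importable on any system)

# Hypothetical folder and file names, used only for illustration.
parts = ["projects", "hello", "main.cpp"]

windows_path = ntpath.join("C:\\", *parts)
posix_path = posixpath.join("/home/user", *parts)

print(windows_path)  # C:\projects\hello\main.cpp
print(posix_path)    # /home/user/projects/hello/main.cpp
```

A high-level language can hide this detail behind a library; a machine language program has to deal with it directly.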

Writing programs in machine language is slow, tedious, and error-prone. Today, we write most programs in higher-level programming languages that focus more on the problem and less on the system (the hardware and the operating system) running the program (see Figure 2). But computers can't (with rare exceptions) directly execute programs written in high-level languages, so there must be some way of translating a program written in a high-level language into machine language. Two kinds of computer programs perform the necessary translation: compilers and interpreters.

A compiler is a program that translates other programs written in a high-level programming language like C or C++ into machine code or machine language. Some languages, such as Java and C#, take a different route. Compilers for these languages translate the high-level source code into an intermediate form (a representation that lies somewhere between the high-level and actual machine code) called virtual machine code. The virtual machine code then becomes the input to another program called an interpreter or virtual machine (VM), a program that simulates a hardware CPU.
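The two-step route can be sketched in a few lines of Python. In this toy example, a compiler translates an arithmetic expression into instructions for a made-up stack machine, and a separate function plays the role of the virtual machine. The instruction names (`PUSH`, `ADD`, `SUB`, `MUL`) and the functions `compile_expr` and `run` are assumptions made for illustration; they do not reflect the format used by the Java or C# virtual machines.

```python
import ast

# Hypothetical virtual machine code: a list of (operation, argument) pairs
# for a made-up stack machine. Not the format of any real VM.
BINARY_OPS = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}


def compile_expr(source):
    """Compiler: translate an arithmetic expression into virtual machine code."""
    code = []

    def emit(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            code.append(("PUSH", node.value))
        elif isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
            emit(node.left)    # code for the left operand first
            emit(node.right)   # then the right operand
            code.append((BINARY_OPS[type(node.op)], None))
        else:
            raise ValueError("unsupported expression")

    emit(ast.parse(source, mode="eval").body)
    return code


def run(code):
    """Virtual machine: simulate a CPU by carrying out each instruction."""
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        else:
            right = stack.pop()
            left = stack.pop()
            if op == "ADD":
                stack.append(left + right)
            elif op == "SUB":
                stack.append(left - right)
            elif op == "MUL":
                stack.append(left * right)
    return stack.pop()


vm_code = compile_expr("2 + 3 * 4")
print(vm_code)       # the intermediate form
print(run(vm_code))  # 14
```

The list held in `vm_code` is the intermediate form: it is no longer source code, but no hardware CPU can run it either. Only the `run` function understands it.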

Other languages, such as JavaScript and Perl, are traditionally run entirely by an interpreter, with no separate compilation step. The interpreter reads the source code, written in a high-level language, and interprets the instructions one at a time. That is, the interpreter itself carries out each instruction in the program.
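A minimal sketch of this idea, assuming a made-up three-statement language (`set`, `add`, and `print`), might look as follows. The language and the `interpret` function are hypothetical; real interpreters are far more elaborate, but the basic loop is the same: read one statement, carry it out, and move on to the next.

```python
def interpret(source):
    """Read a toy program and carry out its statements one at a time."""
    variables = {}
    output = []
    for line in source.splitlines():
        words = line.split()
        if not words:
            continue  # skip blank lines
        command, args = words[0], words[1:]
        if command == "set":      # set NAME VALUE
            variables[args[0]] = int(args[1])
        elif command == "add":    # add NAME VALUE
            variables[args[0]] += int(args[1])
        elif command == "print":  # print NAME
            output.append(variables[args[0]])
            print(variables[args[0]])
        else:
            raise ValueError(f"unknown statement: {line!r}")
    return output


program = """
set total 5
add total 3
print total
"""

interpret(program)  # prints 8
```

Notice that the toy program is never translated into another language or written to a new file. The interpreter does the work itself.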

An illustration showing a compiler reading a C++ source code file, translating the C++ code into machine code, and writing the machine code out to an executable file.
Converting a program from source to machine code. A compiler reads a program written in a high-level language, like C++, from a source code file. The compiler translates or converts the source code to executable machine code and writes it to a program file. On Windows computers, executable files produced by the C++ compiler end with .exe; other operating systems follow different naming conventions. The operating system loads the machine code into main memory, where the hardware runs it directly without further processing.
An illustration of a compiler reading a Java source code file, translating the Java code into virtual machine code, and saving it to a new file. The virtual machine, another program, reads the virtual machine code from the new file and interprets it by carrying out the operations specified by the virtual machine code.
Compiling and running a hybrid-language program. Some languages, like Java and C#, are called hybrid languages because they use both a compiler and a virtual machine. They first compile the source code to virtual machine code, that is, machine code for a virtual computer (a computer program simulating a non-existent computer system - an interpreter or virtual machine). After compiling the source code, a virtual machine (VM) executes the code by simulating the actions of a real computer. The operating system loads the VM into main memory and runs it; it is the VM that reads and runs the program's virtual machine code.
The picture illustrates a JavaScript source code file being read and processed by an interpreter.
Running an interpreted-language program. Languages like JavaScript and Perl do not compile the source code ahead of time. As with the hybrid languages (Java and C#), the operating system runs the interpreter or VM rather than the program itself. The interpreter reads the source code file and executes the program one statement at a time without translating the whole program to any other language. Web browsers incorporate interpreters for some languages (like JavaScript), while the operating system runs the interpreters for other languages (like Perl) as application programs.

Each approach to running a program written in a high-level programming language has advantages and disadvantages. Programs written in fully compiled languages (e.g., C and C++) generally execute faster than programs written in partially compiled languages (e.g., Java and C#) and much faster than programs written in fully interpreted languages (e.g., JavaScript and Perl). To give a rough idea of the difference in performance, let's say that a C++ program, once compiled, executes in time 1. A program in a hybrid language (compiled and interpreted) will generally run in time 3 to 10. In a purely interpreted language, the same program runs in a time of about 100. Contemporary versions of the Java and C# VMs use a just-in-time (JIT) compiler that translates some of the virtual machine code to machine code while the program runs. JIT compilers reduce the run time to about 1.5 times that of purely compiled language systems.

The Python programming language is a bit different. The Python interpreter runs the code we write ourselves, but many Python programs spend most of their time inside library functions that are written in C and run very fast. So, a Python program that relies heavily on such libraries can run almost as fast as an equivalent C program, while a program that does most of its work in its own Python code runs at the speed of an interpreted language.
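The effect of library code written in C can be observed with a small timing experiment. The sketch below assumes the standard CPython interpreter, in which the built-in `sum` function is implemented in C; the helper `python_loop_sum` is a hypothetical name for an equivalent loop that the interpreter must carry out step by step. The measured times depend on the machine, so no particular numbers are shown.

```python
import timeit

numbers = list(range(100_000))


def python_loop_sum(values):
    """Add the values with a loop that the interpreter runs step by step."""
    total = 0
    for value in values:
        total += value
    return total


# Both approaches compute the same answer.
assert python_loop_sum(numbers) == sum(numbers)

# Time 20 runs of each; the actual numbers depend on the machine.
loop_time = timeit.timeit(lambda: python_loop_sum(numbers), number=20)
builtin_time = timeit.timeit(lambda: sum(numbers), number=20)

print(f"interpreted loop: {loop_time:.4f} s")
print(f"built-in sum:     {builtin_time:.4f} s")
```

On a typical system, the built-in version should finish noticeably sooner, although the exact ratio varies.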

On the other hand, once we compile a program written in a purely compiled language, we can't easily move the resulting executable machine code to a different platform (e.g., you can't run a Windows program on an Apple computer). In contrast, we can easily move programs we write in interpreted languages between different computers.

Interpreted programs are portable because they run on a VM or interpreter. From the perspective of the hardware and the operating system, the interpreter is the running program. Interpreters and VMs are themselves typically written in purely compiled languages, so each platform needs its own version of the interpreter, but the programs they run are portable. Once we install the interpreter on a system, we can move interpretable programs to the system and run them without further processing.
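As a small illustration, the following Python sketch can be copied unchanged to any system with a Python interpreter installed. It reports the operating system, the processor type, and the location of the interpreter's own executable file, which is the platform-specific program that the operating system actually runs. The function name `describe_host` is hypothetical, and the output differs from one system to the next.

```python
import platform
import sys


def describe_host():
    """Report where this unchanged source file happens to be running."""
    return (
        f"operating system: {platform.system()}\n"
        f"processor type:   {platform.machine()}\n"
        f"interpreter file: {sys.executable}"
    )


print(describe_host())
```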