Understanding the Execution Stages of a C Program: A Detailed Guide
- Encode AND Decode
- Oct 15, 2024
When you write a C program, it goes through multiple stages before it runs and gives you the output. In this blog, we will take an in-depth look at all the stages involved in C program execution: preprocessing, compilation, assembly, linking, loading, and execution.
Let’s break down these stages using a simple example and explore how each step works:
#include <stdio.h>

#define PI 3.14159

int main() {
    printf("Hello, World!\n");
    printf("Value of PI: %f\n", PI);
    return 0;
}
In the above program:
We include the standard I/O library (stdio.h) for the printf function.
We define a macro PI with a value of 3.14159.
The program prints a message and the value of PI.
Stage 1: Preprocessing
The first step in compiling any C program is preprocessing. The preprocessor handles all preprocessor directives (lines starting with #) before the code reaches the compiler. Its main tasks are inserting the contents of header files (#include), expanding macros (#define), evaluating conditional compilation directives (such as #if and #ifdef), and removing comments.
Command to Run Preprocessing:
gcc -E hello.c -o hello.i
Preprocessed Output (hello.i): The preprocessed file contains the expanded source code, where all macros and includes have been resolved. In hello.i, PI in the printf call has been replaced with 3.14159, and the contents of stdio.h have been inserted ahead of main().
Stage 2: Compilation
In the compilation stage, the preprocessed code is transformed into assembly language. This is where the compiler checks your code for syntax errors and generates an equivalent low-level representation (assembly code) that is specific to the target machine architecture (e.g., ARM, x64).
Command to Run Compilation:
gcc -S hello.i -o hello.s
Compiled Output (hello.s): The result is an assembly code file containing the translated instructions in a low-level but still human-readable format. Its exact contents depend on the target architecture (for example, x64), the compiler version, and the optimization level.
Stage 3: Assembly
The assembly stage converts the assembly language code into machine code and stores it in an object file. Machine code is a binary representation of the instructions that the processor will execute.
Command to Run Assembly:
gcc -c hello.s -o hello.o
Assembled Output (hello.o): The object file contains machine-readable binary instructions. You won’t see human-readable text here, but tools like objdump can help you inspect it. This file is not yet executable; it’s a partial product that will be combined with other object files and libraries in the next stage.
Disassemble the Object File (Optional): If you want to see the disassembled version of the object file (i.e., converting machine code back into assembly), you can use the objdump tool.
Run the following command in your terminal to disassemble the object file:
objdump -d hello.o
This will display the assembly instructions corresponding to the machine code.
Stage 4: Linking
In the linking stage, the object file is combined with libraries (like libc for printf) and other necessary components to create an executable file. The linker resolves any external function calls and variable references.
Command to Run Linking:
gcc hello.o -o hello
Linked Output (hello): The linker takes care of combining your object files into a single executable file. In this case, the output file is named hello (hello.exe on Windows). This file contains all the binary code and references needed to run the program.
If you used any dynamic libraries (like the C standard library), the linker also sets up dynamic references to those libraries.
The output of the linking stage is an executable file (typically without an extension on Linux/macOS, or with .exe on Windows).
Stage 5: Loading
The loading stage occurs when you run the program. The loader (part of the operating system) loads the executable into memory, prepares the runtime environment, and sets up the memory spaces, such as:
Code segment: Stores the compiled code.
Data segment: Contains initialized global variables.
Stack: Used for function calls and local variables.
Heap: Used for dynamic memory allocation.
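The loader then transfers control to the program's entry point. On most desktop systems this is not main() itself but a small piece of C runtime startup code (often named _start), which sets up the environment and then calls main().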
Note:
In a microcontroller (such as STM32F407), the process is different because there is no operating system (or at least no OS loader like on desktop systems). Here’s how it works:
The linker handles the allocation of memory segments during the build process (not at runtime). It produces a firmware image that contains all the segments already arranged for the microcontroller's memory.
After the firmware is flashed into the microcontroller's flash memory, the microcontroller’s startup process involves:
Loading the stack pointer with the address of the top of the stack.
Setting the program counter to the address stored in the reset vector, which points to the reset handler (not directly to main()).
Copying initialized data from flash to SRAM and zeroing uninitialized data; the code itself usually runs directly from flash.
The startup code in a microcontroller initializes these sections (often in assembly), and the entry point is defined at the reset handler, which eventually calls main().
Stage 6: Execution
Finally, the program is ready to be executed by the CPU. Once the runtime startup code has finished, control reaches the main() function. During execution:
"Hello, World!" is printed using the printf function.
The value of PI (3.14159) is printed.
The program returns 0, signaling successful execution to the operating system.
Output:
Hello, World!
Value of PI: 3.141590
Summary of Stages
Stage | Description | Input | Output | Tool |
Preprocessing | Process directives, include headers, replace macros | Source code .c | Preprocessed .i | Preprocessor (gcc -E) |
Compilation | Convert preprocessed code into assembly language | Preprocessed .i | Assembly code .s | Compiler (gcc -S) |
Assembly | Translate assembly into machine code (object code) | Assembly .s | Object file .o | Assembler (gcc -c) |
Linking | Link object files and libraries, produce executable | Object .o | Executable file | Linker (gcc) |
Loading | Load the executable into memory and prepare for execution | Executable file | Program in memory | Loader (OS) |
Execution | The CPU executes the program's instructions | Loaded program | Program output/result | CPU |