Internal Working of C Programming language

The first question that comes to mind is: what is a programming language? Let's discuss that first.

A programming language is a medium of communication between a user and a machine, and C is a high-level programming language.

Now, let's talk about C programming.

C is a general-purpose programming language developed by Dennis M. Ritchie in 1972 at AT&T Bell Laboratories. It is used to develop system software such as operating systems, database management systems, and device drivers.

As we know, machines understand only machine language (instructions encoded in binary). How, then, can a machine understand a high-level programming language?

Suppose there are two people: one knows only Hindi and the other knows only French. If either wants to communicate with the other, they need a translator who understands both languages and can convert each person's native language into the other's.

In computing, the compiler works as that translator: it converts the high-level language into machine language (binary, made up of 0s and 1s).

The code we write in an editor (such as Code::Blocks, Dev-C++, Vim, VS Code, or Notepad) is called the source code, and the file that holds it ends with the .c extension (for example, helloworld.c).

#include <stdio.h>

int main() {
    printf("\n Hello World! \n");
    return 0;
}

helloworld.c

After writing the code, the first step is compilation. When we invoke the compiler, it first runs a program called the preprocessor, which handles the lines that start with # (the preprocessor directives).

When a C program is compiled, it passes through the following stages:

  1. Pre-processing
  2. Compilation
  3. Assembling
  4. Linking

C Internal Working Diagram

Pre-processing - C

All preprocessor directives (lines starting with the # symbol, such as #include, #define, and #undef) are processed by the preprocessor, which returns the refined source code for helloworld.c.

The preprocessor performs several tasks on the source code.

Removing comments

When we write code, we add comments so that it is easier for people to understand. They are of no use to the machine, so the preprocessor removes all of them (each comment is replaced by a single space), since they are not meant to be executed.

File inclusion

When we write programs, we use header files. A header file contains function prototypes, data type definitions, macro definitions, and other preprocessor commands, either from the standard library or written by us. In the source code, the #include directive tells the preprocessor to insert the contents of the named header file.

For example, in our programs we use #include <stdio.h>.

This simply means that #include is telling the preprocessor to insert the entire contents of the header file stdio.h. That header declares printf() and scanf(), which is why we must include stdio.h whenever we use those functions in our program.

Similarly, when we use mathematical functions in our program, we have to include the math.h header file.

Macro expansion

A macro is a piece of code that has been given a name; wherever the name appears, it is replaced by that code when the program is pre-processed.

  • A macro is defined by using #define preprocessor directive.
  • A macro can be object-like or function-like.
  • Macros are used to define constants or short function-like expressions.

For example,

#include <stdio.h>
#define c 299792458 // speed of light

So, wherever we use c in our program, the preprocessor replaces it with 299792458, and we don't have to repeat 299792458 throughout the program.

Compilation - C

After pre-processing, the refined code is passed to the compiler, which translates it into machine-dependent assembly code for the assembler. This is the second step in the process of compiling a C program.

  • It checks the preprocessed C program for syntax errors.
  • It converts the preprocessed code into assembly code.
  • It optimizes the code for better performance.

Assembling - C

Here the assembler converts the assembly code into machine code, stored in an object file with the extension '.o' (or '.obj' on Windows). This machine code is not yet executable, because at this stage the bodies of the library functions have not been attached.

Linking - C

This machine code then goes to the linker. Linking is the process of collecting and merging code from different modules into a single file that can be loaded into memory and executed.

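The linker attaches the library code needed to resolve the library functions and returns executable code. On Windows, this is a file with the .exe extension; on Linux and other Unix-like systems, the executable usually has no extension (a.out by default). This executable code runs on the machine.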

When the program is run, the operating system loads the executable into memory and places the address of its first instruction (the entry point) in the processor's program counter. The processor then fetches the instruction at that address and starts executing.