-g            # enable debugging information
-O0           # (letter 'O' followed by zero) do not optimize
-fno-builtin  # do not use builtin function optimization
The source code is compiled in a Linux x86_64 environment, and the resulting binary is an ELF (Executable and Linkable Format) file.
To analyze the executable in a readable way, we can use the 'objdump' and 'readelf' commands. With 'objdump' we can show the file header information (-f), display the full contents of each section (-s), or disassemble the executable (-d). Compiling with the debug flag also helps here, because the --source option (which implies -d) can then show the C source code interleaved with the assembly instructions.
$ objdump --source hello
========================
00000000004003e0 <printf@plt-0x10>:
4003e0: ff 35 22 0c 20 00 pushq 0x200c22(%rip) # 601008 <_GLOBAL_OFFSET_TABLE_+0x8>
4003e6: ff 25 24 0c 20 00 jmpq *0x200c24(%rip) # 601010 <_GLOBAL_OFFSET_TABLE_+0x10>
4003ec: 0f 1f 40 00 nopl 0x0(%rax)
00000000004003f0 <printf@plt>:
4003f0: ff 25 22 0c 20 00 jmpq *0x200c22(%rip) # 601018 <_GLOBAL_OFFSET_TABLE_+0x18>
4003f6: 68 00 00 00 00 pushq $0x0
4003fb: e9 e0 ff ff ff jmpq 4003e0 <_init+0x18>
.
.
.
00000000004004f6 <main>:
#include <stdio.h>
int main() {
4004f6: 55 push %rbp
4004f7: 48 89 e5 mov %rsp,%rbp
printf("Hello World!\n");
4004fa: bf a0 05 40 00 mov $0x4005a0,%edi
4004ff: b8 00 00 00 00 mov $0x0,%eax
400504: e8 e7 fe ff ff callq 4003f0 <printf@plt>
400509: b8 00 00 00 00 mov $0x0,%eax
}
40050e: 5d pop %rbp
40050f: c3 retq
Under the 'main' symbol we find the start of our C code, although the listing shows that the toolchain has already placed a lot of other code before it. On the left, we can see the virtual address at which each instruction is loaded when the program runs, followed by the bytes (each shown as a pair of hexadecimal digits) that encode the instruction and its operands. On the right side, 'objdump' conveniently disassembled each series of bytes. The raw bytes on the left could also be obtained by using the '-s' argument with 'objdump'.
$ objdump -s hello
===================
Contents of section .rodata:
 400590 01000200 00000000 00000000 00000000  ................
 4005a0 48656c6c 6f20576f 726c6421 0a00      Hello World!..
With this output, we can see the raw contents of every section in the file. Here we can see the "Hello World!" string at address 4005a0, in the read-only data section (.rodata). The previous command did not show the string itself, only its address: the instruction 'mov $0x4005a0,%edi' loads it into the 'edi' register, which carries the first argument of a function call under the System V x86_64 calling convention. Next, zero is assigned to the 'eax' register. Because 'printf' is a variadic function, the calling convention requires 'al' to hold the number of vector registers used for arguments, which is zero here since no floating-point values are passed.
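To check by hand that the bytes at 4005a0 really are our string, the hex column can be decoded back into characters. The following is only a minimal sketch: 'decode_hex' and 'hex_value' are hypothetical helper names, not part of any tool used above, and the input is assumed to be the hex column copied from the 'objdump -s' output.

```c
#include <assert.h>
#include <stddef.h>

/* Convert one hexadecimal digit to its value, or -1 if it is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode pairs of hex digits from `hex` into `out`, skipping the spaces
 * that objdump prints between groups (hypothetical helper).
 * Returns the number of bytes written, or -1 on malformed input or
 * when `out` is too small. */
int decode_hex(const char *hex, unsigned char *out, size_t out_size)
{
    size_t n = 0;

    assert(hex != NULL && out != NULL);
    while (*hex != '\0') {
        if (*hex == ' ') {          /* group separator */
            hex++;
            continue;
        }
        int hi = hex_value(hex[0]);
        if (hi < 0) return -1;
        int lo = hex_value(hex[1]); /* a lone trailing digit fails here */
        if (lo < 0) return -1;
        if (n >= out_size) return -1;
        out[n++] = (unsigned char)(hi * 16 + lo);
        hex += 2;
    }
    return (int)n;
}
```

Feeding it "48656c6c 6f20576f 726c6421 0a00" yields 14 bytes: the 12 characters of "Hello World!", the newline (0a), and the terminating zero byte (00) that ends every C string.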
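The program then calls the 'printf' function through the PLT (Procedure Linkage Table), a set of small stubs that jump through addresses stored in the Global Offset Table. The dynamic linker fills in those addresses with the real location of the function in the shared C library.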
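The target addresses that 'objdump' prints next to 'callq' and the RIP-relative 'jmpq' can be recomputed from the raw bytes: the 32-bit displacement stored in the instruction is sign-extended and added to the address of the next instruction. A small sketch of that arithmetic, where 'relative_target' is a hypothetical name chosen for illustration:

```c
#include <stdint.h>

/* Compute the address referenced by an x86_64 instruction that uses a
 * 32-bit displacement relative to the next instruction (hypothetical helper).
 *   insn_addr: address of the instruction itself
 *   insn_len:  total length of the instruction in bytes
 *   disp32:    the displacement, read as a little-endian 32-bit value */
uint64_t relative_target(uint64_t insn_addr, unsigned insn_len, uint32_t disp32)
{
    uint64_t disp = disp32;

    /* Sign-extend to 64 bits: a set top bit means a backward reference. */
    if (disp32 & 0x80000000u)
        disp |= 0xffffffff00000000ull;

    /* Unsigned addition wraps modulo 2^64, which gives the right result
     * for negative displacements too. */
    return insn_addr + insn_len + disp;
}
```

For the call at 400504, the bytes 'e7 fe ff ff' form the displacement 0xfffffee7, the instruction is 5 bytes long, and the result is 4003f0, the 'printf@plt' stub. For the 6-byte 'jmpq' at 4003f0 with displacement 0x200c22, the result is 601018, the Global Offset Table slot shown in the comment of the listing.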
Now let's take a look at the ELF header of the file.
$ readelf -h hello
===================
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x400400
  Start of program headers:          64 (bytes into file)
  Start of section headers:          8752 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         9
  Size of section headers:           64 (bytes)
  Number of section headers:         35
  Section header string table index: 32
There is some useful information here, most importantly the start of the section headers and the number of section headers. Let's save them for later.
Running the 'du -h hello' command, I see that the file occupies 11 kilobytes on disk.
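The first fields that 'readelf' reports can be read straight from the bytes of the file. As a rough sketch, assuming the standard ELF64 layout (byte 4 of the identification is the class, byte 5 the data encoding, and the section header offset is the 8-byte field at offset 0x28), the checks might look like this. Both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Return 1 if the 16 identification bytes describe a 64-bit,
 * little-endian ELF file, 0 otherwise (hypothetical helper). */
int looks_like_elf64_le(const unsigned char *ident, size_t len)
{
    if (ident == NULL || len < 16)
        return 0;
    return ident[0] == 0x7f && ident[1] == 'E' &&
           ident[2] == 'L'  && ident[3] == 'F' &&
           ident[4] == 2 &&   /* class: 2 means ELF64 */
           ident[5] == 1;     /* data: 1 means little endian */
}

/* Assemble a 64-bit value from 8 little-endian bytes, for example the
 * "start of section headers" field at offset 0x28 of an ELF64 header. */
uint64_t read_le64(const unsigned char *p)
{
    uint64_t value = 0;

    for (int i = 7; i >= 0; i--)
        value = (value << 8) | p[i];
    return value;
}
```

With the magic bytes shown above, 'looks_like_elf64_le' accepts the file, and the bytes '30 22 00 00 00 00 00 00' decode to 0x2230, which is the 8752 reported as the start of section headers.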
'-static' argument
Now, when we include the '-static' flag in the gcc command, the compiler creates a standalone application by copying the library code the program needs into the executable. This makes for a very portable application that does not depend on the libraries available on the system, but it also increases the file size immensely. Our simple "Hello World!" program now carries a large part of the standard C library, even though it calls only one function from it. The executable now takes almost 900 kilobytes, as opposed to the previous 11K.
Using built in function optimizations
By default, the compiler applies some common optimizations to well-known library functions. With '-fno-builtin', we tell the compiler not to use them. Analyzing the executable compiled without that flag, we get basically the same 'main' function. An important difference is the use of 'puts' instead of 'printf', which has formatting capabilities. Since we pass a single constant string that contains no format specifiers and ends in a newline, the compiler calls 'puts' instead, which simply writes the string to standard output followed by a newline.
No debugging enabled
The compiler has the option to include debugging information in the executable, so that it becomes easier to analyze its assembly code. In the object dump of the original 'hello' program, the C source was shown along with the instructions. The source text itself is not stored in the executable; the debugging sections record which source file and line each instruction came from, and 'objdump' uses that to look up the source. These extra sections make the file larger. Without the debug option, we can observe a decrease in the file size from 11K to 8.4K, as well as a decrease in the number of section headers.
Using arguments on 'printf'
Now we're going to modify the program a little, so we can compare the resulting changes in the assembly.
#include <stdio.h>
int main() {
    printf("Hello World!\n %d %d %d %d %d %d %d %d %d %d", 1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
}
When compiled and disassembled, we have the following:
00000000004004f6 <main>:
4004f6: 55 push %rbp
4004f7: 48 89 e5 mov %rsp,%rbp
4004fa: 48 83 ec 08 sub $0x8,%rsp
4004fe: 6a 0a pushq $0xa
400500: 6a 09 pushq $0x9
400502: 6a 08 pushq $0x8
400504: 6a 07 pushq $0x7
400506: 6a 06 pushq $0x6
400508: 41 b9 05 00 00 00 mov $0x5,%r9d
40050e: 41 b8 04 00 00 00 mov $0x4,%r8d
400514: b9 03 00 00 00 mov $0x3,%ecx
400519: ba 02 00 00 00 mov $0x2,%edx
40051e: be 01 00 00 00 mov $0x1,%esi
400523: bf d0 05 40 00 mov $0x4005d0,%edi
400528: b8 00 00 00 00 mov $0x0,%eax
40052d: e8 be fe ff ff callq 4003f0 <printf@plt>
400532: 48 83 c4 30 add $0x30,%rsp
400536: b8 00 00 00 00 mov $0x0,%eax
40053b: c9 leaveq
40053c: c3 retq
40053d: 0f 1f 00 nopl (%rax)
Here we can see that the 'main' function follows pretty much the same logic. What is very noticeable is the assignment of the numbers to the registers and the stack. Under the System V x86_64 calling convention, up to six integer arguments are passed in registers ('rdi', 'rsi', 'rdx', 'rcx', 'r8' and 'r9'). The address of the format string takes the first one, which leaves five registers for the values 1 to 5; the remaining arguments are pushed onto the stack. They are pushed in reverse order (10 first, 6 last), so that they sit in memory in argument order, with 6 at the lowest address. 'printf' does not pop them; it reads them in place, moving on to the stack once the register values have been consumed. The initial 'sub $0x8,%rsp' is padding that keeps the stack 16-byte aligned at the call.
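The register assignment seen in the listing follows a fixed order, which can be written down as a small lookup. This is only an illustrative sketch of the integer-argument rule of the System V x86_64 calling convention; the function name 'int_arg_register' is hypothetical, and floating-point or aggregate arguments are deliberately ignored.

```c
#include <stddef.h>

/* Registers used for the first six integer or pointer arguments,
 * in order, under the System V x86_64 calling convention. */
static const char *const INT_ARG_REGS[] = {
    "rdi", "rsi", "rdx", "rcx", "r8", "r9"
};

/* Return the register that carries integer argument number `index`
 * (0-based), or NULL when that argument is passed on the stack.
 * Hypothetical helper for illustration only. */
const char *int_arg_register(size_t index)
{
    size_t count = sizeof INT_ARG_REGS / sizeof INT_ARG_REGS[0];

    return index < count ? INT_ARG_REGS[index] : NULL;
}
```

In the modified program, argument 0 is the format string ('edi' in the listing, the low half of 'rdi'), arguments 1 to 5 are the values 1 to 5 ('esi', 'edx', 'ecx', 'r8d', 'r9d'), and arguments 6 to 10 get NULL here, matching the five 'pushq' instructions.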
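After 'printf' returns, the argument values are still on the stack. The 'add $0x30,%rsp' instruction releases them (0x30 is 48 bytes: five 8-byte pushes plus the 8 bytes of padding), and 'leaveq' then copies the base pointer back into the stack pointer and pops the saved 'rbp', so the next values written on the stack will overwrite the old ones.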
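The values stay on the stack because a variadic callee only reads its arguments; releasing them is left to the caller. How such a function walks its argument list can be sketched with the standard <stdarg.h> macros. 'sum_ints' is a hypothetical example function, not something 'printf' uses internally, and it takes an explicit count where 'printf' would parse the format string.

```c
#include <stdarg.h>

/* Add up `count` int arguments passed after `count`.
 * va_arg fetches each value in order, whether the caller placed it
 * in a register or on the stack; nothing is removed from the stack. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);

    return total;
}
```

A call such as 'sum_ints(10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)' has eleven integer arguments, so under the same convention the last five would travel on the stack, much like in the 'printf' call above.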
Output function
By moving 'printf' to a separate function and calling that function in 'main', 'main' now calls the address of a new symbol named 'output', and 'printf' is called from there. With optimization enabled, the compiler may inline the call so that 'main' behaves like our original C code, but there would still be a separate 'output' function in the executable.
Full Optimization
For the highest standard optimization level, the compiler must receive the '-O3' option, which enables the most aggressive optimizations at the cost of longer compile times and possibly larger code. Compiling our original "Hello World!" program with it, we get this:
0000000000400400 <main>:
400400: 48 83 ec 08 sub $0x8,%rsp
400404: bf b0 05 40 00 mov $0x4005b0,%edi
400409: 31 c0 xor %eax,%eax
40040b: e8 e0 ff ff ff callq 4003f0 <printf@plt>
400410: 31 c0 xor %eax,%eax
400412: 48 83 c4 08 add $0x8,%rsp
400416: c3 retq
400417: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1)
40041e: 00 00
Compared to the unoptimized code, it is more efficient, but also harder to read. One simple optimization is setting 'eax' to zero by XORing it with itself ('31 c0', two bytes) instead of moving a zero value into the register ('b8 00 00 00 00', five bytes). The frame pointer setup ('push %rbp' followed by 'mov %rsp,%rbp') is also gone.