Tuesday, January 31, 2017

Lab 4 - Compiled C Lab

This assignment experiments with some of the many options available when compiling C code with GCC. A simple "Hello World" program is used to test the different options. To make the binary executable easier to analyze, the baseline build uses the following arguments:

    -g              #enable debugging information
    -O0             #(Letter 'o' followed by zero) do not optimize
    -fno-builtin    #do not use builtin function optimization


The source code is compiled in a Linux x86_64 environment, and the resulting binary is an ELF (Executable and Linkable Format) file.

To examine the executable in a readable form, we can use the 'objdump' and 'readelf' commands. 'objdump' can show the file header information (-f), dump the full contents of each section (-s), or disassemble the executable sections (-d). Compiling with the debug flag also lets us use the option that shows the C source code interleaved with the assembly instructions (--source, which implies -d).

$ objdump --source hello
========================


00000000004003e0 <printf@plt-0x10>:
  4003e0:       ff 35 22 0c 20 00       pushq  0x200c22(%rip)        # 601008 <_GLOBAL_OFFSET_TABLE_+0x8>
  4003e6:       ff 25 24 0c 20 00       jmpq   *0x200c24(%rip)        # 601010 <_GLOBAL_OFFSET_TABLE_+0x10>
  4003ec:       0f 1f 40 00             nopl   0x0(%rax)

00000000004003f0 <printf@plt>:
  4003f0:       ff 25 22 0c 20 00       jmpq   *0x200c22(%rip)        # 601018 <_GLOBAL_OFFSET_TABLE_+0x18>
  4003f6:       68 00 00 00 00          pushq  $0x0
  4003fb:       e9 e0 ff ff ff          jmpq   4003e0 <_init+0x18>
.
.
.
00000000004004f6 <main>:
#include <stdio.h>

int main() {
  4004f6:       55                      push   %rbp
  4004f7:       48 89 e5                mov    %rsp,%rbp
            printf("Hello World!\n");
  4004fa:       bf a0 05 40 00          mov    $0x4005a0,%edi
  4004ff:       b8 00 00 00 00          mov    $0x0,%eax
  400504:       e8 e7 fe ff ff          callq  4003f0 <printf@plt>
  400509:       b8 00 00 00 00          mov    $0x0,%eax
}
  40050e:       5d                      pop    %rbp
  40050f:       c3                      retq



In the 'main' function we find the start of our C code, although the listing shows that the compiler and linker have already placed a lot of code before it. On the left, we can see the virtual address of each instruction in the '.text' section (not on the heap), followed by the bytes (each shown as a pair of hexadecimal digits) that encode the instruction and its operands. On the right side, 'objdump' conveniently disassembles each series of bytes. The addresses and raw bytes on the left can also be obtained with the '-s' argument of 'objdump'.

$ objdump -s hello
===================

Contents of section .rodata:
 400590 01000200 00000000 00000000 00000000  ................
 4005a0 48656c6c 6f20576f 726c6421 0a00      Hello World!..

This output shows the raw contents of each section of the file. Here we can see the "Hello World!" string at address 4005a0, in the read-only data section ('.rodata'). This is the same address that appeared in the disassembly as the immediate value '$0x4005a0', which is moved into the 'edi' register because the System V x86-64 calling convention passes the first argument of a function in 'rdi'. Zero is then assigned to 'eax': for a variadic function such as 'printf', 'al' tells the callee how many vector registers carry arguments, and here there are none.
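As a check on the '.rodata' dump above, the hex columns can be decoded back into characters. The sketch below is only an illustration; 'decode_hex_dump' and 'hex_value' are hypothetical helper names, and the input is assumed to be well-formed pairs of hex digits separated by spaces.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Value of one hexadecimal digit; the caller guarantees isxdigit(c). */
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    return tolower((unsigned char)c) - 'a' + 10;
}

/*
 * Decode the hex columns of one `objdump -s` line into out[].
 * Spaces between the 4-byte groups are skipped.
 * Returns the number of bytes written (at most cap).
 */
size_t decode_hex_dump(const char *hex, unsigned char *out, size_t cap) {
    size_t n = 0;
    while (*hex != '\0' && n < cap) {
        if (*hex == ' ') { hex++; continue; }
        assert(isxdigit((unsigned char)hex[0]) && isxdigit((unsigned char)hex[1]));
        out[n++] = (unsigned char)(hex_value(hex[0]) * 16 + hex_value(hex[1]));
        hex += 2;
    }
    return n;
}
```

Feeding it the line at 4005a0 ("48656c6c 6f20576f 726c6421 0a00") yields 14 bytes: the 12 characters of "Hello World!", the newline (0a) and the terminating zero byte.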

The program then calls the 'printf' function through the PLT (Procedure Linkage Table), which lets the dynamic linker resolve the function's address in the shared C library on the system at run time.

Now let's take a look at the header of the file.
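The bytes of the 'callq' instruction in the listing ('e8 e7 fe ff ff' at address 400504) do not contain the address of 'printf@plt' directly; they hold a signed 32-bit offset relative to the next instruction. The following sketch, with the hypothetical helper name 'call_target', shows how the address that 'objdump' prints can be recomputed from those bytes.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Target of a 5-byte x86-64 near call: opcode e8 followed by a 32-bit
 * little-endian displacement. The displacement is relative to the
 * address of the NEXT instruction, i.e. addr + 5.
 */
uint64_t call_target(uint64_t addr, const uint8_t insn[5]) {
    assert(insn[0] == 0xe8);
    uint32_t raw = (uint32_t)insn[1]
                 | (uint32_t)insn[2] << 8
                 | (uint32_t)insn[3] << 16
                 | (uint32_t)insn[4] << 24;
    int32_t disp = (int32_t)raw;  /* 0xfffffee7 becomes -0x119 */
    return addr + 5 + (uint64_t)(int64_t)disp;
}
```

For the call at 400504, the next instruction is at 400509 and the displacement is -0x119, which gives 4003f0, the address of 'printf@plt' shown earlier.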

$ readelf -h hello
===================

ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x400400
  Start of program headers:          64 (bytes into file)
  Start of section headers:          8752 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         9
  Size of section headers:           64 (bytes)
  Number of section headers:         35
  Section header string table index: 32


There is some useful information here, most notably the start of the section headers and the number of section headers. Let's save them for later.

Running the 'du -h hello' command reports that the file occupies 11 kilobytes on disk.
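The header fields and the file size are related. As a rough sketch, the code below reads the same fields that 'readelf -h' prints, using the 'Elf64_Ehdr' structure from the Linux '<elf.h>' header, and computes where the section header table ends. The function names are hypothetical, and the estimate assumes the section header table is the last thing in the file, which is typical for GCC/binutils output but not required by the ELF format.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy a 64-bit ELF header out of buf.
 * Returns 0 on success, -1 if buf is too short or is not an ELF64 file.
 */
int read_elf64_header(const unsigned char *buf, size_t len, Elf64_Ehdr *out) {
    assert(buf != NULL && out != NULL);
    if (len < sizeof *out) return -1;
    if (memcmp(buf, ELFMAG, SELFMAG) != 0) return -1;   /* 7f 45 4c 46 */
    if (buf[EI_CLASS] != ELFCLASS64) return -1;
    memcpy(out, buf, sizeof *out);
    return 0;
}

/* File offset just past the section header table. */
unsigned long long section_table_end(const Elf64_Ehdr *h) {
    return (unsigned long long)h->e_shoff
         + (unsigned long long)h->e_shnum * h->e_shentsize;
}
```

With the values above (start at 8752, 35 headers of 64 bytes each) the table ends at 8752 + 35 * 64 = 10992 bytes, which is in line with the 11K reported by 'du'.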

'-static' argument

When the '-static' flag is added to the gcc command, the compiler creates a standalone application by copying the library code it needs into the executable. This makes for a very portable application that does not depend on the libraries available on the system, but it also greatly increases the file size. Our simple "Hello World!" program now carries a large part of the standard C library, even though it calls only one function from it. The executable is now almost 900 kilobytes, as opposed to the previous 11K.


Using built-in function optimizations

By default, the compiler recognizes some standard library functions as built-ins and applies common optimizations to calls to them. With '-fno-builtin', we tell the compiler not to do so. Analyzing the executable compiled without that flag, we get basically the same 'main' function. An important difference is the use of 'puts' instead of 'printf', which has formatting capabilities. Since our call passes a single string that contains no format specifiers and ends in a newline, the compiler calls 'puts' instead, which simply writes the string followed by a newline to standard output.
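In source terms, the substitution is roughly equivalent to the rewrite sketched below. The function names are made up for illustration; the point is only that both calls write the same 13 bytes, and that 'puts' supplies the trailing newline itself.

```c
#include <assert.h>
#include <stdio.h>

/* What the source says: printf with a constant string and no conversions. */
int hello_with_printf(void) {
    return printf("Hello World!\n");   /* returns the character count */
}

/* What the compiler emits instead when built-ins are enabled. */
int hello_with_puts(void) {
    return puts("Hello World!");       /* puts appends the '\n' itself */
}

/* Both calls produce the same line on standard output. */
void compare_calls(void) {
    int printed = hello_with_printf();
    int put = hello_with_puts();
    assert(printed == (int)sizeof "Hello World!\n" - 1);  /* 13 characters */
    assert(put >= 0);                                     /* non-negative on success */
}
```

Note that the two functions do not return the same value, so the compiler can only make this substitution where the result of 'printf' is not used, as in our program.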


No debugging enabled

The compiler has the option to include debugging information in the executable, so it becomes easier to analyze its assembly code. In the object dump of the original 'hello' program, the C source was shown along with the instructions. The source text itself is not stored in the executable; the debug sections record which source file and line each instruction came from, and 'objdump' reads the source file from disk. These extra sections still make the file larger. Without the debug option, the file size drops from 11K to 8.4K, and the number of section headers decreases as well.

Using arguments on 'printf'

Now we're going to modify the program a little, so we can compare the resulting changes in the assembly.

#include <stdio.h>

int main() {
            printf("Hello World!\n %d %d %d %d %d %d %d %d %d %d",
                                   1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
}

When compiled and disassembled, we have the following:

00000000004004f6 <main>:
  4004f6:       55                      push   %rbp
  4004f7:       48 89 e5                mov    %rsp,%rbp
  4004fa:       48 83 ec 08             sub    $0x8,%rsp
  4004fe:       6a 0a                   pushq  $0xa
  400500:       6a 09                   pushq  $0x9
  400502:       6a 08                   pushq  $0x8
  400504:       6a 07                   pushq  $0x7
  400506:       6a 06                   pushq  $0x6
  400508:       41 b9 05 00 00 00       mov    $0x5,%r9d
  40050e:       41 b8 04 00 00 00       mov    $0x4,%r8d
  400514:       b9 03 00 00 00          mov    $0x3,%ecx
  400519:       ba 02 00 00 00          mov    $0x2,%edx
  40051e:       be 01 00 00 00          mov    $0x1,%esi
  400523:       bf d0 05 40 00          mov    $0x4005d0,%edi
  400528:       b8 00 00 00 00          mov    $0x0,%eax
  40052d:       e8 be fe ff ff          callq  4003f0 <printf@plt>
  400532:       48 83 c4 30             add    $0x30,%rsp
  400536:       b8 00 00 00 00          mov    $0x0,%eax
  40053b:       c9                      leaveq
  40053c:       c3                      retq
  40053d:       0f 1f 00                nopl   (%rax)

Here we can see that the 'main' function follows pretty much the same logic. What is very noticeable is the assignment of the arguments to registers and to the stack. The calling convention passes the first six integer arguments in registers ('rdi', 'rsi', 'rdx', 'rcx', 'r8' and 'r9'). Since the format string takes 'rdi', only the values 1 to 5 fit in registers, and the remaining five are pushed onto the stack. The stack arguments are pushed in reverse order (10 first, 6 last) so that they sit in memory in the same order as in the call, with 6 at the top of the stack. 'printf' does not pop them; once it has consumed the values in the registers, it reads them in place, starting with 6.

After 'printf' returns, the argument values are still on the stack. The 'add $0x30,%rsp' instruction releases them (five 8-byte pushes plus the 8 bytes reserved by 'sub $0x8,%rsp'), and 'leaveq' then copies the base pointer back into the stack pointer and restores the saved 'rbp', so the next values written to the stack will overwrite the previous ones.
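From the callee's point of view, it does not matter which arguments arrived in registers and which on the stack: the 'va_arg' macro hands them out in the order they were written in the call. The sketch below uses a hypothetical 'collect_ints' function to show this. Unlike 'printf', it has two fixed parameters, so on x86-64 only the first four variadic values would travel in registers and the other six on the stack.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/*
 * Copy `count` variadic int arguments into out[] in the order va_arg
 * yields them, which is the order they appear in the call.
 */
void collect_ints(int *out, int count, ...) {
    assert(out != NULL && count >= 0);
    va_list ap;
    va_start(ap, count);
    for (int i = 0; i < count; i++) {
        out[i] = va_arg(ap, int);
    }
    va_end(ap);
}
```

Calling 'collect_ints(v, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)' fills 'v' with 1 through 10 in order, regardless of where each value was passed.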

Output function

By moving 'printf' to a separate function and calling that function in 'main', 'main' now calls the address of a new function symbol called 'output', and 'printf' is called inside it. With optimization enabled, the compiler could inline the call so that 'main' behaves like our original C code, but there would still be a separate copy of 'output' in the executable.
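The modified source is not reproduced here, so the following is only a guess at its shape. 'output' is the name used above; 'hello_main' is a hypothetical stand-in for 'main', and returning the result of 'printf' is an addition made so the call can be checked.

```c
#include <assert.h>
#include <stdio.h>

/* The printf call moved out of main into its own function. */
int output(void) {
    return printf("Hello World!\n");
}

/* Stand-in for main: all it does now is call output(). */
int hello_main(void) {
    int written = output();
    assert(written == 13);   /* "Hello World!\n" is 13 characters */
    return 0;
}
```

Because 'output' has external linkage, another source file could still call it, which is why the compiler keeps a standalone copy of it even if it inlines the call.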

Full Optimization

For full optimization, the compiler must receive the '-O3' option, which enables its most aggressive set of optimizations, even when they increase compile time or code size. Compiling our original "Hello World!" program we get this:

0000000000400400 <main>:
  400400:       48 83 ec 08             sub    $0x8,%rsp
  400404:       bf b0 05 40 00          mov    $0x4005b0,%edi
  400409:       31 c0                   xor    %eax,%eax
  40040b:       e8 e0 ff ff ff          callq  4003f0 <printf@plt>
  400410:       31 c0                   xor    %eax,%eax
  400412:       48 83 c4 08             add    $0x8,%rsp
  400416:       c3                      retq
  400417:       66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
  40041e:       00 00

Compared to the original code, it is more efficient but also harder to read. The frame pointer setup ('push %rbp' and 'mov %rsp,%rbp') is gone, and 'eax' is set to zero by XORing it with itself instead of moving a zero value into the register, which takes 2 bytes ('31 c0') instead of 5 ('b8 00 00 00 00').

Conclusion

The compiler offers many debugging and optimization options, and for today's programs most of those optimizations are applied automatically. As programs get larger and more complex, however, there is only so much the compiler can optimize. Programmers still need to optimize their own code, since they have the context needed to identify a program's common processing paths, which the compiler does not.
