Sunday, January 29, 2017

Lab 3 - Assembler Lab

This assignment requires using assembly language to display and increment a value in a loop. The code should exist for both the x86_64 and Aarch64 architectures. A sample initial loop was provided in GNU Assembler (GAS) syntax; when assembled and executed, it displays the following output:

Loop
Loop
Loop
Loop
Loop
Loop
Loop
Loop
Loop
Loop

The next objective is to modify the code so that this output is displayed:
Loop: 0
Loop: 1
Loop: 2
Loop: 3
Loop: 4
Loop: 5
Loop: 6
Loop: 7
Loop: 8
Loop: 9

Our group's solution is the following:

The first step was to create the memory locations containing the data we want to display (lines 45-54). For our solution, we created three values. The 'msg' value contains the string 'Loop: ', and its length is specified by ". (dot) - (minus) msg", which means the current address minus the address of msg, i.e. the number of bytes emitted since the 'msg' label. The 'digit' value will be used to display the increasing digit. It is declared as a blank space, but the code overwrites it with a single digit on each iteration. The 'nl' value simply adds a newline character after each iteration. Each of these values corresponds to an address in memory that will be accessed later.
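To make the ". - msg" arithmetic concrete, here is a minimal sketch in Python rather than assembly. It models a data section in which each label takes the current value of the location counter and the counter then advances past the bytes emitted. The `layout` helper and the starting address of 0 are assumptions made for illustration; they are not part of our solution.

```python
def layout(items):
    """Return {name: (address, length)} for (name, data) pairs laid out in order."""
    symbols = {}
    dot = 0  # the location counter, written '.' in GAS (start address assumed to be 0)
    for name, data in items:
        start = dot            # address that the label refers to
        dot += len(data)       # emitting the bytes advances the location counter
        symbols[name] = (start, dot - start)  # same arithmetic as '. - msg'
    return symbols


table = layout([("msg", b"Loop: "), ("digit", b" "), ("nl", b"\n")])
print(table)
```

The length comes out right only because the subtraction happens immediately after the data is emitted, which is why the length definition has to follow its string directly.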

Afterwards, we can analyze the section that prints the 'msg' value, but before we get into the code, we need to understand 'syscall' a little better. The 'syscall' instruction is what performs system calls (e.g. writing, reading, exiting). The system call number goes in rax, and the arguments go in rdi, rsi, rdx, r10, r8 and r9, in this specific order. (Ordinary function calls use rcx for the fourth argument, but 'syscall' overwrites rcx, so the kernel interface uses r10 instead.) In this assignment, only the 'sys_write' system call is used, and its number is 1. The rdi register holds the file descriptor to write to; standard output is file descriptor 1, therefore rdi should hold a value of 1. The rsi register holds the address of the data to write, so it should point to one of the message values we specified earlier. The last argument 'sys_write' needs is the length of the message, which was declared as len, len1 and len2, and is stored in rdx. The final system call should be something similar to this:

                sys_write(1, "Loop: ", len)

Of course, assembly language isn't so simple that we can just throw some arguments into a function. Each argument has to be placed in the right register. Looking at lines 14 to 19, we start by loading the length of msg into rdx. Then we load the address of 'msg' into rsi, so it can be used as the source data. Next, we set rdi to 1, so we write to stdout on the console, and set rax to 1, so that 'sys_write' is called with the registers we have just set up. Once all the registers are assigned, 'syscall' on line 19 performs the system call and writes "Loop: " once to the console.
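For comparison, the same call can be sketched in Python, where `os.write` wraps the write system call. The `sys_write` wrapper below is a hypothetical helper whose parameters are named after the three arguments described above; it is not the assembly from our solution.

```python
import os


def sys_write(fd, buf, count):
    """Stand-in for the write system call.

    In the assembly version: rax = 1 (sys_write), rdi = fd,
    rsi = address of buf, rdx = count.
    Returns the number of bytes written, as the kernel does in rax.
    """
    return os.write(fd, buf[:count])


msg = b"Loop: "
written = sys_write(1, msg, len(msg))  # file descriptor 1 is stdout
```

The return value is worth noticing: the kernel reports the byte count in rax, which is one reason rax has to be reloaded before every 'syscall'.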

The next block of code (lines 21-28) has the logic for the single digit. The writing process is the same as in the previous block. What is interesting about this block is how the digit to be written changes on each iteration of the loop. The first change we notice is on line 24. As we know from the sample loop, there is a loop counter in register r15. It starts at zero and is incremented by one at the end of each iteration. To display the values from 0 to 9, we simply need to assign the current value of the loop counter to rsi and convert it to ASCII. The 'movb' instruction moves the low byte of r15 to rsi and, on the next line, we convert it to ASCII by adding 48 (the ASCII code for '0') to it. We then write it to the console.
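The conversion works because the ASCII codes for '0' to '9' are the consecutive values 48 to 57. A small Python sketch of that step follows; the `digit_to_ascii` name is made up for illustration, and the mask stands in for the byte-sized move.

```python
def digit_to_ascii(counter):
    """Convert a loop counter in 0..9 to its ASCII code by adding 48 ('0')."""
    low_byte = counter & 0xFF  # keep only the low byte, as a byte-sized move would
    return low_byte + 48


lines = ["Loop: " + chr(digit_to_ascii(i)) for i in range(10)]
print("\n".join(lines))
```

This only holds for a single digit: a counter of 10 would give code 58, which is ':' rather than "10".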

This concludes the first part of the assignment. The next part is to display the numbers from 0 to 30, suppressing the leading zero. Here's my solution:

Now there are two digits to be displayed, and if the leading digit is zero, it should be omitted. In order to know when 10 is reached and to separate the two digits, we divide the counter by ten. The quotient is used as the leading digit and the remainder as the following digit. The 'div' instruction stores the quotient in the rax register and the remainder in the rdx register. Because 'div' treats rdx:rax as a single 128-bit dividend, rdx must be zeroed out before each division, as shown on line 21. After dividing the counter by ten, we then store the quotient in r13 and the remainder in r14, as those registers preserve their values.
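Why rdx has to be cleared is easier to see with a model of the instruction. The sketch below imitates the 64-bit unsigned 'div' in Python; the `div64` function is a hypothetical helper, and the exceptions stand in for the divide error the CPU raises.

```python
MASK64 = (1 << 64) - 1


def div64(rdx, rax, divisor):
    """Model of unsigned 'div': divide the 128-bit value rdx:rax by divisor.

    Returns (quotient, remainder), which the CPU leaves in rax and rdx.
    """
    if divisor == 0:
        raise ZeroDivisionError("the CPU raises a divide error")
    dividend = (rdx << 64) | rax  # rdx supplies the upper 64 bits
    quotient, remainder = divmod(dividend, divisor)
    if quotient > MASK64:
        raise OverflowError("quotient does not fit in rax: divide error")
    return quotient, remainder


print(div64(0, 27, 10))  # rdx cleared first: 27 / 10
print(div64(7, 28, 10))  # rdx still holds the previous remainder
```

With rdx cleared the result is the expected quotient 2 and remainder 7. With the stale remainder left in rdx, the dividend is no longer 28, so the quotient is a huge unrelated number.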

As 'msg' is already printed, we now need to print the leading digit if it is not zero, so we use the 'cmp' instruction to compare the quotient with zero (line 32). Since 'cmp' sets the flags, the following 'je' instruction (jump if equal) jumps to 'lsd' (least significant digit) if the leading digit is equal to zero, skipping the code that writes it.
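The control flow described above can be sketched in Python as follows. The `format_line` name is an assumption for illustration, and the `if` plays the role of the 'cmp' and 'je lsd' pair.

```python
def format_line(counter):
    """Build one output line for a counter in 0..99, omitting a leading zero."""
    quotient, remainder = divmod(counter, 10)
    out = "Loop: "
    if quotient != 0:              # 'cmp' with zero, then 'je lsd' skips this part
        out += chr(quotient + 48)  # most significant digit
    out += chr(remainder + 48)     # 'lsd': the least significant digit is always written
    return out + "\n"


print("".join(format_line(i) for i in range(31)), end="")
```

Only the leading digit is conditional. The last digit is written unconditionally, which is why 0 still prints as "Loop: 0" rather than as an empty value.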

After changing 'max' on line 5 to 31, we get the following output:
Loop: 0
Loop: 1
Loop: 2
Loop: 3
Loop: 4
Loop: 5
Loop: 6
Loop: 7
Loop: 8
Loop: 9
Loop: 10
Loop: 11
Loop: 12
Loop: 13
Loop: 14
Loop: 15
Loop: 16
Loop: 17
Loop: 18
Loop: 19
Loop: 20
Loop: 21
Loop: 22
Loop: 23
Loop: 24
Loop: 25
Loop: 26
Loop: 27
Loop: 28
Loop: 29
Loop: 30

Now that we have the x86_64 Assembly code figured out, let's analyze the equivalent code for Aarch64:

Right on line 8 we can see the first difference: the instructions are quite different for this architecture. We can see that register r28, the first operand, receives 'start'. One instruction that differs significantly is division. On x86_64, 'div' places the quotient and the remainder in specific registers. On Aarch64, 'udiv' takes three operands: a register to store the quotient, a register with the dividend and a register with the divisor. The remainder isn't provided. Instead, to get the remainder we need a separate instruction called 'msub':
            msub r0,r3,r2,r1    ->    r0 = r1 - ( r2 * r3 )

And the remainder can be calculated:
            remainder = dividend - ( quotient * divisor )
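As a quick check of that formula, here is a sketch that models the two instructions in Python. The `udiv` and `msub` functions are illustrative stand-ins with the operand order shown above, assuming 64-bit registers.

```python
MASK64 = (1 << 64) - 1


def udiv(dividend, divisor):
    """Model of 'udiv': unsigned quotient only; dividing by zero yields 0 on Aarch64."""
    return 0 if divisor == 0 else (dividend // divisor) & MASK64


def msub(rn, rm, ra):
    """Model of 'msub rd, rn, rm, ra': rd = ra - (rn * rm), truncated to 64 bits."""
    return (ra - rn * rm) & MASK64


counter, ten = 27, 10
quotient = udiv(counter, ten)             # leading digit
remainder = msub(quotient, ten, counter)  # counter - (quotient * ten)
print(quotient, remainder)
```

For a counter of 27 this gives a quotient of 2 and a remainder of 7, the same digits that 'div' produces in rax and rdx on x86_64.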

Another difference is in how the data to be written is sent to the 'digit' address. On x86_64, 'movb' was used to send a single byte, but on Aarch64, 'str' was used to store a register's value to a location in memory, with its leading bits zeroed out.


Debugging


While Aarch64 seemed to be more efficient and to have various seemingly useful instructions, I found x86_64 more intuitive, although it requires more manual handling. I also found that it had many more online resources available.

It is very hard to debug assembly efficiently without a tool that allows stepping through the code and viewing values at run time (e.g. gdb). I had to rely a lot on being methodical and working through the steps on paper. A debugging tool would be absolutely essential for larger projects.

Conclusion

Assembly code is as close to the metal as it gets, unless you're willing to program in hex. It is extremely efficient because of its direct communication with the CPU, but even a simple program can take pages of code and be very hard to debug. It can be very powerful and useful in certain places, but it is mostly found embedded in low-level programming. It also requires a lot of repetitive coding, as it is not portable to other architectures, and there might even be different solutions that work better on each of them. Software that was written specifically for x86_64 can therefore still gain a lot from optimization for Aarch64.

