Tuesday, April 4, 2017

qsort - The useful but hard to master glibc function

 Understanding qsort


For my project I decided to give a try in optimizing the qsort function in glibc. The function sorts an array of nmemb elements, each of size size. The interesting feature of the function is that it can be used to sort any type of array. One of the function arguments is a pointer to a comparison function, which the programmer creates. This allows sorting integers, strings, or even custom structs. Here's the prototype:

 #include <stdlib.h>  
   
     void qsort(void *base, size_t nmemb, size_t size,  
          int (*compar)(const void *, const void *));  
   
     void qsort_r(void *base, size_t nmemb, size_t size,  
          int (*compar)(const void *, const void *, void *),  
          void *arg);  


The base argument receives an array, and the comparison function compar returns an integer and must receive two void pointers to the elements to be compared. The comparison function must return a  negative number if the elements are out of order or a positive number if the elements are in order. If the integer returned is zero, the ordering of the two elements in the array is unpredictable. 

 int  
 compare_doubles (const void *a, const void *b)  
 {  
  const double *da = (const double *) a;  
  const double *db = (const double *) b;  
  return (*da > *db) - (*da < *db);  
 }  

Another example would be using an existing function, like strcmp. Although this function doesn't receive void pointer arguments, it can be wrapped inside another function. This is an example from the qsort man page:


 #include <string.h>  
   
     static int  
     cmpstringp(const void *p1, const void *p2)  
     {  
       /* The actual arguments to this function are "pointers to  
        pointers to char", but strcmp(3) arguments are "pointers  
        to char", hence the following cast plus dereference */  
   
       return strcmp(* (char * const *) p1, * (char * const *) p2);  
     }  

For a given array of doubles called arr, of size NUM_ELEM, the qsort function would be called like so:

   qsort( arr, NUM_ELEM, sizeof(double), compare_doubles);  

I haven't yet mentioned the qsort_r alternative. This is an implementation derived from BSD to make the qsort reentrant safe. The additional argument serves to pass a mutable state to the function (e.g global variable) and implies that your comparison function must also be reentrant safe.


Under the hood


As most things in life, the deeper you go, the uglier it gets.  To understand how qsort is being called, we need to dig into the library. The functions qsort  and qsort_r are defined in a file called msort.c (why?) in the standard library.

The function qsort simply calls qsort_r with a last argument of NULL. qsort_r starts by checking sizes, determining if a temporary array is being allocated in the stack or heap, if there is enough memory to allocate on the heap, and other checks to determine how and where the given array will be processed. For example, if memory requirements are too high, sorting will be done on the array itself instead of creating a temporary copy.

From what I understood, there are two possible functions that can be called to sort the given array, depending on the size of the array and the available memory on the computer. The two functions are __quicksort and msort_with_tmp. Neither of those are documented at all and only the first has comments on the source. The comments on qsort_r imply that __quicksort is slower and is only used if a temporary array couldn't be allocated in memory, as it "doesn't need a temporary array". For any other case, msort_with_temp is used to indirectly sort the array.

Optimiz(ing|able\?)


After going through all that horrible looking code, I started to extract the pieces I thought necessary to do some benchmarking and profiling. There are so many moving pieces that it became really hard to just set up a development environment to work on. With no success, my only other option would be to work on the code itself, but that would have two major disadvantages: I wouldn't be able to do profiling on the different sections of the code; Every change would need to undergo a complete glibc compilation. So aside from getting vague performance results, each change to the code would take 10 minutes or more to be tested.

Now, even assuming I had none of the previous problems, and that I could have set up a development environment, there is the question of the possibility for optimization. Sorting is a deeply studied subject in computer science, and even then it isn't as complicated as actually implementing it to support any data type or structure, determining what sorting algorithm is best for a given situation at runtime, being reentrant safe and standards compliant, and making it perform as fast as possible, all while being portable between various architectures. It is a lot to take on, and some have. From the comments on the source, we can see that a lot of the code was referenced from books and other sources in which the authors extensively studied the subject, so the code might already be as optimized as possible.

So why did I choose qsort? My original idea was to profile the "function" and determine at which step it was taking the most time on. Then focus on that part and maybe implement it in A64 assembly using vectorization. Unfortunately, I couldn't get to really test it, so I'm moving on to another function.

It also didn't help that this code was heavily "patched" over the years. There is so much going on just to determine what sorting algorithm to call. It feels like this set of functions fail to follow part of the Unix philosophy of "doing one thing" and just focus on the "doing it well" part. I say it was "patched" because we can see how all this originated. Also, here is a more candour informal perspective on the matter.


No comments:

Post a Comment