Peeking under the hood of GCC's `__builtin_expect`

If you've ever poked at high-performance C code, you've probably seen GCC's __builtin_expect extension being used to manually hint the likelihood of a branch being taken a particular way.

The Linux kernel famously contains macros for likely and unlikely branches, which perform the appropriate __builtin_expect incantations.

#define unlikely(expr) __builtin_expect(!!(expr), 0)
#define likely(expr)   __builtin_expect(!!(expr), 1)

…but, how does this all work? What does "hinting" mean, exactly, and how does __builtin_expect translate to generated assembly?

Let's write a short exploratory program to find out.

int main(int argc, char **argv) {
  volatile int x;

  if (__builtin_expect(argc % 2, EXPECT)) {
    x = 1;
  } else {
    x = 0;
  }

  return x;
}

The program returns 1 if the number of parameters it is passed is even, and 0 otherwise (remember that the executable name is, by convention, always passed in argv[0]). We use a compile-time define, EXPECT, as a parameter to __builtin_expect. Our return value, x, is marked as volatile to prevent the compiler from optimizing it out.

We can compile two versions of this binary: one with EXPECT = 1 and one with EXPECT = 0, and see how they differ. Recall that for EXPECT = 1, we are telling the compiler that we expect the x = 1 branch to be more likely, and vice-versa.

$ gcc -g -O2 -DEXPECT=1 expect.c -o expect1
$ gcc -g -O2 -DEXPECT=0 expect.c -o expect0

There are many ways to view the generated assembly of a binary, but for this post I'll be using an invocation of gdb.

$ gdb -batch -ex "disassemble/m main" ./expect1

(If you are following along online, you can check out this code on gcc.godbolt.org, and play with the value of EXPECT in the top-right box.)

Without further ado, below is the disassembly of expect1:

Dump of assembler code for function main:
1	int main(int argc, char **argv) {

2	  volatile int x;

3	
4	  if (__builtin_expect(argc % 2, EXPECT)) {
   0x0000000000001040 <+0>:	and    edi,0x1
   0x0000000000001043 <+3>:	je     0x1052 <main+18>

5	    x = 1;
   0x0000000000001045 <+5>:	mov    DWORD PTR [rsp-0x4],0x1

6	  } else {
7	    x = 0;
   0x0000000000001052 <+18>:	mov    DWORD PTR [rsp-0x4],0x0
   0x000000000000105a <+26>:	jmp    0x104d <main+13>

8	  }
9	
10	  return x;
   0x000000000000104d <+13>:	mov    eax,DWORD PTR [rsp-0x4]
   0x0000000000001051 <+17>:	ret

End of assembler dump.

A short summary of what's happening here:

edi stores the value of argc (System V x86-64 ABI; recall edi is the lower 32 bits of rdi).
Since division is expensive, GCC has replaced our % 2 with & 1.
If the lowest bit in argc is 0, je will jump to a mov for x = 0.
Otherwise, x = 1 will be executed, before jmp-ing to the end of main and ret-urning.

Still, there's nothing that immediately stands out for where the branch predictor hinting is happening. There's no magical hint instruction, at any rate.

What if we were to diff the disassembly of expect0 and expect1 instead?

$ gdb -batch -ex "disassemble/m main" ./expect1 > expect_1
$ gdb -batch -ex "disassemble/m main" ./expect0 > expect_0
$ git diff expect_0 expect_1

Now we're getting somewhere!

diff --git a/expect_0 b/expect_1
index a80a1bd..17f3458 100644
--- a/expect_0
+++ b/expect_1
@@ -6,15 +6,15 @@ Dump of assembler code for function main:
 3
 4        if (__builtin_expect(argc & 1, EXPECT)) {
    0x0000000000001040 <+0>:    and    edi,0x1
-   0x0000000000001043 <+3>:    jne    0x1052 <main+18>
+   0x0000000000001043 <+3>:    je     0x1052 <main+18>

 5          x = 1;
-   0x0000000000001052 <+18>:   mov    DWORD PTR [rsp-0x4],0x1
-   0x000000000000105a <+26>:   jmp    0x104d <main+13>
+   0x0000000000001045 <+5>:    mov    DWORD PTR [rsp-0x4],0x1

 6        } else {
 7          x = 0;
-   0x0000000000001045 <+5>:    mov    DWORD PTR [rsp-0x4],0x0
+   0x0000000000001052 <+18>:   mov    DWORD PTR [rsp-0x4],0x0
+   0x000000000000105a <+26>:   jmp    0x104d <main+13>

 8        }
 9

Clearly, the branch order is reversed, and jne is used in place of je.

In other words, the "preferred path" for the branch predictor, when it has no historical branch data to base a prediction off of, is the fall-through path (i.e. the else branch). __builtin_expect then simply reorders code such that the else branch contains the programmer-specified most-likely path, and negates the if operand as necessary.

To me at least, this behaviour does seem pretty magical, and I was surprised in not being able to readily find this mentioned online in the conventional sources of programmer wisdom (i.e. StackOverflow).

If one digs deep enough in the Intel 64 and IA-32 Architectures Optimization Reference Manual, one can find a reference for this behaviour on page 105 (emphasis mine):

3.4.1.6 Branch Type Selection

The default predicted target for indirect branches and calls is the fall-through path. Fall-through prediction is overridden if and when a hardware prediction is available for that branch. The predicted branch target from branch prediction hardware for an indirect branch is the previously executed branch target.

The more you know.