Peeking under the hood of GCC's __builtin_expect
If you've ever poked at high-performance C code, you've probably seen GCC's
__builtin_expect
extension being used to manually hint the likelihood of a
branch being taken a particular way.
The Linux kernel famously contains macros for likely
and unlikely
branches,
which perform the appropriate __builtin_expect
incantations.
#define unlikely(expr) __builtin_expect(!!(expr), 0)
#define likely(expr) __builtin_expect(!!(expr), 1)
…but, how does this all work? What does "hinting" mean, exactly, and how does
__builtin_expect
translate to generated assembly?
Let's write a short exploratory program to find out.
int main(int argc, char **argv) {
volatile int x;
if (__builtin_expect(argc % 2, EXPECT)) {
x = 1;
} else {
x = 0;
}
return x;
}
The program returns 1 if the number of parameters it is passed is even, and 0
otherwise (remember that the executable name is, by convention, always
passed in argv[0]
). We use a compile-time define, EXPECT
, as a parameter to
__builtin_expect
. Our return value, x
, is marked as volatile to prevent the
compiler from optimizing it out.
We can compile two versions of this binary: one with EXPECT = 1
and one
with EXPECT = 0
, and see how they differ. Recall that for EXPECT = 1
, we
are telling the compiler that we expect the x = 1
branch to be more likely,
and vice-versa.
$ gcc -g -O2 -DEXPECT=1 expect.c -o expect1
$ gcc -g -O2 -DEXPECT=0 expect.c -o expect0
There are many ways to view the generated assembly of a binary, but for this
post I'll be using an invocation of gdb
.
$ gdb -batch -ex "disassemble/m main" ./expect1
(If you are following along online, you can check out this code on gcc.godbolt.org, and play with the value of EXPECT
in the top-right box.)
Without further ado, below is the disassembly of expect1
:
Dump of assembler code for function main:
1 int main(int argc, char **argv) {
2 volatile int x;
3
4 if (__builtin_expect(argc % 2, EXPECT)) {
0x0000000000001040 <+0>: and edi,0x1
0x0000000000001043 <+3>: je 0x1052 <main+18>
5 x = 1;
0x0000000000001045 <+5>: mov DWORD PTR [rsp-0x4],0x1
6 } else {
7 x = 0;
0x0000000000001052 <+18>: mov DWORD PTR [rsp-0x4],0x0
0x000000000000105a <+26>: jmp 0x104d <main+13>
8 }
9
10 return x;
0x000000000000104d <+13>: mov eax,DWORD PTR [rsp-0x4]
0x0000000000001051 <+17>: ret
End of assembler dump.
A short summary of what's happening here:
edi
stores the value ofargc
(System V x86-64 ABI; recalledi
is the lower 32 bits ofrdi
).- Since division is expensive, GCC has replaced our
% 2
with& 1
. - If the lowest bit in
argc
is 0,je
will jump to amov
forx = 0
. - Otherwise,
x = 1
will be executed, beforejmp
-ing to the end ofmain
andret
-urning.
Still, there's nothing that immediately stands out for where the branch
predictor hinting is happening. There's no magical hint
instruction, at any
rate.
What if we were to diff
the disassembly of expect0
and expect1
instead?
$ gdb -batch -ex "disassemble/m main" ./expect1 > expect_1
$ gdb -batch -ex "disassemble/m main" ./expect0 > expect_0
$ git diff expect_0 expect_1
Now we're getting somewhere!
diff --git a/expect_0 b/expect_1
index a80a1bd..17f3458 100644
--- a/expect_0
+++ b/expect_1
@@ -6,15 +6,15 @@ Dump of assembler code for function main:
3
4 if (__builtin_expect(argc & 1, EXPECT)) {
0x0000000000001040 <+0>: and edi,0x1
- 0x0000000000001043 <+3>: jne 0x1052 <main+18>
+ 0x0000000000001043 <+3>: je 0x1052 <main+18>
5 x = 1;
- 0x0000000000001052 <+18>: mov DWORD PTR [rsp-0x4],0x1
- 0x000000000000105a <+26>: jmp 0x104d <main+13>
+ 0x0000000000001045 <+5>: mov DWORD PTR [rsp-0x4],0x1
6 } else {
7 x = 0;
- 0x0000000000001045 <+5>: mov DWORD PTR [rsp-0x4],0x0
+ 0x0000000000001052 <+18>: mov DWORD PTR [rsp-0x4],0x0
+ 0x000000000000105a <+26>: jmp 0x104d <main+13>
8 }
9
Clearly, the branch order is reversed, and jne
is used in place of je
.
In other words, the "preferred path" for the branch predictor, when it has no
historical branch data to base a prediction off of, is the fall-through path
(i.e. the else
branch). __builtin_expect
then simply reorders code such that
the else
branch contains the programmer-specified most-likely path, and
negates the if
operand as necessary.
To me at least, this behaviour does seem pretty magical, and I was surprised in not being able to readily find this mentioned online in the conventional sources of programmer wisdom (i.e. StackOverflow).
If one digs deep enough in the Intel 64 and IA-32 Architectures Optimization Reference Manual, one can find a reference for this behaviour on page 105 (emphasis mine):
3.4.1.6 Branch Type Selection
The default predicted target for indirect branches and calls is the fall-through path. Fall-through prediction is overridden if and when a hardware prediction is available for that branch. The predicted branch target from branch prediction hardware for an indirect branch is the previously executed branch target.
The more you know.