Java internals, or when true != true
Most programmers have heard the joke about slipping a Greek question mark (;, U+037E) into Java code in place of a semicolon to cause "inexplicable" compilation errors.
But that is too easy to discover. What about something that manifests itself at runtime but, when inspected (either by printing to stdout or through a debugger), shows nothing amiss?
Using the internal sun.misc.Unsafe class (and targeting HotSpot VMs), we can create a boolean that compares equal to neither true nor false but, when inspected, always shows up as true.
Let's take a look.
import sun.misc.Unsafe;
import java.lang.reflect.Field;

public class Tainted {
    public static boolean toTaint = true;

    public static void main(String[] argv) throws Exception {
        // Grab the Unsafe singleton from its private static field via reflection.
        Field _unsafe = Class.forName("sun.misc.Unsafe").getDeclaredField("theUnsafe");
        _unsafe.setAccessible(true);
        Unsafe unsafe = (Unsafe) _unsafe.get(null);
        // Overwrite the raw storage behind the static boolean with the integer 2.
        unsafe.putInt(Tainted.class, unsafe.staticFieldOffset(Tainted.class.getDeclaredField("toTaint")), 2);
        test(toTaint, false);
        test(toTaint, true);
        test(toTaint, toTaint);
    }

    public static void test(boolean a, boolean b) {
        System.out.printf("%s == %s: %s\n", a, b, a == b);
    }
}
The output of the above code is shown below.
true == false: false
true == true: false
true == true: true
So, what's going on?
The Unsafe class allows us to play around with the raw data backing Java objects. Since this is inherently unsafe, we have to jump through a few hoops: specifically, we must use reflection to grab the Unsafe instance (this can be blocked by a security manager, for security-conscious applications). The alternative is to put our classes on the boot class path and call Unsafe.getUnsafe() directly, but that's less elegant.
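The reflection step can be pulled out into a small helper. This is only a sketch: the class name UnsafeAccess and the method get() are made up for illustration, and it assumes a JVM that still ships sun.misc.Unsafe with its theUnsafe field and permits the reflective access.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical helper, not part of the original program.
class UnsafeAccess {
    // Reads the singleton stored in Unsafe's private static field "theUnsafe".
    static Unsafe get() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);        // this is the call a security manager can refuse
            return (Unsafe) f.get(null);  // static field, so there is no receiver object
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not reachable", e);
        }
    }

    public static void main(String[] args) {
        // addressSize() only reports the native pointer width; it writes nothing.
        System.out.println("pointer size in bytes: " + get().addressSize());
    }
}
```

Wrapping the checked reflection exceptions keeps call sites short, at the cost of hiding which step failed.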
Once we have our Unsafe
instance, we can use it to determine the offset in memory from the base of our class of our toTaint
boolean. Then, we can use putInt
to set the value of toTaint
to the integer 2.
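One detail worth keeping in mind is that putInt stores a four-byte int. Which of those four bytes receives the 2 depends on byte order. The sketch below uses a ByteBuffer as a stand-in for raw memory (it does not touch Unsafe at all, and the class name PutIntWidth is made up) to show the layout of the int 2 in both orders. On a little-endian machine such as x86, the 2 lands in the first byte, which is the one at the field's offset.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Illustrative only: a ByteBuffer stands in for the memory behind the field.
class PutIntWidth {
    // Returns the four bytes produced by writing `value` as an int in the given byte order.
    static byte[] bytesOf(int value, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocate(4).order(order);
        buf.putInt(0, value);
        return buf.array();
    }

    public static void main(String[] args) {
        // Little-endian: the low byte comes first, so byte 0 holds the 2.
        System.out.println(Arrays.toString(bytesOf(2, ByteOrder.LITTLE_ENDIAN))); // [2, 0, 0, 0]
        // Big-endian: byte 0 holds 0 and the 2 sits three bytes further on.
        System.out.println(Arrays.toString(bytesOf(2, ByteOrder.BIG_ENDIAN)));    // [0, 0, 0, 2]
    }
}
```

The other three bytes are written as well, so a one-byte write (Unsafe also offers putByte) would be the narrower way to achieve the same effect.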
But what does this mean?
If we look into the internals of the JVM, we can find the declaration of jboolean (the native representation of a Java boolean value) in jni.h as an unsigned char.
...
typedef unsigned char jboolean;
typedef unsigned short jchar;
typedef short jshort;
typedef float jfloat;
typedef double jdouble;
...
This makes sense: there's no data type for storing just one bit of data, and an unsigned char is guaranteed to be at least 8 bits.
That is, a boolean can actually store any number in the range 0 to 255, and we're setting it to the integer value 2.
Internally, when the JVM does equality comparisons, it doesn't check just one specific bit of each boolean value (that'd be a silly waste of time); instead, it simply compares all 8 bits. A real true value has only the least significant bit set (i.e., it is equal to the integer 1), and a real false is stored as 0. So our tainted boolean (set to 2) compares equal to neither a real true nor a real false.
However, this boolean is functionally equivalent otherwise: conditional branching operations look to see only if the value is nonzero, so an if (toTaint) block of code would still execute as expected.
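The two rules described above (whole-byte equality, nonzero-means-true branching) can be modelled in ordinary Java by treating each boolean as a raw byte. This is a toy model for illustration, not how HotSpot is implemented, and the class and method names are made up.

```java
// Toy model: a boolean is represented by its raw byte.
class RawBooleanModel {
    static final byte FALSE = 0;
    static final byte TRUE = 1;
    static final byte TAINTED = 2;

    // Equality compares the whole byte, not just the lowest bit.
    static boolean rawEquals(byte a, byte b) {
        return a == b;
    }

    // Branching only asks whether the value is nonzero.
    static boolean asCondition(byte a) {
        return a != 0;
    }

    // Printing goes through a branch, so any nonzero byte shows as "true".
    static String show(byte a) {
        return asCondition(a) ? "true" : "false";
    }

    static String test(byte a, byte b) {
        return show(a) + " == " + show(b) + ": " + rawEquals(a, b);
    }

    public static void main(String[] args) {
        System.out.println(test(TAINTED, FALSE));   // true == false: false
        System.out.println(test(TAINTED, TRUE));    // true == true: false
        System.out.println(test(TAINTED, TAINTED)); // true == true: true
    }
}
```

The three lines it prints match the output of the Tainted program shown earlier.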
With that in mind, we can take a look at the code of the Boolean
class to explain the final bit of the puzzle:
...
public String toString() {
    return value ? "true" : "false";
}
...
When we print our boolean, toString must be called on an object, so the boolean is autoboxed to a Boolean and the code above is called. As we've discussed already, branch operations treat any nonzero value as true, so our boolean will always be represented by the string true.
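The boxing step is easy to observe with ordinary, untainted booleans. A small sketch (the class name PrintPath is made up) shows that a boolean passed where an Object is expected arrives as a java.lang.Boolean, and that the %s conversion yields the text from its toString:

```java
// Illustrative only: shows the boxing that happens before a boolean is printed.
class PrintPath {
    // The varargs parameter of String.format is Object..., so b is boxed to Boolean.
    static String viaFormat(boolean b) {
        return String.format("%s", b);
    }

    public static void main(String[] args) {
        Object boxed = true;                            // autoboxing: boolean -> Boolean
        System.out.println(boxed.getClass().getName()); // java.lang.Boolean
        System.out.println(boxed.toString());           // true
        System.out.println(viaFormat(false));           // false
    }
}
```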
And that wraps up our goal! The Unsafe class has many practical uses in legitimate applications, but sometimes trying out illegitimate things is the best way to learn something new, which hopefully this post has helped with!