Friday, August 31, 2012

Java Oddities (Part II)

My previous post (Java Oddities Part I) generated a lot of discussion on reddit. People suggested many interesting cases, and I would like to describe some of them in more detail.

Thanks to Ben Evans for contributing some further information.

Dangerous Method Overloading

// credit to choychoy
List<Integer> list = new ArrayList<>(Arrays.asList(1,2,3));
int v = 1;
list.remove(v);
System.out.println(list); // prints [1, 3]

List<Integer> list = new ArrayList<>(Arrays.asList(1,2,3));
Integer v = 1;
list.remove(v);
System.out.println(list); // prints [2, 3]

The java.util.List interface describes two methods named remove.

The first one is remove(int). It removes an element from the list based on its index, which is represented by a value of type int (note: indices start at 0). The second one is remove(Object). It removes the first occurrence of the object passed as an argument.

This is referred to as method overloading: the same method name is used for two different operations. The operation is chosen based on the types of the method parameters. In academic terminology, this is an example of ad-hoc polymorphism.

So what happens in the piece of code above? The first case is straightforward as we pass a variable of type int and there's a signature for remove which expects exactly an int. This is why the element at index 1 is removed.

In the second case, we pass an argument of type Integer. Since there is no signature for remove that directly takes an Integer parameter, Java tries to find the closest matching signature. The Java Language Specification (Determine Method Signature) states that resolution based on subtyping is attempted before boxing and unboxing conversions are considered. Since java.lang.Integer is a subtype of java.lang.Object, the method remove(Object) is invoked. This is why the call remove(v) finds the first Integer containing the value 1 and removes it from the list.
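To make the choice of overload explicit rather than accidental, one option is to wrap each call in a helper whose parameter type pins down the intent. The following is a minimal sketch; the class RemoveOverloads and its two helper methods are hypothetical names introduced for illustration, not part of java.util.List.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RemoveOverloads {
    // The int parameter selects List.remove(int): removal by position.
    static List<Integer> removeAtIndex(List<Integer> source, int index) {
        List<Integer> copy = new ArrayList<>(source);
        copy.remove(index);
        return copy;
    }

    // The Integer parameter selects List.remove(Object): removal by value.
    static List<Integer> removeValue(List<Integer> source, Integer value) {
        List<Integer> copy = new ArrayList<>(source);
        copy.remove(value);
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 2, 3);
        Integer boxed = 1;
        // Unboxing explicitly makes it obvious that an index is meant.
        System.out.println(removeAtIndex(list, boxed.intValue())); // [1, 3]
        // Boxing explicitly makes it obvious that a value is meant.
        System.out.println(removeValue(list, Integer.valueOf(1))); // [2, 3]
    }
}
```

At a call site without such helpers, writing list.remove(v.intValue()) or list.remove(Integer.valueOf(v)) has the same effect of spelling out which operation is intended.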

Note that this problem wouldn't exist if the java.util.List interface differentiated the two remove operations with two different method names: removeAtIndex(int) and removeElement(Object). For those interested in other views on method overloading, there is a famous paper by Bertrand Meyer on the topic.

Array Initializer Syntax Curiosity

Java, just like C and C#, allows a trailing comma after the last expression in an array initializer. This is documented in the Java Language Specification (Array Initializer).

However, what if the initializer contains no expression? This is where Java differs from other languages like C and C#:

// Java
int a[] = {}; // valid
int b[] = {,}; // also valid, an array of length 0 >:o
// C
int a[] = {,}; // error: expected expression before ‘,’ token
// C#
int[] a = {,}; // Unexpected symbol ','
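A small sketch to check the Java side of this: the trailing comma never contributes an element, whether or not any expressions precede it. The class name TrailingComma and its methods are made up for this example.

```java
class TrailingComma {
    // A trailing comma after the last expression is ignored.
    static int[] withTrailingComma() {
        return new int[] {1, 2, 3,};
    }

    // A lone comma is accepted too and yields an empty array.
    static int[] onlyComma() {
        return new int[] {,};
    }

    public static void main(String[] args) {
        System.out.println(withTrailingComma().length); // 3
        System.out.println(onlyComma().length);         // 0
    }
}
```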

The Type of a Conditional Expression

// credit to fragglet
Object o = true ? 'r' : new Double(1);
System.out.println(o); // 114.0
System.out.println(o.getClass()); // class java.lang.Double

This looks a bit odd. The condition is true, so you might expect that the char 'r' would be boxed into java.lang.Character.

How did we end up with java.lang.Double as the runtime type of o? The value 114.0 looks suspicious as well - but we might guess that it's the ASCII value which corresponds to the character 'r'. But why is it ending up in a numeric type?

Let's take a step back and examine the general question: what should the type of the conditional expression be if the types of the second and third operands differ?

Java has a set of rules to determine this as explained in the Java Language Specification (Conditional Expression).

In this case, the rules say that first of all the third operand is unboxed to the primitive type double. This is specified by the binary numeric promotion rules. After that, a more familiar rule kicks in - the promotion rule for doubles.

This says that if either operand is of type double, the other is converted to double as well. This is why the second operand of type char is widened to a double.

The second and third operands now have the same type, and this is the resulting type of the conditional expression. So the expression's type is the primitive type double (and its value is now the primitive value 114.0). Finally, since we are assigning the result of the conditional expression to a variable of type Object, Java performs assignment conversion. The primitive type double is boxed to the reference type Double (java.lang.Double).
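If the promotion is not what you want, one way to sidestep it is to make sure at least one operand is not convertible to a numeric type, for instance by casting it to Object. The sketch below contrasts the two; the class ConditionalTypes and its method names are hypothetical, and Double.valueOf(1) stands in for new Double(1) only to avoid deprecation warnings on recent JDKs.

```java
class ConditionalTypes {
    // Both operands are convertible to numeric types, so binary numeric
    // promotion applies and the expression has type double.
    static Object promoted(boolean flag) {
        return flag ? 'r' : Double.valueOf(1);
    }

    // Casting one operand to Object turns this into a reference conditional:
    // no unboxing or widening happens, and the char is simply boxed.
    static Object keptAsReference(boolean flag) {
        return flag ? (Object) 'r' : Double.valueOf(1);
    }

    public static void main(String[] args) {
        System.out.println(promoted(true));                   // 114.0
        System.out.println(promoted(true).getClass());        // class java.lang.Double
        System.out.println(keptAsReference(true));            // r
        System.out.println(keptAsReference(true).getClass()); // class java.lang.Character
    }
}
```

A plain if/else statement assigning to the Object variable avoids the issue as well, since each branch is then converted on its own.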

Note that such a mechanism wouldn't be needed for conditional expressions if Java restricted the second and third operands to be strictly of the same type. An alternative option could be union types.

Thursday, August 30, 2012

Java Oddities (Part I)

There's a famous lightning talk given by Gary Bernhardt about JavaScript and Ruby oddities.
I would like to start a series of blog posts documenting some oddities in the Java language for fun! I'll explain why or where these oddities come from with reference to the Java Language Specification when possible. I hope you learn some new things. Feel free to email or tweet me if you would like to add to the list.

Array Declarations

Java programmers can declare array variables in several ways:

int[] a;
int b[]; // allowed to make C/C++ people happy

However, the grammar doesn't enforce a particular style for arrays of dimensions greater than one. The [] may appear as part of the type, or as part of the declarator for a particular variable, or both. The following declarations are therefore valid:

int[][] c;
int d[][];
int[] e[]; // :(
int[][] f[]; // :(

This mixed notation is discouraged by the Java Language Specification (Array Variables) because it can lead to confusion, and it is reported by code convention tools such as Checkstyle.
This can be taken to the extreme. The following method signature in a class or interface declaration is accepted by the standard javac parser:

public abstract int[] foo(int[] arg)[][][][][][][][][][][];

The return type of the method foo is int[][][][][][][][][][][][].
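A scaled-down version of the same idea can be checked at runtime. In this sketch (the class MixedBrackets and the method grid are invented for illustration), brackets placed after the parameter list are added to the return type, and brackets placed on a variable name apply to that variable only.

```java
class MixedBrackets {
    // The trailing [] is appended to the declared type int[],
    // so this method actually returns int[][].
    static int[] grid()[] {
        return new int[2][3];
    }

    public static void main(String[] args) {
        // Brackets on the type apply to every declarator;
        // brackets on a name apply only to that name.
        int[] row, table[];
        row = new int[3];
        table = new int[2][3];
        System.out.println(row.getClass().getSimpleName());    // int[]
        System.out.println(table.getClass().getSimpleName());  // int[][]
        System.out.println(grid().getClass().getSimpleName()); // int[][]
    }
}
```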

In fact, the grammar of ClassBodyDeclaration is defined as follows:

ClassBodyDeclaration = 
   .. | TypeParameters (Type | VOID) Ident MethodDeclaratorRest | ..

MethodDeclaratorRest = 
    FormalParameters BracketsOpt [Throws TypeList] ( MethodBody | [DEFAULT AnnotationValue] ";")

BracketsOpt = {"[" "]"}

The BracketsOpt rule allows a sequence of [] to be inserted after the formal parameters definition.
The relevant lines within com.sun.tools.javac.parser.JavacParser start at line 2938.

Array Covariance

Java arrays are covariant. This means that given a type S which is a subtype of a type T, S[] is considered a subtype of T[]. This property is described in the Java Language Specification (Subtyping among Array Types). It is known to lead to ArrayStoreException at runtime, as documented in the Java Language Specification (Array Store Exception). For example:

Object[] o = new String[4];
o[0] = new Object(); // compiles but a runtime exception will be reported
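The runtime check can be observed directly by catching the exception. This is a minimal sketch with a hypothetical helper, storeFails, that reports whether a store into the first slot of an array was rejected.

```java
class CovarianceDemo {
    // Tries to store value into target[0] and reports whether the JVM
    // rejected the store with an ArrayStoreException.
    static boolean storeFails(Object[] target, Object value) {
        try {
            target[0] = value;
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // The static type is Object[] in all three calls; only the runtime
        // element type of the array decides whether the store is legal.
        System.out.println(storeFails(new String[1], new Object())); // true
        System.out.println(storeFails(new String[1], "ok"));         // false
        System.out.println(storeFails(new Object[1], new Object())); // false
    }
}
```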

Arrays were made covariant because before the introduction of generics it allowed library designers to write generic code (without type safety). For example, one could write a method findItems as follows:

public boolean findItems(Object[] array, Object item)
{
    ...
}

This method will accept arguments such as (String[], String) or (Integer[], Integer) and in a sense reduces code duplication since you don't need to write several methods specific to the types of the arguments. However, there is no contract between the element type of the array that is passed and the type of the item that needs to be found.

Nowadays one can use generic methods (making use of a type parameter) to achieve the same effect with additional type safety:

public <T> boolean findItems(T[] array, T item)
{
    ...
}
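Here is one way such a generic method could be filled in. It is only a sketch: the class GenericFind and the method contains are hypothetical, and the linear search is an assumption about what the elided body might do. One caveat worth knowing: as far as I can tell, the extra safety only holds when the type argument is given explicitly, because otherwise the compiler may infer a common supertype for T.

```java
import java.util.Objects;

class GenericFind {
    // Linear search; the array's element type and the item share the type parameter T.
    static <T> boolean contains(T[] array, T item) {
        for (T element : array) {
            if (Objects.equals(element, item)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] names = {"a", "b"};
        System.out.println(contains(names, "b"));                     // true
        System.out.println(GenericFind.<String>contains(names, "z")); // false

        // With an explicit type argument, a mismatched item is rejected:
        // GenericFind.<String>contains(names, 42); // does not compile

        // Without it, T is inferred as a common supertype of String and
        // Integer, so the call still compiles and simply finds nothing.
        System.out.println(contains(names, 42));                      // false
    }
}
```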

Integer Caching

int a = 1000, b = 1000;  
System.out.println(a == b); // true
Integer c = 1000, d = 1000;  
System.out.println(c == d); // false
Integer e = 100, f = 100;  
System.out.println(e == f); // true

This behaviour is documented in the Java Language Specification (Boxing Conversion):

If the value p being boxed is true, false, a byte, or a char in the range \u0000 to \u007f, or an int or short number between -128 and 127 (inclusive), then let r1 and r2 be the results of any two boxing conversions of p. It is always the case that r1 == r2.

For those curious, you can look up the implementation of Integer.valueOf(int), which confirms the specification:

public static Integer valueOf(int i) {
    assert IntegerCache.high >= 127;
    if (i >= IntegerCache.low && i <= IntegerCache.high)
        return IntegerCache.cache[i + (-IntegerCache.low)];
    return new Integer(i);
}
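The practical takeaway is to compare boxed values with equals (or to unbox them first) rather than with ==. A minimal sketch, with a hypothetical helper sameValue:

```java
import java.util.Objects;

class BoxedComparison {
    // Compares the wrapped values instead of the object references;
    // Objects.equals also copes with null on either side.
    static boolean sameValue(Integer left, Integer right) {
        return Objects.equals(left, right);
    }

    public static void main(String[] args) {
        Integer c = 1000, d = 1000;
        System.out.println(sameValue(c, d));              // true
        System.out.println(c.intValue() == d.intValue()); // true

        // Reference equality is only guaranteed inside the cached range.
        Integer e = 127, f = 127;
        System.out.println(e == f);                       // true
    }
}
```

Whether c == d is false for 1000 depends on the upper bound of the cache (IntegerCache.high in the implementation above), which is why the sketch does not rely on it.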