Friday, September 21, 2012

Start Hacking With Java 8

This article aims to give some quick background on Java 8 and help you get set up so you can try out the new features. Java veterans (and PL veterans) won't learn anything new; this article is for general Java programmers who want to learn more about what's coming next year.

The list of planned features for Java 8 is available on the OpenJDK page.

I intend to follow up with articles describing each new feature in further detail, so stay tuned. In the meantime, you can read more about lambdas, virtual extension methods and type annotations.

Background

We have shifted from a single-core world to a multi-core world. Single-core clock speeds have largely stopped climbing because of physical limits such as power consumption and heat dissipation. For this reason, to keep getting the performance gains we came to expect under Moore's law, today we achieve additional performance by parallelizing computations across several cores.

This shift brings new challenges. Programmers need to learn how to write software that parallelizes gracefully to leverage multiple cores and therefore gain more performance.

Programming language designers can help by providing better abstractions and easier-to-use libraries that reduce the conceptual and syntactic gap between the sequential and parallel expression of a computation.

Functional to the rescue?

Functional programming provides a mechanism to achieve this goal. A computation is described as a combination of functions that are free of side effects. In practice, this lets the programmer specify what a computation does and leave it to the compiler or library to decide how to carry it out.

Take, as an example, code that filters a list based on a condition in Scala:

val listOfRedBoxes = boxes.filter(b => b.getColor() == RED);

This is possible through the use of the filter method, which abstracts away the internal filtering implementation. One can pass as an argument an anonymous function that evaluates the filtering condition. Other programming languages such as Groovy and Clojure provide similar facilities to pass code as data, which many refer to as closures. Because the traversal is hidden behind filter, the compiler or library can decide whether the filtering logic should be executed sequentially or in parallel.

Contrast the code above with the typical Java idiom that programmers write to filter a list based on a condition:

List<Box> listOfRedBoxes = new ArrayList<>();
for(Box b : boxes)
{
    if(b.getColor().equals(RED))
        listOfRedBoxes.add(b);
}

Here, the use of an accumulator and a for-each loop to describe the traversal logic of the filtering enforces sequential execution. Sometimes a for-each loop is desirable because it processes elements in order. Sometimes, however, this specification is too tight and prevents additional performance.

Java 8 brings a revamped collection library together with a mechanism to pass code as data, referred to as lambda expressions, in order to facilitate writing code that parallelizes gracefully.

The example above can be written as follows:

List<Box> listOfRedBoxes = boxes.stream().filter(b -> b.getColor().equals(RED))
                                .into(new ArrayList<Box>());

The traversal logic is no longer fixed by the language and can be chosen by the library implementation or compiler. As a result, parallelism and out-of-order execution can be chosen to improve performance.
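A note for readers trying the snippet above on a final Java 8 release: the into method comes from the early-access lambda build and is not part of the released Stream API, where the usual way to gather results is collect(Collectors.toList()). Here is a minimal sketch of the same filter under that assumption. The Box class and Color enum are hypothetical stand-ins for the types used in the article.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class RedBoxFilter {
    // Hypothetical stand-ins for the article's Box type and RED constant.
    enum Color { RED, BLUE }

    static final class Box {
        private final Color color;
        Box(Color color) { this.color = color; }
        Color getColor() { return color; }
    }

    // Sequential pipeline: the traversal is left to the stream library.
    static List<Box> redBoxes(List<Box> boxes) {
        return boxes.stream()
                    .filter(b -> b.getColor() == Color.RED)
                    .collect(Collectors.toList());
    }

    // Same pipeline, but the library is allowed to split the work across cores.
    static List<Box> redBoxesParallel(List<Box> boxes) {
        return boxes.parallelStream()
                    .filter(b -> b.getColor() == Color.RED)
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Box> boxes = Arrays.asList(
                new Box(Color.RED), new Box(Color.BLUE), new Box(Color.RED));
        System.out.println(redBoxes(boxes).size());         // 2
        System.out.println(redBoxesParallel(boxes).size()); // 2
    }
}
```

The only difference between the two methods is stream() versus parallelStream(): the filtering logic itself does not change, which is the point of not hard-coding the traversal.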

One could argue that passing code as data in Java is already possible via anonymous inner classes. However, lambda expressions bring many advantages over anonymous inner classes, such as better readability, simpler semantics and stronger inference, which we discuss in more detail in the next article.
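To give a rough feel for the readability difference, here is a small sketch that defines the same comparator twice, once as an anonymous inner class and once as a lambda. The class and field names are made up for illustration; Comparator and Collections.sort are standard library APIs.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class LambdaVsAnonymous {
    // Pre-Java 8 style: an anonymous inner class implementing Comparator.
    static final Comparator<String> BY_LENGTH_ANONYMOUS = new Comparator<String>() {
        @Override
        public int compare(String a, String b) {
            return Integer.compare(a.length(), b.length());
        }
    };

    // Java 8 style: the same behaviour as a lambda expression.
    // The parameter types are inferred from Comparator<String>.
    static final Comparator<String> BY_LENGTH_LAMBDA =
            (a, b) -> Integer.compare(a.length(), b.length());

    public static void main(String[] args) {
        List<String> words = Arrays.asList("ccc", "a", "bb");
        Collections.sort(words, BY_LENGTH_ANONYMOUS);
        System.out.println(words); // [a, bb, ccc]

        words = Arrays.asList("ccc", "a", "bb");
        Collections.sort(words, BY_LENGTH_LAMBDA);
        System.out.println(words); // [a, bb, ccc]
    }
}
```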

Set Up JDK8

At the time of this writing, there isn't a stable version of JDK 8 yet. However, you can download an early access implementation of the lambda project and a separate early access implementation of the type annotations project.

After downloading the archive containing the jdk, set the environment variable JAVA_HOME to the path where the unpacked directory is located. To check it's set up correctly, try the following commands:

$ java -version
openjdk version "1.8.0-ea"
OpenJDK Runtime Environment (build 1.8.0-ea-lambda-nightly-h1171-20120911-b56-b00)
OpenJDK 64-Bit Server VM (build 24.0-b21, mixed mode)

$ javac -version
javac 1.8.0-ea

You can now try the following code:

import java.util.*;
public class TestLambda
{
    public static void main(String... args)
    {
        List<Integer> l = Arrays.asList(1,2,3,4,5);
        int sum = l.stream().map(x -> x*2).reduce(0, (a,b) -> a+b);  
        System.out.println(sum);
    }
}

Can you guess the output? Compile it and run it!

$ javac TestLambda.java 
$ java TestLambda
30

Note that most IDEs don't provide support for Java 8 yet, so you will have to compile files using javac.

IntelliJ IDEA seems to be the only IDE supporting lambdas at the moment.

Start hacking!

Monday, September 3, 2012

Java 8 Features: Discover Repeating Annotations

Java 8 will bring many features including lambdas, virtual extension methods, type annotations as well as various library enhancements. The full list is available on the OpenJDK/JDK8 website.

In this blog post, I would like to describe a less advertised feature: repeating annotations. The implementation in the Java 8 compiler was pushed a few days ago. More information and discussions can be found on the dedicated OpenJDK mailing list. It is important to stress that this feature was designed for EE library designers and users. The usability requirements are therefore different from those of standard Java users.

Currently Java forbids more than one annotation of a given annotation type to be specified on a declaration. For this reason, the following code is invalid:

@interface Foo { int value(); }

@Foo(1) @Foo(2) // error: Duplicate annotation
class Functr{}

Java EE programmers often make use of an idiom to circumvent this restriction: declare a new annotation which contains an array of the annotation you want to repeat. It looks like this:

@interface Foo { int value(); }
@interface Foos {
    Foo[] value();
}

@Foos({@Foo(1), @Foo(2)})
class Functr{}

Java 8 essentially removes this restriction. Programmers will now be able to specify multiple annotations of the same annotation type on a declaration, provided they stipulate that the annotation is repeatable. It is not the default behaviour: annotations have to opt in to be repeatable.

Some setup is required to specify that an annotation can be repeated. You mark the repeatable annotation type with @ContainedBy, naming its containing annotation type, and you mark the containing annotation type with @ContainerFor, naming the annotation type it holds:

@ContainedBy(Foos.class)
@interface Foo { int value(); }

@ContainerFor(Foo.class)
@interface Foos {
    Foo[] value();
}

@Foo(1) @Foo(2) // valid in Java 8!
class Functr{}

There is some discussion as to why this setup is actually required, since it adds verbosity. In a nutshell, it is needed for compatibility with the updated reflection API. In fact, the previous idiom and the new setup compile down to the same structure in the class file (provided the annotations are retained there, which is the case with the default CLASS retention or with RUNTIME retention, but not with @Retention(RetentionPolicy.SOURCE)). For this reason, the reflection API needs a mechanism to differentiate between implicit and explicit repetitions. The language designers introduced the two new markups to tackle this issue.

Now you might ask why we actually need both @ContainedBy and @ContainerFor. Why do we have to define an extra container? We could just have one new annotation to specify that an annotation type is repeatable. The reason is that EE libraries use the previous idiom and language designers wanted to give them a straightforward transition to repeating annotations. It is convenient to simply annotate the existing annotation declarations: the code update required is minor and low risk (nothing is removed).
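A note for readers on a final Java 8 release: the @ContainedBy and @ContainerFor names come from the early build described here. The released API uses a single java.lang.annotation.Repeatable meta-annotation on the repeatable type, and the container still has to be declared. The following is a minimal sketch under that assumption. The annotations are given RUNTIME retention only so that reflection can see them, and the fooValues helper is a made-up name for illustration.

```java
import java.lang.annotation.Repeatable;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.Arrays;

public class RepeatableDemo {
    // The repeatable annotation names its container type.
    @Retention(RetentionPolicy.RUNTIME)
    @Repeatable(Foos.class)
    @interface Foo { int value(); }

    // The container is the same array-holding annotation as in the old idiom.
    @Retention(RetentionPolicy.RUNTIME)
    @interface Foos { Foo[] value(); }

    @Foo(1) @Foo(2)
    static class Functr {}

    // Hypothetical helper: collects the values of all @Foo annotations on a class.
    static int[] fooValues(Class<?> type) {
        // getAnnotationsByType looks through the container for repeated annotations.
        Foo[] foos = type.getAnnotationsByType(Foo.class);
        int[] values = new int[foos.length];
        for (int i = 0; i < foos.length; i++) {
            values[i] = foos[i].value();
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(fooValues(Functr.class))); // [1, 2]
        // The compiler wrapped the two @Foo annotations in a @Foos container.
        System.out.println(Functr.class.getAnnotation(Foos.class) != null); // true
    }
}
```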

Friday, August 31, 2012

Java Oddities (Part II)

My previous post (Java Oddities Part I) created a lot of discussion on reddit. People suggested many interesting cases and I would like to describe some of them with additional details.

Thanks to Ben Evans for contributing some further information.

Dangerous Method Overloading

// credit to choychoy
List<Integer> list = new ArrayList<>(Arrays.asList(1,2,3));
int v = 1;
list.remove(v);
System.out.println(list); // prints [1, 3]

List<Integer> list = new ArrayList<>(Arrays.asList(1,2,3));
Integer v = 1;
list.remove(v);
System.out.println(list); // prints [2, 3]

The java.util.List interface describes two methods named remove.

The first one is remove(int). It removes an element from the list based on its index, which is represented by a value of type int (note: an index starts at 0). The second one is remove(Object). It removes the first occurrence of the object passed as argument.

This is referred to as method overloading: the same method name is used for describing two different operations. The choice of the operation is based on the types of the method parameters. In academic terminology we will say that it is an example of ad-hoc polymorphism.

So what happens in the piece of code above? The first case is straightforward as we pass a variable of type int and there's a signature for remove which expects exactly an int. This is why the element at index 1 is removed.

In the second case, we pass an argument of type Integer. Since there is no signature for remove that directly takes an Integer parameter, Java tries to find the closest matching signature. The Java Language Specification (Determine Method Signature) states that resolution based on subtyping comes before allowing boxing/unboxing rules. Since java.lang.Integer is a subtype of java.lang.Object, the method remove(Object) is invoked. This is why the call remove(v) finds the first Integer containing the value 1 and removes it from the list.

Note that this problem wouldn't exist if the java.util.List interface differentiated the two remove operations with two different method names: removeAtIndex(int) and removeElement(Object). For those interested in getting more views about method overloading, there is a famous paper by Bertrand Meyer on the topic.
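If you want to make the choice of overload explicit at the call site, one option is to pass either a plain int or an explicitly boxed Integer. Here is a small sketch of that idea. The helper names removeAtIndex and removeElement are made up for illustration and are not part of java.util.List.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RemoveOverloads {
    // Resolves to List.remove(int): removes the element at the given index.
    static List<Integer> removeAtIndex(List<Integer> source, int index) {
        List<Integer> copy = new ArrayList<>(source);
        copy.remove(index);
        return copy;
    }

    // Resolves to List.remove(Object): removes the first occurrence of the value.
    static List<Integer> removeElement(List<Integer> source, int value) {
        List<Integer> copy = new ArrayList<>(source);
        copy.remove(Integer.valueOf(value)); // explicit boxing selects remove(Object)
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 2, 3);
        System.out.println(removeAtIndex(list, 1)); // [1, 3]
        System.out.println(removeElement(list, 1)); // [2, 3]
    }
}
```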

Array Initializer Syntax Curiosity

Java, just like C and C#, allows a trailing comma after the last expression in an array initializer. This is documented in the Java Language Specification (Array Initializer).

However, what if the initializer contains no expression? This is where Java differs from other languages like C and C#:

// Java
int a[] = {}; // valid
int b[] = {,}; // also valid, an array of length 0 >:o
// C
int a[] = {,}; // error: expected expression before ‘,’ token
// C#
int[] a = {,}; // Unexpected symbol ','

The Type of a Conditional Expression

// credit to fragglet
Object o = true ? 'r' : new Double(1);
System.out.println(o); // 114.0
System.out.println(o.getClass()); // class java.lang.Double

This looks a bit odd. The condition is true, so you might expect the char 'r' to be boxed into java.lang.Character.

How did we end up with java.lang.Double as the runtime type of o? The value 114.0 looks suspicious as well - but we might guess that it's the ASCII value which corresponds to the character 'r'. But why is it ending up in a numeric type?

Let's take a step back and examine the general question: what should the type of the conditional expression be if the types of the second and third operands are different?

Java has a set of rules to determine this as explained in the Java Language Specification (Conditional Expression).

In this case, the rules say that first of all the third operand is unboxed to the primitive type double. This is specified by the binary numeric promotion rules. After that, a more familiar rule kicks in - the promotion rule for doubles.

This says that if either operand is of type double, the other is converted to double as well. This is why the second operand of type char is widened to a double.

The second and third operands now have the same type, and this is the resulting type of the conditional expression - so the expression's type is the primitive type double (and its value is now the primitive value 114.0). Finally, since we are assigning the result of the conditional expression to a variable of type Object, Java performs assignment conversion. The primitive type double is boxed to the reference type Double (java.lang.Double).
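If you actually want the Character in a case like this, one way to sidestep numeric promotion is to cast one operand to Object, so that the conditional is no longer a numeric one. Here is a small sketch that puts both variants side by side. The method names are made up for illustration.

```java
public class ConditionalTypeDemo {
    // Both operands are convertible to numeric types, so binary numeric
    // promotion applies and the expression has the primitive type double.
    static Object promoted(boolean flag) {
        return flag ? 'r' : Double.valueOf(1);
    }

    // Casting one operand to Object turns this into a reference conditional:
    // no numeric promotion happens and the char is simply boxed.
    static Object notPromoted(boolean flag) {
        return flag ? (Object) 'r' : Double.valueOf(1);
    }

    public static void main(String[] args) {
        System.out.println(promoted(true));               // 114.0
        System.out.println(promoted(true).getClass());    // class java.lang.Double
        System.out.println(notPromoted(true));            // r
        System.out.println(notPromoted(true).getClass()); // class java.lang.Character
    }
}
```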

Note that such a mechanism wouldn't be needed for conditional expressions if Java restricted the second and third operands to be strictly of the same type. An alternative option could be union types.

Thursday, August 30, 2012

Java Oddities (Part I)

There's a famous lightning talk given by Gary Bernhardt about JavaScript and Ruby oddities.
I would like to start a series of blog posts documenting some oddities in the Java language for fun! I'll explain why or where these oddities come from with reference to the Java Language Specification when possible. I hope you learn some new things. Feel free to email or tweet me if you would like to add to the list.

Array Declarations

Java programmers can declare array variables in several ways:

int[] a;
int b[]; // allowed to make C/C++ people happy

However, the grammar doesn't enforce a particular style for arrays of dimensions greater than one. The [] may appear as part of the type, or as part of the declarator for a particular variable, or both. The following declarations are therefore valid:

int[][] c;
int d[][];
int[] e[]; // :(
int[][] f[]; // :(

This mixed notation is not recommended by the Java Language Specification (Array Variables), as it can lead to confusion, and it is reported by code convention tools such as Checkstyle.
This can be taken to the extreme. The following method signature in a class or interface declaration will be accepted by the standard Javac parser:

public abstract int[] foo(int[] arg)[][][][][][][][][][][];

The return type of the method foo is int[][][][][][][][][][][][].
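To convince yourself that the brackets simply add up, here is a small sketch that checks the runtime classes of a mixed-style field and of a method declared with trailing brackets. The names grid and square are made up for illustration.

```java
public class MixedArrayDeclarations {
    // One [] on the type and one on the declarator: the field's type is int[][].
    static int[] grid[] = new int[2][3];

    // Brackets after the parameter list are added to the return type,
    // so this method returns int[][].
    static int[] square(int n)[] {
        return new int[n][n];
    }

    public static void main(String[] args) {
        System.out.println(grid.getClass().getSimpleName());      // int[][]
        System.out.println(square(2).getClass().getSimpleName()); // int[][]
    }
}
```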

In fact, the grammar of ClassBodyDeclaration is defined as follows:

ClassBodyDeclaration = 
   .. | TypeParameters (Type | VOID) Ident MethodDeclaratorRest | ..

MethodDeclaratorRest = 
    FormalParameters BracketsOpt [Throws TypeList] ( MethodBody | [DEFAULT AnnotationValue] ";")

BracketsOpt = {"[" "]"}

The BracketsOpt rule allows a sequence of [] to be inserted after the formal parameters definition.
The relevant lines within com.sun.tools.javac.parser.JavacParser start at 2938.

Array Covariance

Java arrays are covariant. This means that given a type S which is a subtype of a type T, S[] is considered a subtype of T[]. This property is described in the Java Language Specification (Subtyping among Array Types). It is known to lead to ArrayStoreException at runtime, as documented in the Java Language Specification (Array Store Exception). For example:

Object[] o = new String[4];
o[0] = new Object(); // compiles, but throws ArrayStoreException at runtime
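If you want to watch the failure happen, here is a small runnable sketch of the same assignment wrapped in a try/catch. The method name storeFails is made up for illustration.

```java
public class ArrayStoreDemo {
    // Returns true if storing a plain Object into a String[] (viewed as
    // Object[]) is rejected at runtime.
    static boolean storeFails() {
        Object[] o = new String[4]; // allowed because arrays are covariant
        try {
            o[0] = new Object();    // the runtime checks the real element type
            return false;
        } catch (ArrayStoreException ex) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(storeFails()); // true
    }
}
```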

Arrays were made covariant because before the introduction of generics it allowed library designers to write generic code (without type safety). For example, one could write a method findItems as follows:

public boolean findItems(Object[] array, Object item)
{
    ...
}

This method will accept arguments such as (String[], String) or (Integer[], Integer) and in a sense reduces code duplication since you don't need to write several methods specific to the types of the arguments. However, there is no contract between the element type of the array that is passed and the type of the item that needs to be found.

Nowadays one can use generic methods (making use of a type parameter) to achieve the same effect with additional type safety:

public <T> boolean findItems(T[] array, T item)
{
    ...
}
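For completeness, here is one way the generic version could be filled in. The body is my own guess at a reasonable implementation (a linear search using Objects.equals), since the snippet above leaves it out.

```java
import java.util.Objects;

public class FindItemDemo {
    // Hypothetical body: a simple linear search that tolerates null elements.
    public static <T> boolean findItems(T[] array, T item) {
        for (T element : array) {
            if (Objects.equals(element, item)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(findItems(new String[] {"a", "b"}, "b")); // true
        System.out.println(findItems(new Integer[] {1, 2}, 3));      // false
    }
}
```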

Integer Caching

int a = 1000, b = 1000;  
System.out.println(a == b); // true
Integer c = 1000, d = 1000;  
System.out.println(c == d); // false
Integer e = 100, f = 100;  
System.out.println(e == f); // true

This behaviour is documented in the Java Language Specification (Boxing Conversion):

If the value p being boxed is true, false, a byte, or a char in the range \u0000 to \u007f, or an int or short number between -128 and 127 (inclusive), then let r1 and r2 be the results of any two boxing conversions of p. It is always the case that r1 == r2.

For those curious, you can look up the implementation of Integer.valueOf(int), which confirms the specification:

public static Integer valueOf(int i) {
    assert IntegerCache.high >= 127;
    if (i >= IntegerCache.low && i <= IntegerCache.high)
        return IntegerCache.cache[i + (-IntegerCache.low)];
    return new Integer(i);
}
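The practical takeaway is to compare boxed values with equals (or to unbox them first) rather than with ==. A minimal sketch follows, where sameValue is a made-up helper name.

```java
public class IntegerEquality {
    // Compares boxed integers by value rather than by reference.
    static boolean sameValue(Integer x, Integer y) {
        return x.equals(y);
    }

    public static void main(String[] args) {
        Integer c = 1000, d = 1000;
        System.out.println(sameValue(c, d));                // true
        System.out.println(c.intValue() == d.intValue());   // true

        Integer e = 100, f = 100;
        // Values between -128 and 127 are guaranteed to box to the same object.
        System.out.println(e == f);                         // true
    }
}
```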