Dangerous new Language Features: Indexing access syntax for Lists and Maps

Maarten Winkels

In this blog post I'll talk about a new language feature proposed for Java 7 by Project Coin and the problems I see with it.
Maybe it is time to buy those Scala books and take a deep dive...

Who has ever seen this construct before?

  int i, j;
  i = j = 10;

Will this even compile? Oh, of course it will: the value of an assignment is the right-hand side of the assignment and thus it can be assigned to another variable. In my professional career I have only ever encountered such statements in frameworks written by people who are too smart for their own good. It takes a few seconds to sink in and then you realize that both variables will be set to the same value, no problem!
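To make this concrete, here is a minimal sketch (the class and method names are mine, purely for illustration) showing that the chained form is parsed as i = (j = value), so both variables end up with the same value:

```java
class ChainedAssignment {
    // Returns {i, j} after the chained assignment i = j = value.
    static int[] assignBoth(int value) {
        int i, j;
        // Assignment is right-associative, so this parses as i = (j = value).
        i = j = value;
        return new int[] {i, j};
    }

    public static void main(String[] args) {
        int[] result = assignBoth(10);
        System.out.println(result[0] + " " + result[1]); // prints "10 10"
    }
}
```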

I don't see the value of this "feature" of the Java language: splitting it over two lines will not harm the compiler, the JVM, or, most importantly, the poor programmer who needs to read the lines of code after the genius who originally wrote them has left the building.

Recently I read the proposal for a new feature in JDK 7 that will provide indexing access for Lists and Maps. The normal examples are quite easy to read or write:

  List<String> names = ...
  names[0] = "Maarten Winkels";              // -> names.set(0, "Maarten Winkels");
  String you = names[1];                     // -> names.get(1);

  Map<String, String> residence = ...
  residence["Maarten Winkels"] = "India";    // -> residence.put("Maarten Winkels", "India");
  String r = residence["Koningin Beatrix"];  // -> residence.get("Koningin Beatrix");

This looks good at first sight. It is all syntactic sugar, so nothing really fancy is going on, but the code is shorter and easier to read, while it still adheres to a lot of good things like type safety (the list is still a list) and polymorphism (the index operators are defined on core Java interfaces and will work on every implementation).

Then I read the more advanced example and something began to stir in my mind.

  residence["Maarten Winkels"] = residence["Koningin Beatrix"] = "Palace";

So what does this line of code tell you? You might have to look it over just one more time before you give your answer.

At first, your conclusion might be that the Queen ("Koningin" in Dutch) and I are going to share a Palace. This is what the normal "double" assignment means, right? Both variables will have the same value. Think again! The Queen is going to move to a Palace and I'm left with wherever she lived before (which might still be OK...)! This is because the put operation on a Map actually returns the previously stored value. The code translates to:

  residence.put("Maarten Winkels", residence.put("Koningin Beatrix", "Palace"));

Which should make it clear that I'm going to have a different residence than the Queen. If there was no value associated with the Queen in the Map, put returns null and I might even be homeless!
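The translated form can already be run in today's Java. The following is a small sketch, not part of the proposal: the class name, the method name and the "Old residence" value are made up for illustration. It shows both outcomes, the Queen having a previous residence and the Queen not being in the Map at all:

```java
import java.util.HashMap;
import java.util.Map;

class PutChain {
    // Applies the translated form of the "double" assignment to the given map.
    static Map<String, String> moveQueen(Map<String, String> residence) {
        // put returns the value previously stored under the key (or null),
        // so the outer put stores the Queen's OLD residence for me.
        residence.put("Maarten Winkels", residence.put("Koningin Beatrix", "Palace"));
        return residence;
    }

    public static void main(String[] args) {
        Map<String, String> residence = new HashMap<String, String>();
        residence.put("Koningin Beatrix", "Old residence"); // illustrative value
        moveQueen(residence);
        System.out.println(residence.get("Koningin Beatrix")); // Palace
        System.out.println(residence.get("Maarten Winkels"));  // Old residence

        // With no previous entry for the Queen, the inner put returns null.
        Map<String, String> empty = moveQueen(new HashMap<String, String>());
        System.out.println(empty.get("Maarten Winkels"));      // null
    }
}
```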

I think this is a very bad feature. One might argue that the "double" assignment feature is not used very often, but my feeling is that a language should be as consistent to a reader as it can be and this to me clearly deviates from that path.

How can we solve this problem while maintaining consistency and backwards compatibility? I think that is really hard: we could make the assignment operator from the proposal not return a value (and return void instead), making it impossible to do a "double" assignment. This would, however, not be consistent with a normal assignment operator.

Should this caveat prevent this otherwise nice new feature from coming to the Java language? I think we should really look at what we are trying to achieve with Java 7. For me it would be more important to enable Java developers to easily read and fix existing code bases. If they can save a few characters, that is fine, but overall understandability of the language and the ability to fix common bugs with new constructs (e.g. using closures for safe resource usage) is far more important. Java (the language) should not add features just to be able to bring down the LoC of a program to the same level as Groovy or Ruby: for a large application, those are very short-term goals. There are many options to integrate with these languages on the JVM anyway, so why do we need the Java language to compete with them?

Comments (29)

  1. Vincent Partington

    June 21, 2009 at 1:44 pm

    Nice catch there!

    The proposed syntax does look nice, just like the enhanced for loop made loops look a lot nicer. I also saw a proposal that would make it possible to do away with those silly getters and setters, something that has been irking me for ages:
    http://mail.openjdk.java.net/pipermail/coin-dev/2009-March/000621.html

    So while you could call these proposals "syntactic sugar" (a phrase that I feel is usually meant to end any discussion on the subject!) they do make code easier to write and, most importantly, read.

  2. Shams

    June 21, 2009 at 2:34 pm

    You'll have the same result with Lists (the set method). The key here is being aware of how the expression you write works.

  3. Maarten Winkels

    June 21, 2009 at 3:17 pm

    @Vincent: I wouldn't discard any proposal on the basis of the term 'syntactic sugar'. I do feel that we need to scrutinize these proposals to find caveats like this one and weigh the value against the problems that they might bring.

    So what do you think: (1) Include it (as-is) and live with the caveat (on the basis that it is very obscure to use this "feature" anyway) (2) Discard it (3) Improve it?

    To add some more fuel to the discussion:
    - I feel there is nothing wrong with an OO language having a list and then having a get method. That is perfectly in line with what you would expect. If you like square brackets: Use an array!
    - Maybe we should add a Hash type to Java (just like array) to enable the same syntax for maps. This would keep the language more consistent: you would have a List interface with an Array-like type and a Map interface with a Hash-like type. The assignment operation on the Hash would just return the LHS and not the previously kept value, so "double" assignment would work in the same way as with other constructs.

  4. Maarten Winkels

    June 21, 2009 at 3:38 pm

    @Shams: That is exactly the point. To illustrate it with list:

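    	public void testListOrArray() throws Exception {
    		List<Integer> list = new ArrayList<Integer>();
    		list.add(1);
    		// List.set returns the element previously stored at that index
    		assertEquals(Integer.valueOf(1), list.set(0, 2));
    		
    		int[] arr = new int[]{1};
    		// an array assignment evaluates to the newly assigned value
    		assertEquals(2, arr[0] = 2);
    	}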
    

    I think it is really undesirable that with the new proposal list[0] = 2 will evaluate to 1 where arr[0] = 2 (already) evaluates to 2.

    It is fine that you know how the expression you *write* works, but to have two expressions that *read* exactly the same behave differently is a problem in my opinion.

    Think about polymorphism: the whole goal is that the implementation might differ, but the interface should make the intent clear to the caller. This is the complete opposite: as a caller I have to know exactly how the implementation works, since although the code is the same, the contract is different!

  5. Manu

    June 21, 2009 at 3:53 pm

    Since the compiler is now able to warn us about list types, the expression "residence.put("Maarten Winkels", residence.put("Koningin Beatrix", "Palace"));" should generate a warning to the developer, no?

  6. Maarten Winkels

    June 21, 2009 at 3:59 pm

    @Manu: I don't understand exactly what you mean. What would it warn you about? There is nothing wrong with the statement, other than that it is confusing for readers.

  7. Manu

    June 21, 2009 at 4:31 pm

    If the residence map is typed with <String, String>, when you write "residence.put("Maarten Winkels", residence.put("Koningin Beatrix", "Palace"));" which is equivalent and will be translated to "residence.put("Maarten Winkels", residence.put("Koningin Beatrix", "Palace"));", we should get a warning from the compiler, no?

  8. Maarten Winkels

    June 21, 2009 at 4:56 pm

    @Manu: The type declaration has been erased due to non-escape (pun intended). The following test compiles without warnings:

    	public void testMap() throws Exception {
    		Map<String, String> residence = new HashMap<String, String>();
    		residence.put("Maarten Winkels", residence.put("Koningin Beatrix", "Palace"));
    		assertEquals("Palace", residence.get("Koningin Beatrix"));
    		assertNull(residence.get("Maarten Winkels"));
    	}
    
  9. Shams

    June 21, 2009 at 6:04 pm

    @Maarten Winkels

    Your argument with respect to consistency is valid, but I am not sure whether an analogy with polymorphism is relevant here.
    If you do want to compare it with an interface, here's my perspective. The contract with respect to the = operator when used for 'multiple' assignments is that expressions will be evaluated right to left, the left operand being 'assigned' the value of the right expression. In case of arrays it happens to be the value of the accessed element and in case of lists/maps it is the value of the set/put methods respectively.

  10. Maarten Winkels

    June 21, 2009 at 6:26 pm

    @Shams: That is very clear, thanks for the explanation. The question remains: what to do about this? The balance between consistency and ease-of-use seems hard to strike.

    How about adding a new "Hash" (Map-like) type, analogous to the existing Array-List pairing? This way you could use the new notation with these types, have convenient conversion classes (like Arrays and Collections) and both be consistent and improve readability.

    I know: the Array-List duality is not a very nice language feature. Since Java is notorious for never breaking backwards compatibility, I think we will have to live with that.

  11. Andrew Phillips

    June 21, 2009 at 8:51 pm

    I suppose there is a certain kind of irony involved in the fact that, in Scala - the language referred to at the beginning of the post - a(i) = a(i) + 1 is actually interpreted as a.update(i, a.apply(i) + 1) (see section 4.3 of An Overview of the Scala Programming Language).

    As far as I am concerned, "syntactic sugar" is not in itself a bad thing if it leads to more concise, more readable, less boilerplate-heavy code. The multi-catch proposal, for instance, falls into this category.
    But I completely agree with Maarten that convenience should not be used as a reason to introduce potentially confusing, or even contradictory, features into a language.

    The Scala example works because it is consistent. The list and map examples presented here, on the contrary, are not consistent - they make types that happen to be very closely related appear identical, when in fact they are not.

    Saving a few keystrokes at the cost of code that is surely bound to lead to some hard-to-find bugs and plenty of headscratching - even for the original author - seems like a very bad trade off. I'm afraid I find it surprisingly easy to imagine that day when someone changes a method signature and decides to pass a list instead of an array, which breaks your code n lines on.

    Personally, I feel the kind of ease-of-use features we're talking about are precisely what IDEs are supposed to be about. It's nice if you can design them into the language from the beginning, of course, but to me that's very much a "nice to have".

  12. Shams

    June 21, 2009 at 10:12 pm

    @Maarten
    Introduction of the Hash type would defeat the purpose - to be able to use the indexed syntax with Lists and Maps 🙁

    A way of maintaining consistency and backwards compatibility could be to implement utility methods in, for example, the Collections class:
    public static <K, V> V get(Gettable<K, V> pObject, K pKey) {...}
    public static <K, V> V set(Puttable<K, V> pObject, K pKey, V pNewValue) {...}
    public static <K, V> V set(Map<K, V> pObject, K pKey, V pNewValue) {...}
    public static <V> V get(GettableFromInt<V> pObject, int pIndex) {...}
    public static <V> V set(SettableViaInt<V> pObject, int pIndex, V pNewValue) {...}
    public static <V> V set(List<V> pObject, int pIndex, V pNewValue) {...}

    and also have Maps implementing Gettable and Puttable and Lists implementing GettableFromInt and SettableViaInt.

    The compiler will translate appropriately to corresponding method calls during compilation.
    This solution can easily be extended to have support for custom classes with the indexing access if they implement the appropriate interfaces.

  13. Vincent Partington

    June 22, 2009 at 7:28 am

    @Maarten, I'll respond to your fuel first. 🙂 Having both an ArrayList and an array in a language always seemed a bit silly to me. It's a holdover from the C era when the only native "data structure" type was an array and all other data structures had to be written by hand. Adding another native type, namely the Hash type, would only add insult to injury!

    To me it makes sense to promote Lists and Maps to first-class citizens of the language, so I would vote for (1): include this feature as-is. The property feature has an even higher priority to me because invoking getters and setters is a big pain. C# and ActionScript solved this problem nicely years ago.

  14. Maarten Winkels

    June 22, 2009 at 7:58 am

    @Shams: I fail to see how adding these static methods would solve the problem, or achieve the goal of the proposal (i.e. use indexed syntax for lists and maps) for that matter.

    @Vincent: I agree completely with you that having a native and a data structure type for the same purpose is not a good thing. But the fact is that Java is that way right now. I'm not sure whether adding a native Hash type would make it better (consistency) or worse (type duality).

    Also, the property proposal looks quite OK, although there seems to be some lack of clarity about the breadth and depth of that proposal and also about the syntax to use for property access (apparently both 'bean->property' and 'bean.property' are being discussed). Let's hope it will not end up the same way as the Closure proposal...

  15. Sebastian Mueller

    June 22, 2009 at 9:21 am

    I think what Shams wants to say is that the assignment operator should not simply map to the put/set method (and reuse its return type and specification) but should instead delegate to a slightly modified (perhaps static) method:

    public static <K, V> V operatorBraces(/*this*/ Map<K, V> map, K key, V value) {
        map.put(key, value);
        return value;
    }

    This would solve the problem, as far as I can see. The construct would map to this method which would actually call the put method on the map implementation but would nevertheless return the value so that the result of the assignment is the value and assignments can be chained.

    Note the commented /*this*/ - If they had added extension methods to the language this could even be mapped to them (well in this case operator overloading would be required, too). However the above syntax is almost legal in C# 3.0 (where extension methods exist and operator overloading is allowed).
    Adding extension methods to the language the way it is implemented in C# would have been a lot more beneficial and would have reduced the LoC dramatically for many use cases (Arrays class and Collections class and all those other static helper classes anyone?!)
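    As a runnable sketch of the idea (the class name IndexAssign and method name assign are hypothetical, not taken from the proposal), a plain static helper in current Java already gives the chained form the usual assignment semantics:

```java
import java.util.HashMap;
import java.util.Map;

class IndexAssign {
    // Hypothetical helper: stores the value via put, but evaluates to the
    // assigned value (like a normal assignment), not the previous one.
    static <K, V> V assign(Map<K, V> map, K key, V value) {
        map.put(key, value);
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> residence = new HashMap<String, String>();
        // What residence["Maarten Winkels"] = residence["Koningin Beatrix"] = "Palace"
        // would mean if the compiler translated it to this helper:
        assign(residence, "Maarten Winkels", assign(residence, "Koningin Beatrix", "Palace"));
        System.out.println(residence.get("Maarten Winkels"));  // Palace
        System.out.println(residence.get("Koningin Beatrix")); // Palace
    }
}
```

    Because the helper only calls put on the Map interface, it would work with any Map implementation.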

  16. Maarten Winkels

    June 22, 2009 at 11:18 am

    @Sebastian & Shams: That would indeed solve the problem. I was missing the part where the static method returns the *new value* instead of the *old value* (as the normal set/put operations do).

    I think it would only be needed for the set/put operation on List and Map, so no new Interfaces are needed. It would also work with customized implementations of List and Map provided the specification is based on the interfaces and not their standard implementations.

  17. Joe Wright

    June 22, 2009 at 1:17 pm

    This language feature is already in C#. The problem you describe with the multiple hash assignment is an example of a common case where someone does not understand the API (I know it's hypothetical). It's possible that a person would want to do this:

    previousResidence["Koningin Beatrix"] = residence["Koningin Beatrix"] = "Palace";

    Which makes sense, even if it is a contrived example.

    I was pretty hyped about adding Closures to Java till I heard Joshua Bloch point out the flaws: http://rickyclarkson.blogspot.com/2007/04/is-josh-bloch-biggest-problem-for.html

    His suggestion of having try/catch with resource closing solves the problem you mentioned in your article:
    try(Closeable c) {}
    catch() {}

  18. Maarten Winkels

    June 22, 2009 at 2:42 pm

    @Joe: I think you're kind of missing the point here. The problem is not that it is difficult to understand the API, it is that adding the new index access syntax will make parts of the language that look very similar behave very differently. If C# is inconsistent in that sense, I think that is hardly a reason to make Java similarly inconsistent.

    If you want to do

      previousResidence.put("Koningin Beatrix", residence.put("Koningin Beatrix", "Palace"));
    

    that is fine, you can do that in Java and your intent is perfectly clear from the code.

    Now the following snippet most clearly shows the problem with the current proposal:

      List<String> names1 = new ArrayList<String>();
      names1.add("Maarten");   // the list is still empty, so add rather than set index 0
    
      String[] names2 = new String[]{"Maarten"};
    
      assert "Winkels" == (names1[0] = "Winkels");
      assert "Winkels" == (names2[0] = "Winkels");
    

    The first assert will fail since the list assignment (names1) will actually return the previously stored value. To any reader of the code it will be very hard to understand why there is a difference between those two lines.

  19. SteveC

    June 22, 2009 at 7:59 pm

    I think the confusion may be overstated. Take the common idiom while ((line = reader.readLine()) != null) ...
    This makes use of the fact that assignment returns a value, which is the same thing happening in the line i = j = 10: i gets the value of the assignment j = 10, as you describe.

    "Double assignment" is not some new feature, it is merely recognizing that assignment returns a value. Changing the semantics of put to prevent this usage would no longer be syntactic sugar, but a change to the Map API. Since it is well documented, it seems the best recourse is to read and understand the documentation.

    The example with the assertion is a little contrived because you chose similar names for different data structures. If you were to use two different types that represent data differently, different names would be helpful. I cannot think of a reason for using the second assertion, which is basically assert "Winkels" == "Winkels". If you ever see it in code, you know there is a problem.
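    For reference, the read-loop idiom mentioned above can be sketched as follows. This assumes the reader is a java.io.BufferedReader, whose readLine() returns null at end of input; the class and method names are only illustrative:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

class ReadLoop {
    // Counts lines using the assignment-as-expression idiom.
    static int countLines(String text) {
        BufferedReader reader = new BufferedReader(new StringReader(text));
        String line;
        int count = 0;
        try {
            // (line = reader.readLine()) evaluates to the value just assigned,
            // which is null once the input is exhausted.
            while ((line = reader.readLine()) != null) {
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countLines("first\nsecond\nthird")); // 3
    }
}
```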

  20. Kaj

    June 23, 2009 at 7:44 am

    I don't see this as a huge problem. There will always be developers who write code that is hard to understand, and I really, really doubt that you will often see code like:

    residence["Maarten Winkels"] = residence["Koningin Beatrix"] = "Palace";

    (And I actually read that correctly when I read it, since I knew that put returns the previous value). Boxing/unboxing is far more dangerous.

  21. RFletch

    June 23, 2009 at 7:47 am

    No one is ever going to use it in the double assignment case. In fact, if people are worried about it, I'd be for removing double assignment and getting this feature, given how infrequently double assignment is actually useful to me rather than decoration.

  22. Sebastian Mueller

    June 23, 2009 at 8:46 am

    Who says that the assignment operator has to directly map to the put/set methods? Why not map to these methods and nevertheless make the expression (that's what it is conceptually, it's not a method call) evaluate to the assigned value? This can be done using the static method I outlined above.
    In Java an assignment expression always evaluates to the assigned value. A method may return whatever it wants. If the language supports the assignment expression as a convenience syntax for the put/set methods, it should use the assignment semantics and evaluate to the value being assigned.
    This does not make it any more complicated: with the current proposal the developer needs to know that the expression maps directly to the put method *and* returns the return value of the put method. Using the other proposal the developer needs to know that it maps to the put method (which is rather obvious) and behaves like the assignment operator does everywhere else in Java (which should be obvious).
    If a developer comes across this kind of code without knowing that it maps to the put method, my guess would be that she takes the "assignment-expression-evaluates-to-assigned-value" part for granted, because that is what she *already knows*.
    Also think about the (hand-)refactoring consequences if you change the type of the variable from array to ArrayList and forget to adjust the double assignments.

  23. javer

    June 23, 2009 at 11:04 am

    How should this work?

    int x, y;
    x = y = 5;
    print(x);
    print(y);

    What is the result?

    Let's change x and y to List.

    List x = ...;
    List y = ...;

    x[0] = y[0] = 5;

    This MUST work the same way as x = y = 5; and use the same semantics.

    For me it does not matter whether it uses the put method or any other.
    residence["Maarten Winkels"] = residence["Koningin Beatrix"] = "Palace";

    should work as:
    residence.put("Koningin Beatrix", "Palace");
    residence.put("Maarten Winkels", "Palace");

  24. javer

    June 23, 2009 at 11:08 am

    residence["Maarten Winkels"] = residence["Koningin Beatrix"] = "Palace";

    I think it is better to do it this way:
    residence.put("Koningin Beatrix", "Palace");
    residence.put("Maarten Winkels", residence.get("Koningin Beatrix"));

    because the correct order is from left to right, and the semantics of
    residence["Maarten Winkels"] = residence["Koningin Beatrix"];

    have to be transformed into:
    residence.put("Maarten Winkels", residence.get("Koningin Beatrix"));

  25. javer

    June 23, 2009 at 11:32 am

    A better way is to use a special interface for "indexed" access - this will work for all collections, lists and other types.

  26. Maarten Winkels

    June 23, 2009 at 11:45 am

    @Sebastian: I completely agree. The current proposal, however, states:

      m2[l1[2]] = m2[m1[1]] = 4; // same as m2.put(l1.get(2), m2.put(m1.get(1), 4));
    

    (see ADVANCED EXAMPLE section)

  27. Sebastian Mueller

    June 23, 2009 at 1:19 pm

    @Maarten: I saw the proposal and I clearly object to that part of it. It should be rewritten like this, IMHO:

    m2[l1[2]] = m2[m1[1]] = 4; // same as HelperClass.assignmentOperator(m2, l1.get(2), HelperClass.assignmentOperator(m2, m1.get(1), 4));

    And HelperClass.assignmentOperator should be implemented as shown earlier to return the assigned value unconditionally. (Of course the compiler could directly inline the code from HelperClass, but at least conceptually it should be done like this.)

    I don't think the getter should be invoked so that the map gets a chance to "replace" the value as proposed by javer. I think the result should be the same if you do:
    m2[l1[2]] = m2[m1[1]] = 4;
    and
    m2[m1[1]] = 4;
    m2[l1[2]] = 4;

    (note the order) - the right-most assignment should be "executed" first.

    If a developer needs the different (and less common, IMHO) behavior, she can use the slightly more verbose syntax using the put/get methods.

  28. Shams

    June 23, 2009 at 2:54 pm

    The mailing list at Project Coin (http://openjdk.java.net/projects/coin/) would be a nice place to discuss too.

  29. Frits Jalvingh

    July 10, 2009 at 1:05 pm

    All of this is kind of nonsense. Of course the "contract" of the assignment operation (returning the assigned value) may not change (that point is validly made) - and it is trivial for the compiler to make that so.

    This discussion wrongly assumes that the compiler /must/ use the return value of a put as the "result" of the assignment, which is nonsense; it can easily generate trivial code to drop (pop) the result of the put and use its second parameter as the "return value". Using another "method" in whatever map is just nonsense and unnecessary: this requires only a few extra bytecode instructions to be emitted (a dup and a pop, gee).
