Learned the hard way: Type safety is not everything

Posted on May 2, 2015 by phimuemue

You may have heard that in Haskell "once it compiles, it works". While this seems to apply in Haskell more than in other languages, a compiling program is not guaranteed to be bug-free, even if you might think so. In this post, I shortly want to showcase an example I experienced and what I hopefully learned from it.

Setting

I was working on my lambda calculus interpreter to iron out the following "smell": This interpreter runs each transformation in two phases:

Therefore, I employ the following data type which supports storing the additional information of "transformability" in additoinal fields:

data Expression = 
    -- Variable may be expandable (macros)
    Variable (Maybe Expression) String
    -- Abstractions may be eta-convertible
    | Abstraction Bool String Expression
    -- Applications may be beta-convertible
    | Application Bool Expression Expression

The algorithm, then, is fed an expression whose respective fields are either Nothing (for variables, meaning that they can not be expanded) or the value False (for abstractions and applications, indicating that they can not be beta- or eta-reduced).

Quite naturally, the algorithm recursively descends into the subparts of an expression, and tries to transform it. of course, It has to signal success or failure to its caller in some way.

Until recently, this was done by wrapping an Expression in an Either Expression Expression, where a Right indicates success, and accordingly Left means failure.

After this computation, the results could be combined into new Expressions.

Note that the type Either Expression Expression does not really convey the intuition of success or failure with only one possible result type (namely Expression). Because of this, I wanted to switch from Either Expression Expression to Maybe Expression, signalling failure by Nothing and success by a Just.

Step 1: Changing types

At the beginning I thought that this would only be a matter of syntactic changes, basically replacing all Either Expression Expression by Maybe Expression, substituting Left x by Nothing, Right by Just and adjust a negligible amount of code that forms the basic building blocks of my transformation.

After all, that's what I appreciate in Haskell - you can compose functions and types in very flexible ways, allowing you to quickly adjust to other - equivalent - representations.

However, I quickly had to realize that this did not bring quite the effect I anticipated. Well, I saw that some parts seemed to work just right away, but all in all, it did not have the desired effect.

In particular, prior to my change I had a function taking a TaggedExpression (back then a Either Expression Expression) and converting it to a "normal" Expression:

fromTaggedExpression :: (Either Expression Expression) -> Expression
fromTaggedExpression (Left a) = a
fromTaggedExpression (Right a) = a

Straightforward, I wanted this function now to convert from a Maybe Expression to an expression. Since I could - up to that point - simply extract the expression from a Left or a Right, I had the appropriate expression available in the case of both success and failure.

Switching to Maybe Expression meant that fromTaggedExpression needed a way to cope with failure adequately, which meant that I'd just supply the "original" expression into fromTaggedExpression:

fromTaggedExpression :: Expression -> (Maybe Expression) -> Expression
fromTaggedExpression e Nothing = e
fromTaggedExpression _ (Just e) = e

Or, more straightforward: fromTaggedExpression = fromMaybe from Data.Maybe.

Now, every call to fromTaggedExpression had to supply the original expression plus the transformed one. While being a little bit more work for the caller, this meant that I did not have those Either Expression Expressions anymore. Of course, I had to adjust some other helper functions such as combining functions for TaggedExpressions, as well.

However, when I started out to experiment with it, I encountered some strange bugs. The main problem was that the functions to combine TaggedExpressions often resulted in failed pattern matches - something that I deliberately wanted when I constructed those functions to get a notification if it received obviously wrong input.

Step 2: Chasing bugs

It took me quite some time to find out what my mistake here was: Previously, a Left carried the semantics "could not find any transformation in expression" while a Right meant exactly the opposite. However, there were some cases in the code at the very last output resp. very first call to the transformation routine that did not adhere to this policy completely.

But since the types were correct, the compiler could not complain... which is something I can not blame Haskell for.

However, I had to ask myself: How could this happen? And more important: How can I avoid those problems in the future?

Step 3: Fixing the bug

To fix this bug, I stepped back one step and did not do the whole transformation from Either to Maybe in one step, but instead transformed fromTaggedExpression into an intermediate state working on Either Expression Expression, but with all the information needed by the Maybe variant:

fromTaggedExpression :: Expression -> (Either Expression Expression) -> Expression
fromTaggedExpression e (Left e') = e
fromTaggedExpression e (Right e') = e'

This alone does not help that much - but now, I could do the following examination:

I exploited this fact and changed fromTaggedExpression to the following after importing Control.Exception:

fromTaggedExpression :: Expression -> (Either Expression Expression) -> Expression
fromTaggedExpression e (Left e') = assert (e==e') e
fromTaggedExpression e (Right e') = assert (e/=e') e'

From this point, the culprit was found quite quickly because the asserts triggered when the respective calls happened.

Resume

I have (hopefully) learned the following lessons from this bug:

A rigid type system is good and helps you a lot, but it can not catch everything. Thus, I should identify semantics that can not be proven (or disproven) by the compiler and document them more strictly.

I would be interested if this bug would have happened if I directly started out using Maybe Expression instead of Either Expression Expression. In the latter case, the implementation might be easier at some points, but it seems that a Maybe more strongly and naturally suggests what you are dealing with, so it may have been the better choice in the first place.