Algebraic Data Types In Scala

Object-Oriented Meets Functional

Scala claims to unite object-oriented and functional programming, and offers a rich set features from both worlds. Developers coming from object-oriented languages—Java in particular—quickly adopt the object-oriented features of Scala but often struggle to find their way around functional programming. Some aspects of functional programming found their way into object-oriented languages: Higher order functions or combinators like map and filter appear in today’s C# or Java code, and even a preference for immutable data structures spreads out into conventional object-oriented languages.

But algebraic data types (ADTs) still do not appear in object-oriented programming although these enable the true power of functional programming: Types well-founded on theory let us model the problem domain in types and thus help us write correct-by-construction code which provides a higher level of compile-time confidence than possible with the type systems of most object-oriented languages. This article aims to help developers from object-oriented languages understand what it means, and become familiar with the basic set of algebraic data types commonly used in functional programming and its appearance in Scala.

Simple Types

Developers from object-oriented programming languages often conflate “types” with “classes” but in fact represent a much simpler, yet more powerful concept. For this article we define “type” as just a name for a set of values. We can define a Boolean type with standard set notation now:

V(Boolean) = {true, false}

This type has just two values, the well-known boolean constants true and false. We can also define more complex types, eg, Int for all integers, positive and negative:

V(Int) = {n | n ∈ ℤ}

Or a String as sequence of characters where Unicode denotes the set of all unicode code points:

V(String) = {c0c1c2… | c0c1c2… ∈ Unicode}

Combined Types

We can combine these simple types in two fundamental ways: Either as a group of values of other types, ie, a product type, or as an alternative between values of different types, ie, a coproduct type or sum type. We can “calculate” with these types just like we can calculate sums and products of numbers, and these types obey similar laws. We say they “form an algebra”, hence the name “algebraic data types”.

Product types

Formally we can define the product of types1 T1, T2, … as follows:

V(P) = {C v1 v2 … | v1 ∈ T1 ∧ v2 ∈ T2 ∧ … }

C denotes the constructor of the new type. This constructor serves as a “tag” to differentiate between two product types combined of the same types. With the help of this tag we can create two different products of, eg, Boolean and Nat, by giving them different constructors. The following type introduces a generic pair, for example, one of the most basic product types:

⋀ S, T. V(Pair[S, T]) = {(s, t) | s ∈ V(S) ∧ t ∈ V(T)}

We use the common infix notation (v, t) by which pairs appear in almost all contemporary programming languages, from Python to Haskell. Now (42, “Donald Duck”) becomes a value of the type Pair[Int, String]. By using a different constructor we can also give this generic pair a special name, like User with an ID and a name:

V(User) = { User i s | i ∈ V(Int) ∧ s ∈ V(String)}

We follow the convention to name the constructor like the type, but we can also choose different names for each. This new type combines the same types as Pair[Int, String] but holds different values and thus becomes distinct from a pair. User 42 “Donald Duck” is a value of type User, but not of Pair[Int, String],

Sum types

Like a product combines values of different types at the same time a sum type2 provides an alternative between values of different types. Formally we can define a sum of types T1, T2, … as follows:

V(S) = {C1 v1 | v1 ∈ T1} ∪ {C2 v2 | v2 ∈ T2} ∪ …

Again C1, C2, … denote constructors where each constructor lifts another constituent type to the sum type. We can now define the common Either type, as an alternative between two types:

⋀ L, R. V(Either[L, R]) = {Left l | l ∈ V(L)} ∪ {Right r | r ∈ V(R)}

Now we can use the type Either[String, User] to represent the result of finding a user in a database. In case of error we return *Left “User not found”*—which strictly speaking is of type Either[String, T] for any *T*—otherwise we return Right user where user is a value of type User.

Types in Scala

In Scala we can already use pairs and tuples—the standard library includes these—and we can also define our own products with case classes:

> (42, "Donald Duck")
res0: (Int, String) = (42,Donald Duck)
> final case class User(id: Int, name: String)
> User(42, "Donald Duck")
res1: User = User(42,Donald Duck)

Scala also supports sum type, but lacks a syntax for these. A sum type like Either looks straight-forward in Haskell:

data  Either a b  =  Left a | Right b

The same definition in Scala looks considerable more noisy3:

sealed trait Either[L, R]

case class Left[L, R](value: L) extends Either[L, R]
case class Right[L, R](value: R) extends Either[L, R]

This illustrates the common pattern to define sum types in Scala: In the absence of first-level support for sum types we must exploit subtyping to achieve the effect of sum types.

We define the type itself as a sealed trait. The sealed keyword forces us to define all subtypes in the same file as the trait and thus allows the Scala compiler to subsequently warn if a pattern match over the type does not match all subtypes (“exhaustiveness check”)4. Then we define each branch of the sum as distinct subclass of the trait, and use the corresponding type parameter as type for the value of either side. This use of subtyping has important implications for the ergonomics of the type which we will cover in the section after the next.

Shapes of generic types

If we look at the types in the previous section we notice some similiarity between product types. A Pair[Int, String] and our User type have the same shape: Apart from their constructors they have the same values. In fact constructors exist just to introduce an “artificial” distinction between otherwise equal sums or products of types, and thus allow us to give different names to the same type to aid understanding of our programs. But we can “omit” the constructors and abstract over the shape of these types.

This leads us to the famous Shapeless library for Scala which provides types to abstract over the shape of algebraic data types. At the core of Shapeless lies a special HList type for a heterogenous list—a list where each element has a different type:

⋀ T1, T2, …. V(T1 :: T2 :: … HNil) =
{HNil} ∪ {v1 :: v2 :: … :: HNil | v1 ∈ T1 ∧ v2 ∈ T2 ∧ …}

HNil denotes an empty list, to which we prepend an infinite amount of items of different types. We can now say that 1 :: “Donald Duck” :: HNil is of type Int :: String :: HNil. With this type we can express the shape of product types: Both, a Pair[Int, String] and our User type have the shape Int :: String :: HNil. We can convert product types to their shape—we can also say, their generic “shapeless” representation—with shapeless.Generic5:

@ import shapeless._
import shapeless._

@ final case class User(id: Int, name: String)
defined class User

@ Generic[User].to(User(42, "Donald Duck"))
res2: Int :: String :: HNil = 42 :: "Donald Duck" :: HNil

@ Generic[(Int, String)].to((42, "Donald Duck"))
res3: Int :: String :: HNil = 42 :: "Donald Duck" :: HNil

We see that the User case class and a (Int, String) pair indeed have the same representation Int :: String :: HNil, so we convert between both types through their generic representation:

@ val userPair = (42, "Donald Duck")
userPair: (Int, String) = (42, "Donald Duck")

@ val genericUser = Generic[(Int, String)].to(userPair)
genericUser: Int :: String :: HNil = 42 :: "Donald Duck" :: HNil

@ val user = Generic[User].from(genericUser)
user: User = User(42, "Donald Duck")

We can even create a User from arbitrary types as long as they have the same shape as our User class:

@ def userFromGeneric[T, Repr](user: T)(
    implicit genUser: Generic.Aux[User, Repr],
    genT: Generic.Aux[T, Repr]
  ): User = genUser.from(genT.to(user))
defined function userFromGeneric

@ val userFromPair = userFromGeneric((42, "Donald Duck"))
userFromPair: User = User(42, "Donald Duck")

@ final case class Foo(i: Int, s: String)
defined class Foo

@ val userFromFoo = userFromGeneric(Foo(42, "Donald Duck"))
userFromFoo: User = User(42, "Donald Duck")

The function userFromGeneric uses the Generic shapes of User and an arbitrary type T to create a user from a value of type T if the type has the same generic representation Repr as our User class. We summon these Generic shapes as implicit arguments and thus let compiler automatically proove for us that User and T have the same shape. Indeed, if the argument to userFromGeneric has a different shape the code does not compile:

@ val userFromFoo = userFromGeneric(
    (42, "Donald Duck", "An additional item"))
cmd17.sc:1: could not find implicit value for parameter genT:
    shapeless.Generic.Aux[(Int, String, String),Repr]

The compiler complains that it cannot prove that (Int, String, String)—which has shape Int :: String :: String :: HNil—has the same shape as User. With this power comes great responsibility, though: When used excessively this kind of generic conversions can introduce subtle bugs and obscure intention and meaning of code. The idea of abstracting over shapes of data types goes further than just converting between data types and applies to other problems as well. We can use it to generically derive type class instances for data types, for example—and indeed, Scala libraries like Circe or pureconfig often use Shapeless for this purpose.

Shapeless also defines generic shapes of sum types: The post Decode irregular JSON from Jenkins with Circe and Shapeless provides a brief explanation and shows their use for decoding irregular JSON objects. We do not cover sum types in detail here because their more intricate encoding requires more space to explain than this article provides. We hope to cover this topic in a future article, tho.

The Effects of Subtyping

When we discussed sum types in Scala we noted the use of subtyping. When the set of values of a type S is a subset of the set of values of type T then S becomes a subtype of T.

V(S) ⊂ V(T) ⇔ S <: T

In Scala we explicitly create subtype relationships between types with the extends keyword: The alternative of the sum type extend the base trait and thus become subtypes of the base trait. Hence the actual definition of the Either[L, R] from above looks a bit different in Scala:

⋀ L. V(Left[L]) = {Left l | l ∈ V(L)}
⋀ R. V(Right[R]) = {Right r | r ∈ V(R)}
⋀ L, R. V(Either[L, R]) = V(Left[L]) ∪ V(Right[R])

In other words, each alternative of a sum type becomes a distinct type. The expression Left("Foo") has type Left[String] for an arbitrary R, not Either[String, R]. We can then widen Left[String] to Either[Left, R] by invoking the subtype relation. The compiler automatically widens co-variant types6 but often we do not wish for automatic widening; in particular automatic widening complicates implicit search which in turn impacts type class resolution so libraries like cats and scalaz made most of their types invariant to prevent automatic widening to subtypes.

With invariant types we can find ourselves in a situation where the compiler infers the subtype, ie, a sum-type variant like Left, but needs the base type, ie, Either. The following (simplified) example with OptionT and EitherT—both invariant7—does not compile, for instance:

import cats.data._
import cats.Id

sealed trait Sum
case object A extends Sum

val res: EitherT[Id, Sum, Int] = for {
  id <- OptionT.pure[Id](42).toRight(A)
} yield id

The compiler complains:

type mismatch;
 found   : cats.data.EitherT[cats.Id,A.type,Int]
 required: cats.data.EitherT[cats.Id,Sum,Int]
Note: A.type <: Sum, but class EitherT is invariant in type A.
You may wish to define A as +A instead. (SLS 4.5)

We must explicitly downcast to Sum with a type ascription to make the code compile:

id <- OptionT.pure[Id](42).toRight(A : Sum)

This issue appears frequently and thus impacts the ergonomics of sum types in Scala, in particular when it causes much worse and less understandable compiler errors than the one above. It appears so frequently that cats and scalaz even have their own family of functions to help with subtyping. The Functor type also provides widen to widen the type of the functor argument:

> A.some.widen
res10: Option[A.type] = Some(A)
> A.some.widen[Sum]
res11: Option[Sum] = Some(A)

Likewise BiFunctor (a functor with two “sides”, like Either) also has a leftWiden which we can use instead of the type ascription above8:

id <- OptionT.pure[Id](42).toRight(A).leftWiden

While these functions provide some convenience to cope with subtyping, all in all ergonomics of sum types often falls short of what other functional languages like Haskell—which lack subtyping—can provide.

Summary

Product types combine different types into a new type; sum types describe an alternative of other types. In Scala the former appear as simple and well-known case classes, whereas the latter have a more intricate representation as subtypes of sealed traits or classes. In some cases this subtyping in sum types interferes with type inference and invariance which makes ADTs in Scala less ergonomic than in other languages like Haskell. Algebraic data types have similar shapes, and we can abstract over these shapes to write generic programs over all kinds of sum or product types. The famous shapeless library provides the necessary infrastructure for this abstraction—in particular a heterogeneous list type as generalization of tuples and generic types to convert concrete algebraic types into the corresponding heterogeneous list.

We hope that this article helped you understand how algebraic data types work, how they appear in Scala, and what the shapeless library contributes to generic programming.


  1. The name “product type” originates from a branch of mathematics called “category theory” a denotes a fundamental way to combine objects in a category. A product type resembles this combination with regards to programming language types, hence the name. We spare category theory in this article—Products and Coproducts from Category Theory for Programmers by Bartosz Milewski provides a gentle introduction to products and coproducts in category theory—but a look at the size of (the set of values of) a product type in relation to the sizes its constituent types gives an intuition about the meaning of the name “product type”. We observe that the size of the product type in fact equals the product of the sizes of its constituent types: |V(P)| = |V(T1)| · |V(T2)| · …

    ↩︎
  2. We observe that the size of a sum type follows a similar analogy as the size of a product type. A sum type has just as many values as the sum of its constituent types: |V(S)| = |V(T1)| + |V(T2)| + …. The more formal name co-product originates from category theory as well. The prefix co- indicates that category theory considers a sum in some ways as the opposite of a product.

    ↩︎
  3. We simplify the definition of Either for this article. In particular we omit all methods on Either and elide variance annotations. For the actual definition see scala.util.Either.

    ↩︎
  4. Instead of a trait we can use a sealed abstract class—in case we need to pass values to the base class, because Scala does not support trait parameters yet. Dotty will add these to Scala.

    ↩︎
  5. We use Ammoniate for these snippets. ↩︎
  6. The article Of variance and functors provides an in-depth explanation of variance and its effects in types. ↩︎
  7. We cannot use Either in this example—historically Scala made Either covariant. ↩︎
  8. This code makes use of partial unification, see Cats FAQ. ↩︎