9 October 2017

This blog post is the fourth in a series of five, following our Coding Guild session on August 1st about functional programming concepts. The session covered type classes, Semigroups, Monoids, Functors and Validated. It used the Cats library, but did not introduce it in depth.

This time we will cover Validated. Cats categorises Validated as a data type rather than a type class, but working with it feels very similar to working with type classes.
Validated is introduced in the Cats library as follows:

Imagine you are filling out a web form to signup for an account. You input your username and password and submit. Response comes back saying your username can’t have dashes in it, so you make some changes and resubmit. Can’t have special characters either. Change, resubmit. Passwords need to have at least one capital letter. Change, resubmit. Password needs to have at least one number.

Or perhaps you’re reading from a configuration file. One could imagine the configuration library you’re using returns a scala.util.Try, or maybe a scala.util.Either. Your parsing may look something like:

for {
  url  <- config[String]("url")
  port <- config[Int]("port")
} yield ConnectionParams(url, port)

You run your program and it says key “url” not found, turns out the key was “endpoint”. So you change your code and re-run. Now it says the “port” key was not a well-formed integer.

It would be nice to have all of these errors be reported simultaneously. That the username can’t have dashes can be validated separately from it not having special characters, as well as from the password needing to have certain requirements. A misspelled (or missing) field in a config can be validated separately from another field not being well-formed.

Enter Validated.
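To make the fail-fast behaviour from the quote concrete, here is a minimal plain-Scala sketch (Scala 2.12 or later, where Either is right-biased; no Cats needed). The config lookup from the quote is replaced by hypothetical helpers configString and configInt over an in-memory Map; they only illustrate the idea and are not a real configuration library.

```scala
import scala.util.Try

object FailFastDemo {
  final case class ConnectionParams(url: String, port: Int)

  // Hypothetical in-memory configuration: the key is "endpoint", not "url",
  // and the port is not a well-formed integer.
  val rawConfig: Map[String, String] =
    Map("endpoint" -> "http://example.org", "port" -> "80a")

  // Hypothetical lookup helpers that return Either instead of throwing.
  def configString(key: String): Either[String, String] =
    rawConfig.get(key).toRight(s"""key "$key" not found""")

  def configInt(key: String): Either[String, Int] =
    configString(key).flatMap { raw =>
      Try(raw.toInt).toOption.toRight(s"""key "$key" is not a well-formed integer""")
    }

  // The for-comprehension desugars to flatMap, which stops at the first Left,
  // so the problem with "port" is never reported in the same run.
  val params: Either[String, ConnectionParams] =
    for {
      url  <- configString("url")
      port <- configInt("port")
    } yield ConnectionParams(url, port)
}
```

Only the missing url key shows up in params; the malformed port only surfaces after that first error has been fixed, which is exactly the edit-and-rerun loop described above.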

We used a different domain, however:

case class City(name: String)
case class Postcode(code: String)
case class Location(city: City, postcode: Postcode)

Then the individual validations are defined:

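  import cats.data.{NonEmptyList, Validated}
  import cats.data.Validated.{invalidNel, valid}
  import scala.util.matching.Regex

  // matchesRegex is not shown in the original; this is one possible definition.
  def matchesRegex(regex: Regex)(s: String): Boolean = regex.pattern.matcher(s).matches

  def validateCity(city: City): Validated[NonEmptyList[String], City] = city.name match {
    case "Amsterdam" => valid(City("Amsterdam"))
    case "Utrecht"   => valid(City("Utrecht"))
    case _           => invalidNel(s"${city.name} is not a valid city")
  }

  val postalCodeRegex = "^([0-9]{4})([A-Z]{2})$".r
  def validatePostalcode(postcode: Postcode): Validated[NonEmptyList[String], Postcode] =
    postcode.code match {
      case pc if matchesRegex(postalCodeRegex)(pc) => valid(Postcode(pc))
      case _ => invalidNel(s"${postcode.code} is not a valid PostalCode")
    }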

As you can see, the result type of each individual validation is Validated[NEL[String], Something], where NEL is shorthand for NonEmptyList (for example via import cats.data.{NonEmptyList => NEL}). If the validation succeeds, it returns Valid(Something). If it fails, it returns Invalid(NonEmptyList[String]), where the Strings in the NonEmptyList are, in our case, error messages.
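To show the Valid/Invalid shape without depending on Cats, here is a simplified plain-Scala model. Validated, Valid, Invalid and Nel below are hand-written stand-ins that only mirror the shape of the Cats types (which offer many more methods); they are an illustration, not the Cats definitions.

```scala
object ValidatedModel {
  // Simplified stand-in for cats.data.NonEmptyList: a head plus a (possibly empty) tail.
  final case class Nel[+A](head: A, tail: List[A])

  // Simplified stand-in for cats.data.Validated: either a valid value or an error.
  sealed trait Validated[+E, +A]
  final case class Valid[+A](a: A) extends Validated[Nothing, A]
  final case class Invalid[+E](e: E) extends Validated[E, Nothing]

  final case class City(name: String)
  final case class Postcode(code: String)

  private val postalCodeRegex = "^([0-9]{4})([A-Z]{2})$".r

  def validateCity(city: City): Validated[Nel[String], City] = city.name match {
    case "Amsterdam" | "Utrecht" => Valid(city)
    case other                   => Invalid(Nel(s"$other is not a valid city", Nil))
  }

  def validatePostalcode(postcode: Postcode): Validated[Nel[String], Postcode] =
    if (postalCodeRegex.pattern.matcher(postcode.code).matches) Valid(postcode)
    else Invalid(Nel(s"${postcode.code} is not a valid PostalCode", Nil))
}
```

Because Nel always has a head, an Invalid can never carry an empty list of errors, which is the point of using NonEmptyList as the error type.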

If the validation can be expressed as a Boolean predicate, as in validatePostalcode, the function can be shortened with ensure:

  def validatePostalcode(postcode: Postcode): Validated[NEL[String], Postcode] =
    (valid(postcode).ensure(NEL.of(s"${postcode.code} is not a valid PostalCode"))
                           (pc => matchesRegex(postalCodeRegex)(pc.code)))
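ensure starts from a value and turns a Valid into an Invalid only when the predicate fails; an Invalid input passes through unchanged, so earlier errors are kept. The plain-Scala sketch below models that behaviour on a simplified, hand-written Validated (not the Cats source), using List[String] as the error type to keep it short.

```scala
object EnsureSketch {
  // Simplified model of Validated with an ensure method.
  sealed trait Validated[+E, +A] {
    // Keep a Valid value only if it satisfies f; otherwise fail with onFailure.
    // An Invalid value is returned unchanged.
    def ensure[EE >: E](onFailure: => EE)(f: A => Boolean): Validated[EE, A] =
      this match {
        case Valid(a) if !f(a) => Invalid(onFailure)
        case other             => other
      }
  }
  final case class Valid[+A](a: A) extends Validated[Nothing, A]
  final case class Invalid[+E](e: E) extends Validated[E, Nothing]

  final case class Postcode(code: String)
  private val postalCodeRegex = "^([0-9]{4})([A-Z]{2})$".r

  // Same shape as the shortened validatePostalcode: start Valid, then ensure.
  def validatePostalcode(postcode: Postcode): Validated[List[String], Postcode] =
    Valid(postcode).ensure(List(s"${postcode.code} is not a valid PostalCode"))(pc =>
      postalCodeRegex.pattern.matcher(pc.code).matches)
}
```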

As the quoted introduction from the Cats documentation explains, we would like to report all validation errors instead of only the first one. A first, hand-written way to achieve this:

object ParallelValidateTyped extends NamePrintingApp {

  import CatsLocationValidators._
  import cats.data.{NonEmptyList => NEL, Validated}
  import cats.data.Validated.{Invalid, Valid}

  def parallelValidateTyped(v1: Validated[NEL[String], City],
                            v2: Validated[NEL[String], Postcode]): 
                              Validated[NEL[String], Location] = {
    (v1, v2) match {
      case (Valid(c: City), Valid(pc: Postcode)) => Valid(Location(c, pc))
      case (Valid(_), Invalid(e)) => Invalid(e)
      case (Invalid(e), Valid(_)) => Invalid(e)
      case (Invalid(e1), Invalid(e2)) => Invalid(e1 ++ e2.toList)
    }
  }

  println(parallelValidateTyped(validateCity(City("Amsterdam")), 
    validatePostalcode(Postcode("1234AB"))))
  println(parallelValidateTyped(validateCity(City("Utrecht")), 
    validatePostalcode(Postcode("1234ABC"))))
  println(parallelValidateTyped(validateCity(City("Rotterdam")), 
    validatePostalcode(Postcode("1234AB"))))
  println(parallelValidateTyped(validateCity(City("Rotterdam")), 
    validatePostalcode(Postcode("1234ABC"))))
}

Resulting in:

---- ParallelValidateTyped  ----
Valid(Location(City(Amsterdam),Postcode(1234AB)))
Invalid(NonEmptyList(1234ABC is not a valid PostalCode))
Invalid(NonEmptyList(Rotterdam is not a valid city))
Invalid(NonEmptyList(Rotterdam is not a valid city, 
  1234ABC is not a valid PostalCode))

This gives us parallel validation, but as set up here it requires enumerating every combination of outcomes, and only for this specific pair of types. A more generic approach is:

object ParallelValidate2 extends NamePrintingApp {

  import CatsLocationValidators._
  import cats.Semigroup
  import cats.data.Validated
  import cats.data.Validated.{Invalid, Valid}

  def parallelValidate2[E: Semigroup, A, B, C](v1: Validated[E, A],
                                               v2: Validated[E, B])
                                              (f: (A, B) => C): Validated[E, C] = {
    (v1, v2) match {
      case (Valid(a), Valid(b)) => Valid(f(a, b))
      case (Valid(_), Invalid(e)) => Invalid(e)
      case (Invalid(e), Valid(_)) => Invalid(e)
      case (Invalid(e1), Invalid(e2)) => Invalid(Semigroup[E].combine(e1, e2))
    }
  }

  def transform2Location: ((City, Postcode) => Location) = (c, pc) => Location(c, pc)

  println(parallelValidate2(validateCity(City("Amsterdam")), 
                            validatePostalcode(Postcode("1234AB")))(transform2Location))
  println(parallelValidate2(validateCity(City("Utrecht")), 
                            validatePostalcode(Postcode("1234ABC")))(transform2Location))
  println(parallelValidate2(validateCity(City("Rotterdam")), 
                            validatePostalcode(Postcode("1234AB")))(transform2Location))
  println(parallelValidate2(validateCity(City("Rotterdam")), 
                            validatePostalcode(Postcode("1234ABC")))(transform2Location))
}

which results in:

---- ParallelValidate2  ----
Valid(Location(City(Amsterdam),Postcode(1234AB)))
Invalid(NonEmptyList(1234ABC is not a valid PostalCode))
Invalid(NonEmptyList(Rotterdam is not a valid city))
Invalid(NonEmptyList(Rotterdam is not a valid city, 
  1234ABC is not a valid PostalCode))

We replaced all specific types by type parameters. To collect all errors, we require a Semigroup instance for the error type E (the context bound E: Semigroup), which makes it possible to combine() two errors. Moreover, we supply a function f that maps the outcomes of the validators to the requested result type. Note that the arity of f equals the number of validations executed in parallel. A parallelValidateN explodes as N grows: for N = 2 there are 4 cases to check, for N = 3 there are 8, and in general 2^N:

object ParallelValidate3 extends NamePrintingApp {

  import CatsLocationValidators._
  import cats.Semigroup
  import cats.data.Validated
  import cats.data.Validated.{Invalid, Valid}

  def parallelValidate3[E: Semigroup, A, B, C, D](v1: Validated[E, A],
                                                  v2: Validated[E, B],
                                                  v3: Validated[E, C])
                                                 (f: (A, B, C) => D): Validated[E, D] = {
    (v1, v2, v3) match {
      case (Valid(a), Valid(b), Valid(c)) => Valid(f(a, b, c))
      case (Valid(_), Valid(_), Invalid(e)) => Invalid(e)
      case (Valid(_), Invalid(e), Valid(_)) => Invalid(e)
      case (Invalid(e), Valid(_), Valid(_)) => Invalid(e)
      case (Valid(_), Invalid(e1), Invalid(e2)) => 
        Invalid(Semigroup[E].combine(e1, e2))
      case (Invalid(e1), Valid(_), Invalid(e2)) => Invalid(Semigroup[E].combine(e1, e2))
      case (Invalid(e1), Invalid(e2), Valid(_)) => Invalid(Semigroup[E].combine(e1, e2))
      case (Invalid(e1), Invalid(e2), Invalid(e3)) => 
        Invalid(Semigroup[E].combine(e1, Semigroup[E].combine(e2, e3)))
    }
  }

  def transform3Location: ((City, Postcode, Location) => Location) = 
                                        (c, pc, _) => Location(c, pc)

  println(parallelValidate3(
    validateCity(City("Amsterdam")),
    validatePostalcode(Postcode("1034AB")),
    validatePostalcodeCityCombination(Location(City("Amsterdam"), Postcode("1034AB")))
  )(transform3Location))
  println(parallelValidate3(
    validateCity(City("Amsterdam")),
    validatePostalcode(Postcode("1234AB")),
    validatePostalcodeCityCombination(Location(City("Amsterdam"), Postcode("1234AB")))
  )(transform3Location))
  println(parallelValidate3(
    validateCity(City("Utrecht")),
    validatePostalcode(Postcode("3534ABC")),
    validatePostalcodeCityCombination(Location(City("Utrecht"), Postcode("3534ABC")))
  )(transform3Location))
  println(parallelValidate3(
    validateCity(City("Rotterdam")),
    validatePostalcode(Postcode("1234ABC")),
    validatePostalcodeCityCombination(Location(City("Rotterdam"), Postcode("1234ABC")))
  )(transform3Location))
}

This is where the Cartesian syntax of the Cats library comes to the rescue:

object ParallelValidateCartesian2 extends NamePrintingApp {

  import CatsLocationValidators._
  import cats.syntax.cartesian._
  import cats.data.{NonEmptyList => NEL, Validated}

  def transform2Location: ((City, Postcode) => Location) = (c, pc) => Location(c, pc)

  def parallelValidateCartesian2(location: Location): Validated[NEL[String], Location] = {
    validateCity(location.city) |@| validatePostalcode(location.postcode) map transform2Location
  }

  println(parallelValidateCartesian2(Location(City("Amsterdam"), Postcode("1234AB"))))
  println(parallelValidateCartesian2(Location(City("Utrecht"), Postcode("1234ABC"))))
  println(parallelValidateCartesian2(Location(City("Rotterdam"), Postcode("1234AB"))))
  println(parallelValidateCartesian2(Location(City("Rotterdam"), Postcode("1234ABC"))))
}

which results in:

---- ParallelValidateCartesian2  ----
Valid(Location(City(Amsterdam),Postcode(1234AB)))
Invalid(NonEmptyList(1234ABC is not a valid PostalCode))
Invalid(NonEmptyList(Rotterdam is not a valid city))
Invalid(NonEmptyList(Rotterdam is not a valid city, 1234ABC is not a valid PostalCode))

The |@| operator (the CartesianBuilder syntax) combines independent values that live in the same context. For List, this is exactly the Cartesian product of two lists (example from herding cats, "Cartesian"):

scala> (List("ha", "heh", "hmm") |@| List("?", "!", ".")) map {_ + _}
res12: List[String] = List(ha?, ha!, ha., heh?, heh!, heh., hmm?, hmm!, hmm.)
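For lists, the same result can be reproduced without Cats: pairing every element of the first list with every element of the second is a nested flatMap/map, which a for-comprehension expresses directly. A minimal sketch with a hypothetical helper map2 (not the Cats method of that name):

```scala
object ListCartesian {
  // Pair every element of xs with every element of ys, then combine each pair with f.
  def map2[A, B, C](xs: List[A], ys: List[B])(f: (A, B) => C): List[C] =
    for {
      x <- xs
      y <- ys
    } yield f(x, y)

  val laughs: List[String] =
    map2(List("ha", "heh", "hmm"), List("?", "!", "."))(_ + _)
}
```

For Validated the same kind of combination does not produce all pairings; instead it pairs the two valid values, or collects the errors of every invalid input.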

The Cartesian syntax extends to more than two validators, without the explosion of cases:

object ParallelValidateCartesian3 extends NamePrintingApp {

  import CatsLocationValidators._
  import cats.syntax.cartesian._
  import cats.data.ValidatedNel

  def transform3Location: ((City, Postcode, Location) => Location) = (_, _, l) => l

  def parallelValidateCartesian3(location: Location): ValidatedNel[String, Location] = {
    validateCity(location.city) |@|
      validatePostalcode(location.postcode) |@|
      validatePostalcodeCityCombination(location) map transform3Location
  }

  println(parallelValidateCartesian3(Location(City("Amsterdam"), Postcode("1034AB"))))
  println(parallelValidateCartesian3(Location(City("Utrecht"), Postcode("1234AB"))))
  println(parallelValidateCartesian3(Location(City("Rotterdam"), Postcode("1234AB"))))
  println(parallelValidateCartesian3(Location(City("Rotterdam"), Postcode("1234ABC"))))
}

which results in:

---- ParallelValidateCartesian3  ----
Valid(Location(City(Amsterdam),Postcode(1034AB)))
Invalid(NonEmptyList(1234AB is not a valid postal code for Utrecht))
Invalid(NonEmptyList(Rotterdam is not a valid city, 1234AB is not a valid postal code for Rotterdam))
Invalid(NonEmptyList(Rotterdam is not a valid city, 1234ABC is not a valid PostalCode, 1234ABC is not a valid postal code for Rotterdam))
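Why is there no 2^N explosion here? Roughly, only one binary operation is needed: combine two results into a pair, accumulating errors through a Semigroup when both fail. More validations are handled by nesting that operation. The plain-Scala sketch below illustrates the idea with hand-written stand-ins (Semigroup, Validated, product, map, map3), loosely modelled on Cats' Semigroup and Cartesian#product rather than copied from them.

```scala
object ProductSketch {
  // Minimal Semigroup type class: an associative way to combine two values.
  trait Semigroup[A] {
    def combine(x: A, y: A): A
  }

  // Lists combine by concatenation, so error messages are kept in order.
  implicit def listSemigroup[A]: Semigroup[List[A]] = new Semigroup[List[A]] {
    def combine(x: List[A], y: List[A]): List[A] = x ++ y
  }

  // Simplified model of Validated (not the Cats definition).
  sealed trait Validated[+E, +A]
  final case class Valid[+A](a: A) extends Validated[Nothing, A]
  final case class Invalid[+E](e: E) extends Validated[E, Nothing]

  // The only place where outcomes are enumerated: four cases, once.
  def product[E, A, B](va: Validated[E, A], vb: Validated[E, B])
                      (implicit S: Semigroup[E]): Validated[E, (A, B)] =
    (va, vb) match {
      case (Valid(a), Valid(b))       => Valid((a, b))
      case (Invalid(e1), Invalid(e2)) => Invalid(S.combine(e1, e2))
      case (Invalid(e), _)            => Invalid(e)
      case (_, Invalid(e))            => Invalid(e)
    }

  // Transform the valid value; errors pass through untouched.
  def map[E, A, B](va: Validated[E, A])(f: A => B): Validated[E, B] =
    va match {
      case Valid(a)   => Valid(f(a))
      case Invalid(e) => Invalid(e)
    }

  // Three validations via two nested products: no eight-case match needed.
  def map3[E: Semigroup, A, B, C, D](va: Validated[E, A],
                                     vb: Validated[E, B],
                                     vc: Validated[E, C])
                                    (f: (A, B, C) => D): Validated[E, D] =
    map(product(product(va, vb), vc)) { case ((a, b), c) => f(a, b, c) }
}
```

Each extra validation adds one more nested product call instead of doubling the number of cases. As a side note, Cats 1.0, released after this post, renamed Cartesian to Semigroupal and deprecated |@| in favour of tuple syntax such as (v1, v2, v3).mapN(f).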

Admittedly, the validations are not very realistic, but they do illustrate how Validated works in combination with Cartesian, which is what they are intended for. If you want to report all errors of a validation at once, Validated combined with the CartesianBuilder syntax is certainly worth considering.