Monday, October 19, 2015

Harness Scala Type Classes and Implicits

In my previous blog, I presented an Expression ADT. In this article I will extend its functionality and serialize it using other ADT while maintaining decoupling using Scala's magic a.k.a implicits and type classes . Full code is in this repository.
Let's start by building our JSON serializer ADT
sealed trait JSON
case class JSeq(elms:List[JSON]) extends JSON
case class JObj(bindings:Map[String,JSON]) extends JSON
case class JNum(num:Double) extends JSON
case class JStr(str:String) extends JSON
case class JBool(b:Boolean) extends JSON
case object JNull extends JSON
and now we can create our JSONWriter to convert JSON objects to nice JSON String
object JSONWriter{
  def write(j:JSON):String = {
    j match {
      case JSeq(e) => e.map(write).mkString("[",",","]")
      case JObj(obj) => 
           obj.map(o=> "\""+o._1+"\":"+write(o._2)).mkString("{",",","}")
      case JNum(n) => n.toString
      case JStr(s) => "\""+s+"\""
      case JBool(b) => b.toString
      case JNull => "null"

    }
  }
In order to make this JSONwriter work we need to add functionality to our Expression ADT that will handle the conversion to JSON object and then send it to the JSONWriter.write. Something like this
sealed trait AST{
def asJSON:JSON
}
and then implement asJSON in every subclass e.g
sealed trait BooleanExpression extends AST
case class BooleanOperation(op: String, lhs: BooleanExpression, rhs: BooleanExpression) extends BooleanExpression{

override def asJSON:JSON =  JObj(
        Map(
        "Operation"-> JStr(op),
        "lhs" -> toJSON(lhs),
        "rhs" -> toJSON(rhs)
        ))
}

// now we can call JSONWriter.write(BooleanOperation(..,..,..).asJSON)
That will do it right ? Wrong ! we want to maintain decoupling as much as possible. Our AST shouldn't care about the JSONWriter. we do not want it polluting the whole namespace.

Let's Harness Type Classes.

Type class defines the behavior in form of operation that must be supported by type T and allows ad-hoc polymorphism.
trait JSONSerializer[T] {
  def toJSON(value:T):JSON
}
And now we can add some functionality to use this trait to our JSONWriter object
...
 def write[A](value:A, j:JSONSerializer[A]):String = write(j.toJSON(value))
and use it like this
val astExpressionToJSON = new JSONSerializer[AST] {

      override def toJSON(value: AST):JSON = value match{

        case BooleanOperation(op, lhs, rhs) => JObj(
        Map(
        "Operation"-> JStr(op),
        "lhs" -> toJSON(lhs),
        "rhs" -> toJSON(rhs)
        ))
        case LRComparison ....

JSONWriter.write(BooleanOperation(..,..,..),astExpressionToJSON)
Well... that will do it, but you might ask yourself so where is the Scala magic you talked about ? We want to maintain our code as clean and in style as much we can. Implicits can be very helpful in this case. We just need to change the write method in the JSONWriter object

Implicits - the hidden gem

Just as a quick brush up, and I'm stepping aside her we can define an implicit parameter and tell the compiler to search for it somewhere else in scope for example.
implicit val i = 1 
//somewhere else in the code we can define a method that will use this Int 

def increment(x:Int)(implicit y:Int) = x+y
As long that the implicit definition is in the scope we can use it without the second parameter
scala> implicit val i = 1
i: Int = 1

scala> def increment(x:Int)(implicit y:Int) = x + y
increment: (x: Int)(implicit y: Int)Int

scala> increment(9)
res0: Int = 10
Warning! Do not abuse Implicits . Implicits, if not used wisely can seriously damage your code readability. Sometimes, you can simply add a default value.
It always depends how and what do you want to express .
Saying that, let's get back on track.
...
def write[A](value:A)(implicit j:JSONSerializer[A]):String = write(j.toJSON(value))
...
and that's it (well almost, more cool stuff ahead) . with this defined we need to create our implicit definition
   implicit val astExpressionToJSON = new JSONSerializer[AST] {

   override def toJSON(value: AST):JSON = value match{
   case BooleanOperation(op, lhs, rhs) => JObj(
       Map(
       "Operation"-> JStr(op),
       "lhs" -> toJSON(lhs),
       "rhs" -> toJSON(rhs)
       ))
   case  LRComparison (lhs: Variable, op, rhs: Constant) =>JObj( ....
and we are good to go, nice and clean.
val parser = new ConditionParser {}
val p:AST = parser.parse("foo = '2015-03-25' || bar = 8").get //I know that the "get" will succeed here
val s = JSONWriter.write(p)
println(s)
//will print {"Operation":"||","lhs":{"Operation":"=","lhs":"foo","rhs":"1422136980000"},"rhs":{"Operation":"=","lhs":"bar","rhs":8.0}}

Context Bound

Another way to achieve the same with a stylish way to use implicitly pulls the implicit from the context is using context bound by stating that A is a member of the type class hence it must have the functionality that we want.
def toJSONString[A:JSONSerializer](value:A):String = {
    val j = implicitly[JSONSerializer[A]]//pull the implicit def of JSONSerializer
    write(j.toJSON(value))
    //we can replace the upper two lines with:  write(implicitly[JSONSerializer[A]].toJSON(value))
  }
and use it like this
val parser = new ConditionParser {}
val p:AST = parser.parse("foo = '2015-03-25' || bar = 8").get //I know that the get will succeed here
val s1 = JSONWriter.toJSONString(p)
println(s1)
//will print {"Operation":"||","lhs":{"Operation":"=","lhs":"foo","rhs":"1422136980000"},"rhs":{"Operation":"=","lhs":"bar","rhs":8.0}}
we can use it everywhere (YEY!!!) lets say that we have a 3rd party class e.g Person class we can add this functionality without changing the 3rd party class.
case class Person (name:String,age:Int)

object PersonImplicits{
  implicit val personToJSON = {
    new JSONSerializer[Person] {
       override def toJSON(value: Person): JSON =JObj( Map("name" -> JStr(value.name),
       "age" -> JNum(value.age)))
    }
  }
}

// and just bring that into scope 

    import PersonImplicits._
    val person = Person("Drakula",578)
    println(JSONWriter.write(person))
//will print {"name":"Drakula","age":578.0}

Implicit classes

another cool use of implicits is to add functionality to external class using implicit class
object JSONConverterImplicits {
  implicit class ASTConverter(ast:AST){
    def asJSON: JSON = ast match{

      case BooleanOperation(op, lhs, rhs) => JObj(
        Map(
          "Operation"-> JStr(op),
          "lhs" -> lhs.asJSON,
          "rhs" -> rhs.asJSON
        ))
      case  LRComparison (lhs: Variable, op, rhs: Constant) =>JObj( ...
and now we can use it as is "asJSON" is part of the AST
    import JSONConverterImplicits._
    val p2 = parser.parse("foo = '2015-03-25' || bar = 8 ").get
    val s2 = JSONWriter.write(p2.asJSON)
how cool is that ?!

Summary 

Type classes and implicits are very powerful tools, especially when writing libraries that needs to be extended, and maintaining decoupled. We can add default implementations in a nice and natural way for the developers that uses our library. Those implementations can easily overwrite and extended.

hope you enjoyed it.
  • Comments and feedbacks are always welcome

Sunday, October 11, 2015

Filtering using Scala Parser Combinators

Filtering using Scala Parser Combinators


So I got interesting task to enable record filtering according to some rules that we get from configuration represented as text.
well so far we know two things :
1. we need a filter that accepts a record as an input and might return a record as its output :
def filter(record:Record ):Option[Record]
2. we need to get a text with some kind of rule e.g "foo = 'FOO'" , "bar = 8", "buz IS NOT NULL " and AND / OR combinations . where foo/bar/buz are record's attributes
*well I know that "null" is a dirty word in FP but that was the requirement since it was a mix-in with java project.
Following the rule of three so is our todo list :
  1. parse the text
  2. evaluate it's Boolean expression in a form of Abstract Syntax Tree.
  3. filter according the Boolean evaluation.
The complete Source Code can be found here (you are more than welcome to contribute ) Parser Combinators to the rescue
Inspired by this excellent post an introduction to scala parser combinators and by this stackoverflow answer .
let's start writing the preserved words that we will use :
object FilterConstants{
  val And = "AND"
  val OR = "OR"
  val UnaryAnd = "&&"
  val UnaryOr = "||"
  val Equals = "="
  val NotEquals = "!="
  val Greater = ">"
  val Smaller = "<"
  val GreaterOrEqual = ">="
  val SmallerOrEqual = "<="
  val IsNull = "IS NULL"
  val IsNotNull = "IS NOT NULL"
}
and add our AST objects
next let's write the parser it self for that we will need extend JavaTokenParsers and PackratParsers . since JavaTokenParsers enables us to capture text using regex we need to do a little override to inherited regex of the string literal in order to capture single quote text .
object ASTObjects {
  sealed abstract class AST

  sealed abstract class BooleanExpression extends AST

  sealed abstract class Constant extends AST

  case class BooleanOperation(op: String, lhs: BooleanExpression, rhs: BooleanExpression) extends BooleanExpression

  case class Comparison[T <: Constant](lhs: Constant,op: String) extends BooleanExpression

  case class LRComparison (lhs: Variable, op: String, rhs: Constant) extends BooleanExpression

  case class NumericConstant(value: Double) extends Constant

  case object NullConstant extends Constant

  case class DateConstant(value: Date) extends Constant

  case class TimestampConstant(value: Timestamp) extends Constant

  case class StringConstant(value: String) extends Constant

  case object EmptyConstant extends Constant

  case class Variable(name: String) extends Constant {
    def eval(env: Map[String, Constant]) = env(name)

    override def toString = name
  }
}

The Parser

trait ConditionParser extends JavaTokenParsers with PackratParsers {
  import FilterConstants._
  override def stringLiteral: Parser[String] =  """(["'])[^"']*\1""".r
  var  simpleDateFormat:SimpleDateFormat = new SimpleDateFormat("yyyy-mm-dd")

  val d = """'+\d{4}-\d{2}-\d{2}+'""".r // "'2015-03-25'".matches("""'+\d{4}-\d{2}-\d{2}+'""")
  def dateLiteral: Parser[String] = d
  val ts = """'+\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}(\.\d)?+'""".r
  def dateTimeLiteral: Parser[String] = ts

  val booleanOperator : PackratParser[String] = literal(UnaryOr) | literal(UnaryAnd) | literal(And) | literal(OR)
  val comparisonOperator : PackratParser[String] = literal(SmallerOrEqual) | literal(GreaterOrEqual) | literal(Equals) | literal(NotEquals) | literal(Smaller) | literal(Greater)
  val nullCompare : PackratParser[String] = literal(IsNotNull) | literal(IsNull)
  val constant : PackratParser[Constant] = floatingPointNumber ^^ { x => NumericConstant(x.toDouble) } |
    stringLiteral ^^ {s => StringConstant(s.replaceAll("'","")) } |
    dateLiteral ^^ {d => DateConstant(new Date(simpleDateFormat.parse(d).getTime))}  |
    dateTimeLiteral ^^ {d => DateConstant(new Date(simpleDateFormat.parse(d).getTime))}  |
    stringLiteral ^^ {s => StringConstant(s.replaceAll("'","")) }
  val variable = ident ^^ {s=>Variable(s)}

  val comparison : PackratParser[BooleanExpression] = (variable ~ comparisonOperator ~ constant) ^^ { case lhs ~ op ~ rhs => LRComparison(lhs, op, rhs) } |
    (variable ~ nullCompare) ^^ {case lhs ~ nc => Comparison(lhs,nc)}

  lazy val p1 : PackratParser[BooleanExpression] = booleanOperation | comparison
  val booleanOperation : PackratParser[BooleanExpression]= (p1 ~ booleanOperator ~ p1) ^^ { case lhs ~ op ~ rhs => BooleanOperation(op, lhs, rhs) }|
    "(" ~> booleanOperation <~ ")"

  def parse(text:String) :Option[BooleanExpression] = {
    val res = parseAll(p1, text)
    res match {
      case Success(r, n) => Some(r)
      case Failure(msg, n) =>
        println("Failure parsing "+ text+"\nMessage: " +msg)
        None
      case Error(msg, n) =>
        println("Error"+msg)
        None
    }
  }
}
I do not want to get into those funny symbols but just a short intro to symbols I used above:
~ match a successful parser following other successful parser
| match a parser OR other parser
^^ that if parser succeeds it's content will be used in the following function
<~ if the parser on the left succeeds it is ignored and continues with right side
we can write some tests :
class ConditionParserTest extends FlatSpec with ShouldMatchers with ConditionParser{
  "Parser" should "Parse Null comparison" in {

    parse("foo IS NULL") should equal (Some(Comparison(Variable("foo"),IsNull)))
    parse("foo IS NOT NULL") should equal (Some(Comparison(Variable("foo"),IsNotNull)))
  }
  it should " parse left right comparison numeric values" in {
    parse("foo = 8") shouldEqual Some(LRComparison(Variable("foo"),Equals,NumericConstant(8D)))
    parse("foo = 8.5") shouldEqual Some(LRComparison(Variable("foo"),Equals,NumericConstant(8.5)))
    parse("foo = -8")shouldEqual Some(LRComparison(Variable("foo"),Equals,NumericConstant(-8D)))
  }
  it should " parse left right comparison String values" in {
    parse("foo = 'BAR'") shouldEqual Some(LRComparison(Variable("foo"),Equals,StringConstant("BAR")))
    parse("foo != 'BAR'") shouldEqual Some(LRComparison(Variable("foo"),NotEquals,StringConstant("BAR")))
  }
}
so this concludes the Parsing part

Evaluating the Boolean expressions

trait Evaluator{
  import FilterConstants._
  def evaluate(expression:BooleanExpression)(implicit env: Map[String, Constant]) : Boolean = expression match {

    case LRComparison(v:Variable, Equals, c) =>  (v.eval(env),c) match{
      case (StringConstant(l),StringConstant(r)) => l == r
      case (NumericConstant(l),NumericConstant(r)) => l == r
      case (StringConstant(l),NumericConstant(r)) => r == l.toDouble
      case (a,b) =>
        println(s"FAILED to evaluate '==' got $a not equal to $b" )
        false
    }
    case LRComparison(v:Variable, NotEquals, c) =>  (v.eval(env),c) match{
      case (StringConstant(l),StringConstant(r)) => l != r
      case (NumericConstant(l),NumericConstant(r)) => l != r
      case (StringConstant(l),NumericConstant(r)) =>  r != l.toDouble
      case (a,b) =>
        println(s"FAILED to evaluate != got $a not equal to $b" )
        false
    }
    case LRComparison(v:Variable, Greater, c) =>  (v.eval(env),c) match{
      case (NumericConstant(l),NumericConstant(r)) => l > r
      case (StringConstant(l),NumericConstant(r)) => l.toDouble > r
      case (a,b) =>
        println(s"FAILED to evaluate $Greater got $a not equal to $b" )
        false
    }
    case LRComparison(v:Variable, Smaller, c) =>  (v.eval(env),c) match{
      case (NumericConstant(l),NumericConstant(r)) => l < r
      case (StringConstant(l),NumericConstant(r)) => l.toDouble < r
      case (a,b) =>
        println(s"FAILED to evaluate $Smaller got $a not equal to $b" )
        false
    }
    case LRComparison(v:Variable, SmallerOrEqual, c) =>  (v.eval(env),c) match{
      case (NumericConstant(l),NumericConstant(r)) => l <= r
      case (StringConstant(l),NumericConstant(r)) => l.toDouble <= r
      case (a,b) =>
        println(s"FAILED to evaluate $SmallerOrEqual got $a not equal to $b" )
        false
    }
    case LRComparison(v:Variable, GreaterOrEqual, c) =>  (v.eval(env),c) match{
      case (NumericConstant(l),NumericConstant(r)) =>
        l >= r
      case (StringConstant(l),NumericConstant(r)) =>
        l.toDouble >= r
      case (a,b) =>
        println(s"FAILED to evaluate $GreaterOrEqual got $a not equal to $b" )
        false
    }
    case Comparison(v:Variable,IsNotNull) => v.eval(env) match{
      case EmptyConstant => false
      case _ => true
    }
    case Comparison(v:Variable,IsNull) => v.eval(env) match{
      case EmptyConstant => true
      case _ => false
    }

    case BooleanOperation(UnaryOr, a, b) => evaluate(a) || evaluate(b)
    case BooleanOperation(OR, a, b) => evaluate(a) || evaluate(b)
    case BooleanOperation(UnaryAnd, a, b) => evaluate(a) && evaluate(b)
    case BooleanOperation(And, a, b) => evaluate(a) && evaluate(b)
  }
}

Finally our Filter

class ConditionalFilter(ruleText:String) extends ConditionParser with Evaluator{

  val rule = parse(ruleText)
  val fields = rule match {
    case Some(r) => extractFields(r)
    case _ => Nil
  }

  def extractFields(b:BooleanExpression):List[String] = b match {
    case BooleanOperation(_,l,r) => extractFields(l):::extractFields(r)
    case LRComparison(l,_,_) => List(l.toString)
    case Comparison(l,_) => List(l.toString)
    case _ => Nil
  }

  def getMap(record:Record,fields:List[String]):Map[String, Constant] = {
    fields.foldLeft(Map.empty[String, Constant]){(m,f) =>
      m + (f -> {
        Option(record.getFieldValue(f)) match {
          case Some(v) =>
            StringConstant (v.toString)
          case _ => EmptyConstant
        }
      })
    }
  }

  def filter(record:Record ):Option[Record] = {
    implicit val m = getMap(record,fields)
    if (rule.exists(evaluate)){
      Some(record)
    }
    else
      None
  }

}
Now it is all complete and we can

Test it

class ConditionalFilterTest extends  FlatSpec  with ShouldMatchers{

  def getRecord(fields:Map[String,String]) = Record(fields)
  def getFilter(expr:String ) = new ConditionalFilter(expr)

  "Filter by col_name with rule " should "IS NULL include records with null values " in{
    val f = getFilter("foo IS NULL" )
    val f2 = getFilter("foo IS NOT NULL" )
    val r = getRecord(Map("foo"-> null,"bar"-> "5.2"))
    f.filter(r) should equal( Some(r))
    f2.filter(r) should equal( None)

  }
  it should "IS NOT NULL remove records with null values " in{
    val f = getFilter("foo IS NOT NULL" )
    val f2 = getFilter("foo IS NULL" )
    val r = getRecord(Map("foo"-> "foo","bar"-> "5.2"))
    f.filter(r) should equal( Some(r))
    f2.filter(r) should equal( None)

  }
  it should "= output records that equal the value" in {
    val f = getFilter("foo = 'FOO'" )
    val f2 = getFilter("foo = 'FOOx'" )
    val r = getRecord(Map("foo"-> "FOO","bar"-> "5.2"))
    f.filter(r) should equal( Some(r))
    f2.filter(r) should equal( None)
  }
}
more tests and updates can be found in the source code here .
In my next post we I will extend the Expression ADT using Type classes and Implicits .

contributes/feedbacks/issues are welcome 
Cheers 
Avi