Thursday, December 20, 2012
Sunday, December 16, 2012
My take on monads
Rubik's Cube, color scheme modified, with shadow and reflection (Photo credit: Wikipedia) |
Learning this stuff was kinda hard for me because I've never encountered category theory so the concepts seemed strange to me. Now I'll try to share my intuition and hopefully help somebody. Pro tip: read many different "monad tutorials". Number of these articles is apparently exponential function so you won't have trouble finding them. I'll try to deliver a more pragmatic approach so instead of formally correct definitions I'll do some hand-waving to produce something more useful.
Types(and kinds)
Instead of talking about categories and morphisms let's focus on types. That's what scala is about right?
We are talking about general concepts and transformations(functions) here so we'll need generics. The other point is we want to have different semantic for different occasions. The most obvious way to do that is to write different generic wrappers. More formally: type constructors. Stuff like Option[A], List[A], Future[A], Box[A]... These are not usable types yet(roughly speaking) as scala doesn't support raw types like java. You need to provide type parameters to obtain types like Option[Int], List[Option[String]]... We say that such types are higher kinded types. See a kind is to type what a type is to value. Remember that, it's a great analogy(from Daniel Spiewak). So Option, List and other types we are currently interested in are of kind * -> *. This reads "type to type". It takes a type parameter, say Int, and produces a type, in this case Option[Int].It should slowly become clear that we can write a higher kinded type to add custom semantics to any raw type.
Functors
Function maps from value a to value b. A functor is a higher kinded type that allows to lift functions and operate on boxed values. This means function does not need to be aware of any additional semantics you put in place. In other words, a functor is something that has a proper map function.
trait Functor[F[_],+A]{ def map[B](f: A=>B): F[B] }
So the functor "knows" about the type of box(F[_] is scala syntax for * -> *) and takes type parameter. Map must then be implemented to lift a function into this context. A sample usage. Re-implementing Option type
Most know application of map is probably on List. It transforms the List[A] to List[B] by applying the function to every element.
Code above is in fact an endofunctor. Endofunctor is something that maps from category A to category A. A in this case is the category of types. There are links at the bottom for more explanation.
sealed trait Opt[+A] extends Functor[Opt, A] case class Som[A](value: A) extends Opt[A]{ def map[B](f: A=>B) = Som(f(value)) } case object Non extends Opt[Nothing]{ def map[B](f: Nothing=>B) = Non }Option adds semantics of missing value. Box may be full or empty. And computations must succeed in any case. Operations on empty option(Non) just return an empty box while if there is a value you apply the function to the value and box it up.
Most know application of map is probably on List. It transforms the List[A] to List[B] by applying the function to every element.
Code above is in fact an endofunctor. Endofunctor is something that maps from category A to category A. A in this case is the category of types. There are links at the bottom for more explanation.
Typeclasses
This comes from functional languages(I've learned it from Haskell first). It's a simple concept in it's heart. Ad-hoc polymorphism. Add functionality and semantics to existing types without changing them. You do that by creating a type class and instances for types you want to "extend" the type class. First of all type class has nothing in common with classes. I'll rewrite the Opt type sample with type classes to make an example.
trait Functor[F[_]]{ def map[A,B](fa: F[A])(f: A=>B): F[B] } sealed trait Opt[+A] case class Som[A](value: A) extends Opt[A] case object Non extends Opt[Nothing] implicit object OptFunctor extends Functor[Opt]{ def map[A,B](opt: Opt[A])(f: A=>B) = opt match { case Non => Non case Som(a) => Som(f(a)) } }
you pull the Functor instance from the implicit scope implicitly[Functor[Opt]] or do view bounds
def functorId[F[_] : Functor](fa: F[_]) = fahere functorId is identity function that doesn't compile if functor instance doesn't exist for it's argument.
Monoids
I've included type classes because monoids are much easier expressed through them. Object oriented way gets messy. You can try. Or better - prove me wrong, I'll be delighted.
So a monoid is something that can be added togheter or combined in a way. That's it. Oh and it needs an identity element. Let's express that as a type class
trait Monoid[M]{ def id: M def dot(a: M, b: M): M }
and an instance for Int with example usage
implicit object IntSumMonoid extends Monoid[Int]{ val id = 0 def dot(a: Int, b: Int) = a+b } def combine[A](xs: List[A])(implicit monoid: Monoid[A]) = xs.fold(monoid.id)( (a,b) => monoid.dot(a,b) ) // combine(List(1,2,3)) == 6
You could easily do multiplication instead by setting id to 1 and replacing + with *.
Monads
Let's stay with options and lists for a bit. What if you have a function f that returns an option(f: A=>Option[B])? If you map it over an option you get Option[Option[B]] or even more levels.
You can deal with this by pattern matching...
def twoLevels[A,B](o: Opt[A])(f: A=>Opt[B]) = o match { case Non => Non case Some(value) => f(value) }
and this produces the desired result. Same goes for list. You get into situations where you want List[A] but you got List[List[A]]. You can solve this by flattening the list(List has method flatten but let's demonstrate it here)
Another good thing about flatMap is that it can express map and filter too. And that's the reason for expressions from scala can be called monadic comprehensions.
def flatten[A](xss: List[List[A]]): List[A] = xss match { case Nil => Nil case ys::yss => ys:::flatten(yss) }
Again pattern matching. What do these two got in common? You do something depending on the current value, execute the transformation(or not) and do something with the output. Looks very similar to the map function. And it is. On lists it's just map and then flatten. Literally, you can write this with scala collections. But you can also do it in a single function. Scala calls it flatMap because of what it does. In haskell world it's bind, we'll see why in a minute. Here's the type signature
trait Monad[M[_]]{ def flatMap[A,B](ma: M[A])(f: A => M[B]): M[B] }and an instance for our options. Note that Opt class did not need to be changed.
implicit object OptMonad extends Monad[Opt]{ def flatMap[A,B](ma: Opt[A])(f: A => Opt[B]): Opt[B] = ma match { case Non => Non case Som(a) => f(a) } }So why is this useful? If you have instances for collections(like in scalaz, standard collections do this in OO way) you can abstract away pattern matching and have very readable code. If you want do chain failable computations you just wrap them up in options and do a for expression. Like this
val result = for{ a <- Option(foo()) //foo can return null b <- Option( bar(a) ) //bar can null too c <- baz(b) //baz returns an option } yield computation(c) // 'computation' never failsWhat's the type of result? Option of whatever computation returns. All failures are abstracted away.
Another good thing about flatMap is that it can express map and filter too. And that's the reason for expressions from scala can be called monadic comprehensions.
Semantics
What's so cool about sytactic sugar for pattern matching that it deserves such a special place? Monads are much more. Inside the for comprehension you have a custom programming language defined by the monad - the implementation of flatMap for given type. See, as long as it obeys some laws, the binding function can do pretty much anything.
Let's take a look at IO monad. There's nothing scary here, you have 'boxes' describing IO actions. Let's refer to these as contexts. And you can bind them together. If I have a ReadLine action and bind PrintLine action over it I'll get a new action that will echo a line of text. Code would look something like this
for{ input <- ReadLine() } yield PrintLine(input)
Note the initials. ReadLine and PrintLine are case class constructors. This expression just returns an IO action it doesn't perform it - this is up to the caller. And so this expression is referentially transparent(always returns the same output for same input and no side effects). How does it work? Scala compiler actually desugars it into
ReadLine().flatMap( input => PrintLine(input) )And flatMap is inplemented to join the actions together. See this is a great way to write functional programs that look imperative. In fact it's the only way to do IO in haskell. Scala is more forgiving in this way.
What about futures? Future is a box that will hold a value sometime in the future. Scala 2.10 has them, Akka had them before and I think they're in scalaz too. Heck even java standard library has them. Java one is not a monad though. Fun thing about monadic futures is that you can compose them.
for{ a <- futureA _ <- futureB c <- futureC } yield a+cThis will yield a new future. It will wait on futureA to complete, take it's value, wait for futureB(for side effects probably) and then wait on c for its value and on completion it will hold the sum. Different futures can be executed in parallel, even on different machines yet you can compose them and work with them in a sequential - very imperative - way. This works especially well with non-blocking IO since monadic comprehensions don't block(if flatMap has a sane implementation).
Monoid in category of endofunctors
The joke from the beginning of course. Stop reading for a moment and think about it. It should make sense. If it doesn't - read ahead.
Monad binds(combines) functions over a value in a context. And it adds some semantics while combining them. I tried writing some code
object OptTransMonoid { def id[A] = (a:A) => Som(a) def dot[A,B,C](f: A=>Opt[B], g: B=>Opt[C]) = (a:A) => { f(a) match { case Non => Non case Som(a) => g(a) } } }You can see that I didn't extend the monoid trait. I just couldn't come up with a proper type parameter. Problem is that the two transformations we are trying to compose are of different types. So you need a generic method to remain typesafe. And a generic method cannot override a non-generic one on JVM. No matter how hard you try with type parameters. But the principle holds. A monad over M is a monoid over _=>M[_] (this is hand waving and not formally correct. _ stands for "some type" not Any).
More material
Scala Typeclassopedia and the talk
This concludes my brief overview. I hope you grew to like functional stuff. I've written a small IO monad implementation and usage demonstration below to give you a feel for real life usage.IO Monad sample
trait IOAction[A]{ outer => def run: A def map[B](f: A=>B) = new IOAction[B]{ def run = f(outer.run) } def flatMap[B](f: A=>IOAction[B]) = new IOAction[B]{ def run = f(outer.run).run } } case object ReadLine extends IOAction[String]{ def run = readLine() } case class PrintLine[A](s: A) extends IOAction[Unit]{ def run = println(s.toString) } val greeter = ReadLine.flatMap(name => PrintLine("hello " + name)) greeter.run //reads a line and greets you //with sugar(and nice to user) val niceGreeter = for{ _ <- PrintLine("who are you?") name <- ReadLine _ <- PrintLine("hello " + name) } yield () niceGreeter.runJust a quick note. Monadic comprehension is this way because desugaring of <- is flatMap except for the last one where it's map. So to work around this you need to yield () instead of yielding the println action as this will map it instead of flatMap.
Related articles
Monday, December 10, 2012
Cool Monday: Exploration of dynamic db acces from scala
I use scala on Android and I don't like the integrated database API. It's very verbose and very stateful. I had written my own ORM(DAO would be a more appropriate tag) a while back, before I used scala but it's not enough anymore. So now I'm on a quest for a better database API. My dream is something small that handles schema for me and is type-safe. A nice DSL that is converted to SQL at compile time and does code generation. So it's fast like hand writing everything. But reduces code footprint by an order of magnitude(at least). Scala SLICK looks promising. It fits most requirements. But it's kinda big for android projects(you need scala library too!) and has not yet hit a stable version so I wouldn't be comfortable shipping it. Will definitely give it a thorough test when scala 2.10 is stable and SLICK is released. Oh, and it needs a third party JDBC driver for Android. This is another level of abstraction and therefore another source of slowness. I contemplated writing my own clone targeted at Android but never came around to actually doing it(yet!). It seems like a herculean task for single developer working in spare time.
Meanwhile
Yesterday I stared thinking how dynamic languages handle databases. And I got an idea. Scala has type Dynamic that does compilation magic to provide syntactic sugar for working with dynamic languages or objects. Here's an idea: do queries in plain SQL and perform extraction of data in a dynamic way.
And how to do this? Just wrap up Cursor to provide necessary methods.
class WrappedCursor(cursor: Cursor) implements Cursor{ //delegated methods go here }Why I need this? Cake pattern of course, Dynamic cursor get's mixed in.
trait DynamicCursor extends Dynamic{ this: Cursor => def selectDynamic(name: String) = getColumn(getColumnIndex(name)) def getColumn(index: Int) = getType(index) match { case Cursor.FIELD_TYPE_BLOB => getBlob(index) case Cursor.FIELD_TYPE_FLOAT => getDouble(index) case Cursor.FIELD_TYPE_INTEGER => getLong(index) case Cursor.FIELD_TYPE_NULL => null case Cursor.FIELD_TYPE_STRING => getString(index) } def toSeq = (0 until getColumnCount) map getColumn }I targeted API level 14(Ice Cream Sandwich) since getType(method on Cursor) is available from 11 on. Key method here is getColumn that abstracts over types. So you can read a column and do pattern matching on it. Or you are evil and use implicit conversions from Any to String, Long etc... Or use implicit conversion to "converter"
implicit class Converter(val value: Any) extends AnyVal{ def blob = value.asInstanceOf[Array[Byte]] def double = value.asInstanceOf[Double] def long = value.asInstanceOf[Long] def string = value.asInstanceOf[String] }But the real deal is selectDynamic. This allows you to write code like this
val c = new WrappedCursor(result) with DynamicCursor c.someColumn.long
This compiles down to selectDynamic("someColumn") that calls getColumn and finally implicit conversion is inserted that allows for terse cast to Long.
And I threw in a conversion from row to Seq that does a snapshot of current row. This allows pattern matching on rows. Any you can now construct a Stream that will handle Cursor state and lazily evaluate and store these snapshots. Therefore you can abstract away all mutability and handle cursor as immutable collection.
Said conversion to stream
And I threw in a conversion from row to Seq that does a snapshot of current row. This allows pattern matching on rows. Any you can now construct a Stream that will handle Cursor state and lazily evaluate and store these snapshots. Therefore you can abstract away all mutability and handle cursor as immutable collection.
Said conversion to stream
def CursorStream(cursor: DynamicCursorRaw with Cursor) = { def loop(): Stream[Seq[Any]] = { if(cursor.isAfterLast) Stream.empty[Seq[Any]] else { val snapshot = cursor.toSeq cursor.moveToNext() snapshot #:: loop() } } cursor.moveToFirst() loop() }And some more implicits to help
implicit class RichCursorRaw(cursor: Cursor) extends AnyVal{ def dynamicRaw = new WrappedCursor(cursor) with DynamicCursorRaw def toStream = CursorStream(dynamicRaw) }All the source is in the project on github https://github.com/edofic/dynamic-db-android (work in progress).
Related articles
Labels:
android,
Programming,
Scala,
SQL
Tuesday, December 4, 2012
Homework - functional style (outer sorting)
I'm attending Algorithms and data structures class this semester. Material it self is quite interesting and one TA is pretty cool too. But I don't like professor(makes whole experience very much worse) and I believe homeworks could be much better. Oh, and we didn't even mention functional approach...you know Haskell, Scala and the like. All we do is imperative, C-style code in Java. Enough ranting. This is how it saw the bright side.
Quite simple right? TA's even provided us classes(talking Java here) InTrack and OutTrack to manage tracks and I/O. Well my problem is that I grew to dislike imperative style. Surely it may matter for performance, but since this is a homework, performance didn't matter - so I wrote pretty code. I wanted my central code(the heart of the algorithm) to be a few lines at most.
Now to comment on this. I have a source that's agnostic to the amount of open files. And I have a similar sink. End-point switching is implemented in MultiSink and element choosing is in MultiSource. Both InTrack and MultiSource implement Iterable(and Iterator) so I can use them in a for-each loop. And code is as pretty as I can get it(remaining in java). All in all ~300 lines(with InTrack & stuff). After removing uneeded utility methods and comments ~220 lines. Eww.. Thats way too much.
Writer was a bit trickier. It needs internal state, but I prohibited mutable state. Solution is to return a new Writer containing a new state every time you write. Then the user must just be careful not to use stale Writers - not that big a deal.
This is the Writer trait
Putting it all together.
I find this solution to be MUCH MORE elegant. Not to mention it's just 65 lines of scala. But it makes me really sad they don't even mention functional programming at algorithms course. I'm probably gonna pay the professor and TA's a visit in near future.
Le problem
We were doing outer sorting. More specifically: balanced natural outer merge sort. I hope I translated this right(probably not). In it's essence the algorithm looks like this- you have multiple tracks you read from and write to
- you have the current element of each track in memory
- you write out squads(non-descending sub-sequence), this means you take the minimum element that is greater than last of if such element doesn't exist you take the minimal element.
- every time a squad ends your write pointer hops to the next track.
- repeat until all elements are on single track(hopefully sorted in non-descending order)
Reel of 1/2" tape showing beginning-of-tape reflective marker. (Photo credit: Wikipedia) |
This is my final product(bear in mind there was an additional twist: code should also be capable of sorting in non-ascending order thus the up variable).
Some additional explanation: all tracks should be in separate files(no overwriting - for automatic checking). N is number of tracks, prefix is track name prefix, and i is current iteration.
Some additional explanation: all tracks should be in separate files(no overwriting - for automatic checking). N is number of tracks, prefix is track name prefix, and i is current iteration.
int i = 0; Iterable<Integer> source = new InTrack(inName); MultiSink sink; do{ sink = new MultiSink(prefix, i, N, up); for(int n : source) sink.write(n); source = new MultiSource(prefix, i, N, up); i++; } while (sink.isMoreThanOneUsed());I'm, probably not allowed to share my full solution because of university rules so I won't.
Now to comment on this. I have a source that's agnostic to the amount of open files. And I have a similar sink. End-point switching is implemented in MultiSink and element choosing is in MultiSource. Both InTrack and MultiSource implement Iterable(and Iterator) so I can use them in a for-each loop. And code is as pretty as I can get it(remaining in java). All in all ~300 lines(with InTrack & stuff). After removing uneeded utility methods and comments ~220 lines. Eww.. Thats way too much.
Scala to the rescue
Lets rewrite this in a functional matter using scala. And while I'm at it, no vars or m
utable collections.
Scala (programming language) (Photo credit: Wikipedia) |
Input can be a collection right? Just implement Traversable. Not really. The whole point of tracks is they only hold one element in memory(or a few for efficiency but that's currently not my concern). So a track can be implemented as a Stream(linked list with a lazy val for tail).
def Reader(filename: String) = { val sc = new Scanner(new File(filename)) def loop(): Stream[Int] = { if (sc.hasNextInt){ sc.nextInt() #:: loop } else Stream.empty[Int] } loop() }
This is the constructor function for the input stream. It just returns a recursive value that has a Scanner in its closure. As stream elements are immutable you get an iron clad guarantee that sc will stay in sync. And you get all collections stuff for free. Moving on, how to abstract over multiple streams? That should be a stream again right? I kinda feel my code is too complicated and that it could be done simpler but that's what I came up with
def MultiReader(prefix: String, phase: Int, N: Int, up: Boolean) = { def loop(last: Int, sources: Seq[Stream[Int]]): Stream[Int] = { val nonEmpty = sources.filterNot(_.isEmpty) if(nonEmpty.length==0) Stream.empty[Int] else { val (low,high) = nonEmpty. map(_.head).zipWithIndex. partition(t => up && t._1 < last || !up && t._1 > last) val (e,i) = (if(high.length>0) high else low).minBy(_._1) e #:: loop(e, nonEmpty.updated(i, nonEmpty(i).tail)) } } loop(0, (0 until N).map(n => Reader(prefix + "-" + phase + "-" + n))) }Let's walk through. Again the stream is recursive. It starts with a collection of Readers set to right files. Then in each step you filter out empty stream(tracks with no more elements) and partition them according to the last element(in the argument). If there are higher you take their minimum else you take the minimum of lower. And loop with passing on the read element and a new collection - non empty streams with the read one advanced by one element.
Writer was a bit trickier. It needs internal state, but I prohibited mutable state. Solution is to return a new Writer containing a new state every time you write. Then the user must just be careful not to use stale Writers - not that big a deal.
This is the Writer trait
trait Writer{ def write(num: Int): Writer def moreThanOneUsed: Boolean }Very simple interface. And here's the recursive constructor function
def Writer(prefix: String, phase: Int, N: Int, up: Boolean): Writer = { val tracks = (0 until N).map( n => new PrintWriter( new BufferedWriter( new FileWriter(prefix+"-"+phase+"-"+n)))) def mkWriter(i: Int, last: Int, used: Boolean): Writer = new Writer{ def write(num: Int) = { val (ni,nu) = if (up && num < last || !up && num > last) ((i + 1) % tracks.length, true) else (i, used) tracks(ni).print(num) tracks(ni).print(' ') tracks(ni).flush() mkWriter(ni, num, nu) } def moreThanOneUsed = used } mkWriter(0, if (up) Integer.MIN_VALUE else Integer.MAX_VALUE, used=false) }Creates all the tracks to be put in the closure. First writer has the proper start value then every next is constructed like this: figure out the new values for track number and 'used'(the long if) then actually write out and return a new writer encapsulating track number and 'used'. Since these writers are quite lightweight garbage collection pressure shouldn't be a problem. Especially since the whole process is bound by I/O. Anyway you could optimize by creating all possible states in advance and just passing a new reference each time.
Putting it all together.
def loop(i: Int, source: Stream[Int]){ val sink = source.foldLeft(Writer(prefix, i, N, up))(_ write _) if (sink.moreThanOneUsed) loop(i+1, MultiReader(prefix, i, N ,up)) } loop(0, Reader(inName))So you take an input stream, fold it over a writer writing in each step. And if you used more than one track you repeat with a new input stream.
I find this solution to be MUCH MORE elegant. Not to mention it's just 65 lines of scala. But it makes me really sad they don't even mention functional programming at algorithms course. I'm probably gonna pay the professor and TA's a visit in near future.
Related articles
Subscribe to:
Posts (Atom)