Presentation Future-proofing collections: from mutable to persistent to parallel

Multicore processors are on every desk now. How are we going to make use of the extra power they provide? Some think that actors or transactional memory will save the day by making concurrent programming easier and safer. Even though these are welcome, I am skeptical about their ultimate success. Concurrency is fundamentally hard and no dressing up will be able to hide that fact completely. A safer and for the programmer much simpler alternative is to treat parallel execution as essentially an optimization. A promising application area are collections. Programing by transforming and aggregating collections is simple, powerful, and can optimized by executing bulk operations in parallel. To be able to do this in practice, any side effects of parallel operations need to be carefully controlled. This means that immutable, persistent collections are more suitable than mutable ones. In this talk I will describe the new Scala collections framework, and show how it allows a seamles...

Speakers


PDF: slides.pdf

Slides

Welcome to Devoxx 2010!

Future-Proofing Collections

Future-Proofing Collections from mutable to persistent to parallel Martin Odersky Scala Solutions and EPFL

The challenge we are facing:

The challenge we are facing: How to pay for lunch? 2

The challenge we are facing:

The challenge we are facing: How to make use of rapidly increasing hardware parallelism? Example: Nvidia Fermi: – One processor needs 24’000 running threads to keep it happy. Currently: – Server: Distribute connections over cores – Client: Only parallelize special tasks such as graphics. This will not last! Need to deal with: – Big data – Massive simulations – High-res models PPP grand challenge 3

Concurrent Programming to the Rescue?

Concurrent Programming to the Rescue? • Today, an often heard answer is concurrent programming, using – – – – Threads and locks, or Actors with messages, or Agents, or STM //  asynchronous  message  send actor  !  message //  message  receive receive  {    case  msgpat1  =>  action1    …            case  msgpatn  =>  actionn } • • All these concepts are useful. But they will not solve the PPP Challenge. – Fundamental problem: non-determinism – Leads to Heisenbugs, different behavior on different hardware, etc 4

The Root of The Problem

The Root of The Problem • Non-determinism caused by concurrent threads accessing shared mutable state. We can encapsulate state in actors or transactions, but the fundamental problem is the same. So, • • non-determinism = var  x  =  0 async  {  x  =  x  +  1  } async  {  x  =  x  *  2  } //  can  give  0,  1,  2 • • parallel processing + mutable state To get deterministic processing, avoid the mutable state! Avoiding mutable state means programming functionally. 5

Space vs Time

Space vs Time Space (functional/parallel) Time (imperative/concurrent) 6

Scala

Scala 7

Scala is a Unifier

Scala is a Unifier Agile, with lightweight syntax Object-Oriented Scala Functional Safe and performant, with strong static tpying 8

Let’s see an example:

Let’s see an example: 9

A class ...

A class ... public  class  Person  {    public  final  String  name;    public  final  int  age;    Person(String  name,  int  age)  {            this.name  =  name;            this.age  =  age;    } } class  Person(val  name:  String,                            val  age:  Int)  {}

... and its usage

... and its usage import  java.util.ArrayList; ... Person[]  people; Person[]  minors; Person[]  adults; {    ArrayList  minorsList  =  new  ArrayList();      ArrayList  adultsList  =  new  ArrayList();      for  (int  i  =  0;  i  <  people.length;  i++)              (people[i].age  <  18  ?  minorsList  :  adultsList)          .add(people[i]);      minors  =  minorsList.toArray(people); A function value      adults  =  adultsList.toArray(people); An infix method call A simple pattern match val  people:  Array[Person] val  (minors,  adults)  =  people  partition  (_.age  <  18)

Scala 2.8 Collections

Scala 2.8 Collections Building interesting things in space 12

... and its usage

... and its usage import  java.util.ArrayList; ... Person[]  people; Person[]  minors; Person[]  adults; {    ArrayList  minorsList  =  new  ArrayList();      ArrayList  adultsList  =  new  ArrayList();      for  (int  i  =  0;  i  <  people.length;  i++)              (people[i].age  <  18  ?  minorsList  :  adultsList)          .add(people[i]);      minors  =  minorsList.toArray(people); A function value      adults  =  adultsList.toArray(people); An infix method call A simple pattern match val  people:  Array[Person] val  (minors,  adults)  =  people  partition  (_.age  <  18)

Scala 2.8 Collections

Scala 2.8 Collections Building interesting things in space 12

The Scala Way of Collections

The Scala Way of Collections • • De-emphasize destructive updates Focus on transformers that map collections to collections Have a complete range of persistent collections scala> val ys = List(1, 2, 3) ys: List[Int] = List(1, 2, 3) scala> val xs: Seq[Int] = ys xs: Seq[Int] = List(1, 2, 3) scala> xs map (_ + 1) res0: Seq[Int] = List(2, 3, 4) scala> ys map (_ + 1) res1: List[Int] = List(2, 3, 4) • 13

Collection Properties

Collection Properties • • • • object-oriented generic: List[T],  Map[K,  V] optionally persistent, e.g. collection.immutable.Seq higher-order, with methods such as foreach,  map,   filter. Uniform return type principle: Operations return collections of the same type (constructor) as their left operand, as long as this makes sense. scala> val ys = List(1, 2, 3) ys: List[Int] = List(1, 2, 3) scala> val xs: Seq[Int] = ys xs: Seq[Int] = List(1, 2, 3) scala> xs map (_ + 1) res0: Seq[Int] = List(2, 3, 4) scala> ys map (_ + 1) res1: List[Int] = List(2, 3, 4) • This makes a very elegant and powerful combination. 14

... and its usage

... and its usage import  java.util.ArrayList; ... Person[]  people; Person[]  minors; Person[]  adults; {    ArrayList  minorsList  =  new  ArrayList();      ArrayList  adultsList  =  new  ArrayList();      for  (int  i  =  0;  i  <  people.length;  i++)              (people[i].age  <  18  ?  minorsList  :  adultsList)          .add(people[i]);      minors  =  minorsList.toArray(people); A function value      adults  =  adultsList.toArray(people); An infix method call A simple pattern match val  people:  Array[Person] val  (minors,  adults)  =  people  partition  (_.age  <  18)

Scala 2.8 Collections

Scala 2.8 Collections Building interesting things in space 12

Collection Properties

Collection Properties • • • • object-oriented generic: List[T],  Map[K,  V] optionally persistent, e.g. collection.immutable.Seq higher-order, with methods such as foreach,  map,   filter. Uniform return type principle: Operations return collections of the same type (constructor) as their left operand, as long as this makes sense. scala> val ys = List(1, 2, 3) ys: List[Int] = List(1, 2, 3) scala> val xs: Seq[Int] = ys xs: Seq[Int] = List(1, 2, 3) scala> xs map (_ + 1) res0: Seq[Int] = List(2, 3, 4) scala> ys map (_ + 1) res1: List[Int] = List(2, 3, 4) • This makes a very elegant and powerful combination. 14

The Scala Way of Collections

The Scala Way of Collections • • De-emphasize destructive updates Focus on transformers that map collections to collections Have a complete range of persistent collections scala> val ys = List(1, 2, 3) ys: List[Int] = List(1, 2, 3) scala> val xs: Seq[Int] = ys xs: Seq[Int] = List(1, 2, 3) scala> xs map (_ + 1) res0: Seq[Int] = List(2, 3, 4) scala> ys map (_ + 1) res1: List[Int] = List(2, 3, 4) • 13

... and its usage

... and its usage import  java.util.ArrayList; ... Person[]  people; Person[]  minors; Person[]  adults; {    ArrayList  minorsList  =  new  ArrayList();      ArrayList  adultsList  =  new  ArrayList();      for  (int  i  =  0;  i  <  people.length;  i++)              (people[i].age  <  18  ?  minorsList  :  adultsList)          .add(people[i]);      minors  =  minorsList.toArray(people); A function value      adults  =  adultsList.toArray(people); An infix method call A simple pattern match val  people:  Array[Person] val  (minors,  adults)  =  people  partition  (_.age  <  18)

Using Collections: Map and filter

Using Collections: Map and filter scala> val xs = List(1, 2, 3) xs: List[Int] = List(1, 2, 3) scala> val ys = xs map (x => x + 1) ys: List[Int] = List(2, 3, 4) scala> val ys = xs map (_ + 1) ys: List[Int] = List(2, 3, 4) scala> val zs = ys filter (_ % 2 == 0) zs: List[Int] = List(2, 4) scala> val as = ys map (0 to _) as: List(Range(0, 1, 2), Range(0, 1, 2, 3), Range(0, 1, 2, 3, 4)) 15

Using Collections: flatMap and groupBy

Using Collections: flatMap and groupBy scala> val bs = as.flatten bs: List[Int] = List(0, 1, 2, 0, 1, 2, 3, 0, 1, 2, 3, 4) scala> val bs = ys flatMap (0 to _) bs: List[Int] = List(0, 1, 2, 0, 1, 2, 3, 0, 1, 2, 3, 4) scala> val fruit = Vector("apples", "oranges", "ananas") fruit: Vector[String] = Vector(apples, oranges, ananas) scala> fruit groupBy (_.head) res11: scala.collection.immutable.Map[Char,Vector[String]] = Map(a -> Vector(apples, ananas), o -> Vector(oranges)) 16

Using Collections: For Notation

Using Collections: For Notation scala> for (x <- xs) yield x + 1 res14: List[Int] = List(2, 3, 4) scala> for (x <- res14 if x % 2 == 0) yield x res15: List[Int] = List(2, 4) scala> for (x <- xs; y <- 0 to x) yield y // same as map // ~ filter // same as flatMap res17: List[Int] = List(0, 1, 0, 1, 2, 0, 1, 2, 3) 17

Using Maps

Using Maps scala> val m = Map(1 -> "ABC", 2 -> "DEF", 3 -> "GHI") m: Map[Int, String] = Map(1 -> ABC, 2 -> DEF, 3-> GHI) scala> m(2) res0: String = DEF scala> m + (4 -> "JKL") res1: Map[Int, String] = Map(1 -> ABC, 2 -> DEF, 3 -> GHI, 4 -> JKL) scala> m map { case (k, v) => (v, k) } res8: Map[String,Int] = Map(ABC -> 1, DEF -> 2, GHI -> 3) 18

An Example

An Example • Task: Phone keys have mnemonics assigned to them. val  mnemonics  =  Map(            '2'  -­‐>  "ABC",  '3'  -­‐>  "DEF",  '4'  -­‐>  "GHI",  '5'  -­‐>  "JKL",              '6'  -­‐>  "MNO",  '7'  -­‐>  "PQRS",  '8'  -­‐>  "TUV",  '9'  -­‐>  "WXYZ") • Assume you are given a dictionary dict as a list of words. Design a class Coder with a method translate such that new  Coder(dict).translate(phoneNumber)   produces all phrases of words in dict that can serve as mnemonics for the phone number. Example: The phone number “7225276257” should have the mnemonic Scala  rocks • as one element of the list of solution phrases. 2-19

Program Example: Phone Mnemonics

Program Example: Phone Mnemonics • This example was taken from: Lutz Prechelt: An Empirical Comparison of Seven Programming Languages. IEEE Computer 33(10): 23-29 (2000) • • Tested with Tcl, Python, Perl, Rexx, Java, C++, C Code size medians: – 100 loc for scripting languages – 200-300 loc for the others 2-20

Outline of Class Coder

Outline of Class Coder import  collection.mutable.HashMap   class  Coder(words:  List[String])  {        private  val  mnemonics  =  Map(            '2'  -­‐>  "ABC",  '3'  -­‐>  "DEF",  '4'  -­‐>  "GHI",  '5'  -­‐>  "JKL",              '6'  -­‐>  "MNO",  '7'  -­‐>  "PQRS",  '8'  -­‐>  "TUV",  '9'  -­‐>  "WXYZ")                /**  Invert  the  mnemonics  map  to  give  a  map  from  chars  'A'  ...  'Z'  to  '2'  ...  '9'  */    private  val  upperCode:  Map[Char,  Char]  =  ??                        /**  Maps  a  word  to  the  digit  string  it  can  represent,  e.g.  “Java”  -­‐>  “5282”  */    private  def  wordCode(word:  String):  String  =  ??        /**  A  map  from  digit  strings  to  the  words  that  represent  them,          *  e,g.  “5282”  -­‐>  Set(“Java”,  “Kata”,  “Lava”,  ...)  */    private  val  wordsForNum:  Map[String,  Set[String]]  =  ??    /**  Return  all  ways  to  encode  a  number  as  a  list  of  words  */    def  encode(number:  String):  Set[List[String]]  =  ??             2-21

Class Coder  (1)

Class Coder  (1) import  collection.mutable.HashMap   class  Coder(words:  List[String])  {        private  val  mnemonics  =  Map(            '2'  -­‐>  "ABC",  '3'  -­‐>  "DEF",  '4'  -­‐>  "GHI",  '5'  -­‐>  "JKL",              '6'  -­‐>  "MNO",  '7'  -­‐>  "PQRS",  '8'  -­‐>  "TUV",  '9'  -­‐>  "WXYZ")                /**  Invert  the  mnemonics  map  to  give  a  map  from  chars  'A'  ...  'Z'  to  '2'  ...  '9'  */    private  val  upperCode:  Map[Char,  Char]  =                                      /**  Maps  a  word  to  the  digit  string  it  can  represent  */    private  def  wordCode(word:  String):  String  =  ??        /**  A  map  from  digit  strings  to  the  words  that  represent  them  */    private  val  wordsForNum:  Map[String,  List[String]]  =  ??    /**  Return  all  ways  to  encode  a  number  as  a  list  of  words  */    def  encode(number:  String):  Set[List[String]]  =  ??                /**  Maps  a  number  to  a  list  of  all  word  phrases  that  can  represent  it  */ 2-22

Class Coder  (1)

Class Coder  (1) 2-23

Class Coder  (2)

Class Coder  (2) 2-24

Class Coder  (2)

Class Coder  (2) 2-25

Class Coder  (3)

Class Coder  (3) 2-26

Class Coder  (3)

Class Coder  (3) 2-27

Class Coder  (4)

Class Coder  (4) class  Coder(words:  List[String])  {  ...    /**  Return  all  ways  to  encode  a  number  as  a  list  of  words  */    def  encode(number:  String):  Set[List[String]]  =        /**  Maps  a  number  to  a  list  of  all  word  phrases  that  can  represent  it  */ 2-28

Class Coder  (4)

Class Coder  (4) 2-29

Class Coder  (4)

Class Coder  (4) 2-30

Class Coder  (4)

Class Coder  (4) 2-31

Class Coder  (4)

Class Coder  (4) 2-32

Going Parallel

Going Parallel 2-33

In Summary

In Summary Problem solved with under 20 LOC Essential: Collection abstractions. Easily upgradable to parallel. Think collection transformers, not CRUD! 2-34

The Future

The Future Scala’s persistent collections are • easy to use: few steps to do the job • concise: one word replaces a whole loop • safe: type checker is really good at catching errors • fast: collection ops are tuned, can be parallelized • universal: one vocabulary to work on all kinds of collections 2-35

That was The Bright Side

That was The Bright Side 36

Now to the Dark Side

Now to the Dark Side 37

How is all this implemented?

How is all this implemented? 38

Everything is a Library

Everything is a Library • Collections feel like they are an organic part of Scala • But in fact the language does not contain any collectionrelated constructs – no collection types – no collection literals – no collection operators • Everything is done in a library • Everything is extensible – You can write your own collections which look and feel like the standard ones 2-39

Some General Scala Collections

Some General Scala Collections Constructed automatically using decodify. 4-40

Mutable or Immutable?

Mutable or Immutable? • All general collections come in three forms, and are stored in different packages: scala.collection scala.collection.mutable scala.collection.immutable • • • Immutable is the default, i.e. predefined imports go to scala.collection.immutable General collections in scala.collection  can be mutable or immutable. There are aliases for the most commonly used collections. scala.collection.immutable.List     where it is defined scala.List               List                         the alias in the scala package because scala._ is automatically imported 4-41

Immutable Scala Collections

Immutable Scala Collections 4-42

Mutable Scala Collections

Mutable Scala Collections 4-43

New Implementations: Vectors and Hash Tries

New Implementations: Vectors and Hash Tries • • • • • Trees with branch factor of 32. Persistent data structures with very efficient sequential and random access. Invented by Phil Bagwell, then adopted in Clojure. New: Persistent prepend/append/update in constant amortized time. Next: Fast splits and joins for parallel transformations.

The Uniform Return Type Principle

The Uniform Return Type Principle Bulk operations return collections of the same type (constructor) as their left operand. (DWIM) scala> val ys = List(1, 2, 3) ys: List[Int] = List(1, 2, 3) scala> val xs: Seq[Int] = ys xs: Seq[Int] = List(1, 2, 3) scala> xs map (_ + 1) res0: Seq[Int] = List(2, 3, 4) scala> ys map (_ + 1) res1: List[Int] = List(2, 3, 4) This is tricky to implement without code duplication!

Pre 2.8 Collection Structure

Pre 2.8 Collection Structure trait  Iterable[A]  {      def  filter(p:  A  =>  Boolean):  Iterable[A]  =  ...    def  partition(p:  A  =>  Boolean)  =          (filter(p(_)),  filter(!p(_)))    def  map[B](f:  A  =>  B):  Iterable[B]  =  ... } trait  Seq[A]  extends  Iterable[A]  {    def  filter(p:  A  =>  Boolean):  Seq[A]  =  ...    override  def  partition(p:  A  =>  Boolean)  =          (filter(p(_)),  filter(!p(_)))    def  map[B](f:  A  =>  B):  Seq[B]  =  ... }         4-46

Types force duplication

Types force duplication filter needs to be re-defined on each level partition also needs to be re-implemented on each level, even though its definition is everywhere the same. The same pattern repeats for many other operations and types. ideal conditions for Bit Rot 4-47

Signs of Bit Rot

Signs of Bit Rot Lots of duplications of methods. – Methods returning collections have to be repeated for every collection type. Inconsistencies. – Sometimes methods such as filter, map were not specialized in subclasses – More often, they only existed in subclasses, even though they could be generalized “Broken window” effect. – Classes that already had some ad-hoc methods became dumping grounds for lots more. – Classes that didn’t stayed clean. 4-48

Excerpts from List.scala

Excerpts from List.scala 4-49

How to do better?

How to do better? Can we abstract out the return type? Look at map: Need to abstract out the type constructor, not just the type. trait  Iterable[A]   def  map[B](f:  A  =>  B):  Iterable[B] trait  Seq[A]   def  map[B](f:  A  =>  B):  Seq[B]   But we can do that using Scala’s higher-kinded types!

HK Types Collection Structure

HK Types Collection Structure trait  TraversableLike[A,  CC[X]]  {      def  filter(p:  A  =>  Boolean):  CC[A]    def  map[B](f:  A  =>  B):  CC[B] } trait  Traversable[A]  extends  TraversableLike[A,  Traversable] trait  Iterable[A]        extends  TraversableLike[A,  Iterable] trait  Seq[A]                  extends  TraversableLike[A,  Seq] Here, CC is a parameter representing a type constructor.

Implementation with Builders

Implementation with Builders All ops in Traversable are implemented in terms of foreach and newBuilder. trait  Builder[A,  Coll]  {    def  +=  (elem:  A)      //  add  elems    def  result:  Coll      //  return  result } trait  TraversableLike[A,  CC[X]]  {    def  foreach(f:  A  =>  Unit)    def  newBuilder[B]:  Builder[B,  CC[B]]    def  map[B](f:  A  =>  B):  CC[B]  =  {        val  b  =  newBuilder[B]        foreach  (x  =>  b  +=  f(x))        b.result    } }

Unfortunately ...

Unfortunately ... ... things are not as parametric as it seems at first. Take: class  BitSet  extends  Set[Int] scala>  val  bs  =  BitSet(1,  2,  3) bs:  scala.collection.immutable.BitSet  =  BitSet(1,  2,  3) scala>  bs  map  (_  +  1) res0:  scala.collection.immutable.BitSet  =  BitSet(2,  3,  4) scala>  bs  map  (_.toString  +  "!") res1:  scala.collection.immutable.Set[java.lang.String]  =  Set(1!,  2!,  3!) Note that the result type is the “best possible” type that fits the element type of the new collection. Other examples: SortedSet,  String, Array

How to advance?

How to advance? We need more flexibility. Can we define our own type system for collections? Question: Given old collection type From, new element type Elem, and new collection type To: Can an operation on From build a collection of type To with Elem elements? Captured in: CanBuildFrom[From,  Elem,  To]

Facts about CanBuildFrom

Facts about CanBuildFrom Can be stated as axioms and inference rules: CanBuildFrom[Traversable[A],  B,  Traversable[B]]     CanBuildFrom[Set[A],  B,  Set[B]] CanBuildFrom[BitSet,  B,  Set[B]] CanBuildFrom[BitSet,  Int,  BitSet] CanBuildFrom[String,  Char,  String] CanBuildFrom[String,  B,  Seq[B]] CanBuildFrom[SortedSet[A],  B,  SortedSet[B]]    :-­‐    Ordering[B] where A and B are arbitrary types.

Objections

Objections 4-58

4-59

4-59

Use Cases

Use Cases • How to explain def  map[B,  To](f:  A  =>  B)                                            (implicit  cbf:  CanBuildFrom[Coll,  B,  To]):  To • • to a beginner? Key observation: We can approximate the type of map. For everyone but the most expert user   def  map[B](f:  A  =>  B):  Traversable[B]    //  in    class  Traversable def  map[B](f:  A  =>  B):  Seq[B]                    //  in    class  Seq,  etc • is detailed enough. These types are correct, they are just not as general as the type that’s actually implemented. 4-60

Part of the Solution: Flexible Doc Comments

Part of the Solution: Flexible Doc Comments 4-61

Going even further

Going even further But how do we keep a bunch of Fermi’s happy? – How to find and deal with 10000+ threads in an application? – Parallel collections are necessary but not sufficient for this. Our bet for the far future: parallel embedded DSLs. – Find parallelism in domains: physics simulation, machine learning, statistics, ... Joint work with Kunle Olukuton, Pat Hanrahan @ Stanford. 63

Producer or Consumer?

Producer or Consumer? Scala feels radically different for producers and consumers of advanced libraries. For the consumer: – Really easy – Things work intuitively – Can concentrate on domain, not implementation For the producer: – Sophisticated tool set – Can push the boundaries of what’s possible – Requires expertise and taste 64

How to find out more

How to find out more Series: “What’s new Scala 2.8?” on scala-lang.org and artima.com Programming in Scala 2nd ed 65

Thank You

Thank You 66