David Ross's Blog Random thoughts of a coder

Tail End Recursion and F# – Part 2

9. June 2009 23:03 by David in

In first post on Tail End Recursion it was shown that recursive functions can be automatically replaced with a fast while loop by the compiler.  The post implies that the head::tail syntax should be used to write F# code instead of using a while loop.  In reality this is often unnecessary due to the pipeline operator |> .  The pipeline operator is used to channel the output of one function into the input parameters of another.  This is similar to a pipe operator in Unix.

ls | grep “my program” | wc –l

In F# the operator allows streams to be filtered and enriched.  I have converted the previous code to use the pipeline operator.  The code function coverts a (x,y) tuple into a (x, y, x*x, y*y, x*y) tuple this holds the temporary working data for the program.  The work data is then summed together and finally the linear regression is calculated.

   1: let public regressionUsingPipelineOperator l : linearRegressionCoeffs =
   2:let length = (float) (List.length l)
   3:  
   4:let working s = 
   5:let (x,y) = s
   6:     (x, y , x*x, y*y, x*y)
   7:  
   8:let calc =
   9:     l |> Seq.map working
  10:       |> Seq.fold (fun s acc ->
  11: let (a,b,c,d,e) = s
  12: let (a1,b1,c1,d1,e1) = acc
  13:                     (a + a1, b + b1, c + c1, d + d1, e + e1)
  14:                  ) (0.0, 0.0, 0.0, 0.0, 0.0) 
  15:       |> fun (sumX, sumY, sumXX, sumYY, sumXY)   ->
  16: let meanX = sumX/length
  17: let meanY = sumY/length
  18: let m = (sumXY - length*meanX*meanY) / (sumXX - length* meanX * meanX)
  19: let c = meanY - m*meanX
  20:            {m = m; c=c;}
  21:
  22:   calc

In the example above the pipeline method is slightly slower than the hand crafted code using the head::tail syntax (since in a single step the code does the equivalent of the map and fold operation).  However the programmer’s intent is far clearer and hence should be used as the default solution.

Update:

The code above can be simplified to by reversing the parameters to the Seq.Fold method as follows.

   1: let public regressionUsingPipelineOperator l : linearRegressionCoeffs =
   2:let length = (float) (List.length l)
   3:  
   4:let calc =
   5:     l|> Seq.fold (fun acc (x,y) ->
   6: let (a,b,c,d,e) = acc
   7:                     (a + x, b + y, c + x*x, d + y*y, e + x*y)
   8:                  ) (0.0, 0.0, 0.0, 0.0, 0.0) 
   9:       |> fun (sumX, sumY, sumXX, sumYY, sumXY)   ->
  10: let meanX = sumX/length
  11: let meanY = sumY/length
  12: let m = (sumXY - length*meanX*meanY) / (sumXX - length* meanX * meanX)
  13: let c = meanY - m*meanX
  14:            {m = m; c=c;}
  15:
  16:   calc

Whoops…  F# code seems to continually shrink.

Tail End Recursion and F#

9. June 2009 09:47 by David in

In this blog I will be demonstrating the use of Tail End Recursion in F#.  In functional languages, looping constructs such as while and for loops are rarely seen in code.  Instead code is written to act upon streams of data which can have either a finite or infinite number of elements.  LINQ follows the same model where it “pretends” that database records and XML elements are a stream of objects.

Functional languages process streams by “pealing” elements out of the lists one at a time using the head::tail syntax.  In this blog I will demonstrate how to write a simple program using both a recursive and a non-recursive approach.

Linear Regression Model

Assume we have collected some data then plotted it onto a graph.   Within the scatter diagram it appears that the data follows a straight line.  A straight line is defined by the following simple formula y = mx + c.

image

Assuming that the sample above can be modeled by a simple line.  We can assume that each sample y(i) = mx(i) + c + noise(i).  We want to estimate what the values of m and c are for the graph above.  To do this we need to make some assumptions about the noise generator:

  • Average noise value is zero
  • Noise is random and uncorrelated (noise samples are unrelated and are independent)

If the assumptions are valid then we can use the Linear Regression Model which can be found hereN is used in the formulas to indicate the number of samples.

clip_image002[8]

clip_image002[10]

 

We can load the samples into F# using a list of tuples of (x,y).

let samples = [
  (1.0,15.95944862);
  (2.0,14.70482969);
  (3.0,19.57532692);
  (4.0,24.37143171); …]

 

Simple implementation

The simplest solution is the calculate the values by using built in F# functions to find the:

  • Sum of x elements
  • sum of y elements
  • sum of x*x elements
  • sum of x*y elements

And using the results to populate the m and c values

type linearRegressionCoeffs = {
  m:float;
  c:float;
}

   1: let RegressionSimple l : linearRegressionCoeffs =
   2:   // Create list of x and y by themselves is we can easily calculate the sum of x & y elements
   3:let (x, y) = List.unzip l
   4:let sumX = List.sum x
   5:let sumY = List.sum y
   6:  
   7:   // Calculate the products on the X*Y and X*X
   8:let sumXY = List.sum_by(fun f ->
   9: let (x, y) = f
  10:                              x * y
  11:                           ) l
  12:let sumXX = List.sum_by(fun f ->
  13: let (x, y) = f
  14:                              x * x
  15:                           ) l
  16:  
  17:   // Calculate the m and c coefficients
  18:let n = (float)(List.length l)
  19:let m = (n*sumXY - (sumX *sumY))/(n*sumXX - sumX*sumX)
  20:let c = (sumY - m*sumX)/n
  21:   {m = m; c=c;}

 

Optimising using a while loop

Obviously iterating over the list five times is extremely inefficient (four for the calculates and a further time in the unzip method).  In C# we would use a while or for loop to calculate the values and we can follow the same pattern in F#.

   1: type regressionCoeffs = {
   2:   mutable sumXiXi:float;
   3:   mutable sumXiYi:float;
   4:   mutable sumXi:float;
   5:   mutable sumYi:float;
   6: }
   7:  
   8: let public regressionWithLoop l : linearRegressionCoeffs =
   9:let accumilator:regressionCoeffs = {sumXiXi = 0.0; sumXiYi =0.0; sumXi = 0.0; sumYi = 0.0;}
  10:
  11:for sample in l do
  12:let (x,y) = sample
  13:     accumilator.sumXiYi <- accumilator.sumXiYi + x*y; 
  14:     accumilator.sumYi <- accumilator.sumYi + y; 
  15:     accumilator.sumXi <- accumilator.sumXi + x;
  16:     accumilator.sumXiXi <- accumilator.sumXiXi + x*x;
  17: 
  18:let length = (float) (List.length l)
  19:let meanX = accumilator.sumXi/length
  20:let meanY = accumilator.sumYi/length
  21:let m = (accumilator.sumXiYi - length*meanX*meanY) / (accumilator.sumXiXi - length* meanX* meanX)
  22:let c = meanY - m*meanX
  23:   {m = m; c=c;}

Unfortunately the solution above resorts to having to update a temporary variable using the mutable keyword which defeats one of the main benefits of using a safe language like F#.

Using Tail End Recursion

Tail end recursion is the process of recursively calling a method (identified by the rec keyword) that eats through the list and processing one element at a time.

   1: let public regressionFast l : linearRegressionCoeffs =
   2:let rec calcRegressionCoeffs l (coeffs:regressionCoeffs) =
   3:     match l with
   4:       |head::tail -> 
   5:let (x,y) = head
   6:
   7:let nextCoeffs = {
   8:            sumXiYi = coeffs.sumXiYi + x*y; 
   9:            sumYi = coeffs.sumYi + y; 
  10:            sumXi = coeffs.sumXi + x;
  11:            sumXiXi = coeffs.sumXiXi + x*x;}
  12:
  13:          calcRegressionCoeffs tail nextCoeffs
  14:       |[]-> coeffs
  15:
  16:let t = calcRegressionCoeffs l {sumXiYi = 0.0; sumXi = 0.0; sumYi = 0.0; sumXiXi = 0.0;}
  17:let length = (float) (List.length l)
  18:let meanX = t.sumXi/length
  19:let meanY = t.sumYi/length
  20:let m = (t.sumXiYi - length*meanX*meanY) / (t.sumXiXi - length* meanX* meanX)
  21:let c = meanY - m*meanX
  22:   {m = m; c=c;}

The head::tail syntax within the match statement is used to peal off the first element off the list and place the rest of the list in the tailcalcRegressionCoeffs is repeatedly executed until the code has eaten through all the elements until the list is empty [] at which point is is found by the second part of the match statement.  In this version of the code the regressionCoeffs members can be immutable as a new variable (nextCoeffs) is created each time the code recursively eats through the list.

The code now appears very inefficient compared to the previous version that used a while loop.  For instance it appears that the calcRegressionCoeffs method is executed for once for each element in the list and within each method call a temporary variable is created.  However this is not the case as tail end recursion is a core idiom within functional languages and for this reason the compiler AUTOMATICALLY converts the code into a very fast while loop.

Using reflector and drilling into the compiled code we find the implementation of the regressionFast method

   1: public static linearRegressionCoeffs regressionFast(List<Tuple<double, double>> l)
   2: {
   3:     regressionCoeffs t = new regressionCoeffs(0.0, 0.0, 0.0, 0.0);
   4:     ListModule.iter<Tuple<double, double>>(new clo@31(t), l);
   5:double length = ListModule.length<Tuple<double, double>>(l);
   6:double meanX = t._sumXi / length;
   7:double meanY = t._sumYi / length;
   8:double b2 = (t._sumXiYi - ((length * meanX) * meanY)) / (t._sumXiXi - ((length * meanX) * meanX));
   9:return new linearRegressionCoeffs(meanY - (b2 * meanX), b2);
  10: }
  11:  
  12:

The recursive code has been automatically converted into a call to the iter method which in turn calls the closure 15 class.  The class itself is below:

   1: [Serializable]
   2: internal class clo@15 : OptimizedClosures.FastFunc2<List<Tuple<double, double>>, Regression.regressionCoeffs, Regression.regressionCoeffs>
   3: {
   4:     // Methods
   5:public clo@15();
   6:public override Regression.regressionCoeffs Invoke(List<Tuple<double, double>> l, Regression.regressionCoeffs coeffs);
   7: }
   8:  
   9:
  10: Expand Methods
  11:

Notice that the compiler overrides the Invoke method.

   1: public override Regression.regressionCoeffs Invoke(List<Tuple<double, double>> l, Regression.regressionCoeffs coeffs)
   2: {
   3:while (true)
   4:     {
   5:         List<Tuple<double, double>> that = l;
   6:if (that is List<Tuple<double, double>>._Empty)
   7:         {
   8: return coeffs;
   9:         }
  10:         List<Tuple<double, double>> list = ((List<Tuple<double, double>>._Cons) that).get___Tail();
  11:         Tuple<double, double> tuple2 = ((List<Tuple<double, double>>._Cons) that).get___Head();
  12:double num = tuple2.Item2;
  13:double num2 = tuple2.Item1;
  14:         Regression.regressionCoeffs coeffs = new Regression.regressionCoeffs(coeffs._sumXiXi + (num2 * num2), coeffs._sumXiYi + (num2 * num), coeffs._sumXi + num2, coeffs._sumYi + num);
  15:         coeffs = coeffs;
  16:         l = list;
  17:     }
  18: }
  19:  
  20:

Interestingly the Invoke method directly updates the local coeff instance.    So within F# the regressionCoeffs  type is immutable, however the compiler has created a mutable value that is updated.  In summary the F# code that included recursion and temporary variables has been optimised away into  a simple while loop and the temporary variable has replaced with a single instance.

Finally here is the test

   1: [<Test>]
   2: let verify_regressionfast_using_static_data = 
   3:let samples = [
   4:     (1.0,15.95944862);
   5:     (2.0,14.70482969);
   6:     (3.0,19.57532692);
   7:     (4.0,24.37143171);
   8:     (5.0,18.32077464);
   9:     (6.0,27.80173622);
  10:     (7.0,29.20721381);
  11:     (8.0,40.07369171);
  12:     (9.0,39.10475664);
  13:     (10.0,46.4686677);
  14:     (11.0,35.46231593);
  15:     (12.0,52.42103983);
  16:     (13.0,54.16002796);
  17:     (14.0,56.40865623);
  18:     (15.0,48.77890855);
  19:     (16.0,57.80104059);
  20:     (17.0,57.79730541);
  21:     (18.0,72.22923153);
  22:     (19.0,57.63875286);
  23:     (20.0,75.45859069);
  24:     (21.0,70.21595096)]
  25:  
  26:let resultSimple = RegressionSimple samples
  27:   Assert.AreApproximatelyEqual (10.0, resultSimple.c, 3.0)
  28:   Assert.AreApproximatelyEqual (3.0, resultSimple.m, 0.2)
  29:  
  30:let resultWithLoop = regressionWithLoop samples
  31:   Assert.AreApproximatelyEqual (10.0, resultWithLoop.c, 3.0)
  32:   Assert.AreApproximatelyEqual (3.0, resultWithLoop.m, 0.2)
  33:  
  34:let resultFast = regressionFast samples
  35:   Assert.AreApproximatelyEqual (10.0, resultFast.c, 3.0)
  36:   Assert.AreApproximatelyEqual (3.0, resultFast.m, 0.2)
  37:

Summary

F# allows the developer to write safe code using recursion and the compiler will optimise the code into a form similar to a hand crafted while loop.