In this blog I will be demonstrating the use of Tail End Recursion in F#. In functional languages, looping constructs such as while and for loops are rarely seen in code. Instead code is written to act upon streams of data which can have either a finite or infinite number of elements. LINQ follows the same model where it “pretends” that database records and XML elements are a stream of objects.
Functional languages process streams by “pealing” elements out of the lists one at a time using the head::tail syntax. In this blog I will demonstrate how to write a simple program using both a recursive and a non-recursive approach.
Linear Regression Model
Assume we have collected some data then plotted it onto a graph. Within the scatter diagram it appears that the data follows a straight line. A straight line is defined by the following simple formula y = mx + c.
Assuming that the sample above can be modeled by a simple line. We can assume that each sample y(i) = mx(i) + c + noise(i). We want to estimate what the values of m and c are for the graph above. To do this we need to make some assumptions about the noise generator:
- Average noise value is zero
- Noise is random and uncorrelated (noise samples are unrelated and are independent)
If the assumptions are valid then we can use the Linear Regression Model which can be found here. N is used in the formulas to indicate the number of samples.
![clip_image002[8] clip_image002[8]](/image.axd?picture=WindowsLiveWriter/TailEndRecursionandF_97C2/clip_image002%5B8%5D_thumb.gif)
![clip_image002[10] clip_image002[10]](/image.axd?picture=WindowsLiveWriter/TailEndRecursionandF_97C2/clip_image002%5B10%5D_thumb.gif)
We can load the samples into F# using a list of tuples of (x,y).
let samples = [
(1.0,15.95944862);
(2.0,14.70482969);
(3.0,19.57532692);
(4.0,24.37143171); …]
Simple implementation
The simplest solution is the calculate the values by using built in F# functions to find the:
- Sum of x elements
- sum of y elements
- sum of x*x elements
- sum of x*y elements
And using the results to populate the m and c values
type linearRegressionCoeffs = {
m:float;
c:float;
}
1: let RegressionSimple l : linearRegressionCoeffs =
2: // Create list of x and y by themselves is we can easily calculate the sum of x & y elements
3:let (x, y) = List.unzip l
4:let sumX = List.sum x
5:let sumY = List.sum y
6:
7: // Calculate the products on the X*Y and X*X
8:let sumXY = List.sum_by(fun f ->
9: let (x, y) = f
10: x * y
11: ) l
12:let sumXX = List.sum_by(fun f ->
13: let (x, y) = f
14: x * x
15: ) l
16:
17: // Calculate the m and c coefficients
18:let n = (float)(List.length l)
19:let m = (n*sumXY - (sumX *sumY))/(n*sumXX - sumX*sumX)
20:let c = (sumY - m*sumX)/n
21: {m = m; c=c;}
Optimising using a while loop
Obviously iterating over the list five times is extremely inefficient (four for the calculates and a further time in the unzip method). In C# we would use a while or for loop to calculate the values and we can follow the same pattern in F#.
1: type regressionCoeffs = {
2: mutable sumXiXi:float;
3: mutable sumXiYi:float;
4: mutable sumXi:float;
5: mutable sumYi:float;
6: }
7:
8: let public regressionWithLoop l : linearRegressionCoeffs =
9:let accumilator:regressionCoeffs = {sumXiXi = 0.0; sumXiYi =0.0; sumXi = 0.0; sumYi = 0.0;}
10:
11:for sample in l do
12:let (x,y) = sample
13: accumilator.sumXiYi <- accumilator.sumXiYi + x*y;
14: accumilator.sumYi <- accumilator.sumYi + y;
15: accumilator.sumXi <- accumilator.sumXi + x;
16: accumilator.sumXiXi <- accumilator.sumXiXi + x*x;
17:
18:let length = (float) (List.length l)
19:let meanX = accumilator.sumXi/length
20:let meanY = accumilator.sumYi/length
21:let m = (accumilator.sumXiYi - length*meanX*meanY) / (accumilator.sumXiXi - length* meanX* meanX)
22:let c = meanY - m*meanX
23: {m = m; c=c;}
Unfortunately the solution above resorts to having to update a temporary variable using the mutable keyword which defeats one of the main benefits of using a safe language like F#.
Using Tail End Recursion
Tail end recursion is the process of recursively calling a method (identified by the rec keyword) that eats through the list and processing one element at a time.
1: let public regressionFast l : linearRegressionCoeffs =
2:let rec calcRegressionCoeffs l (coeffs:regressionCoeffs) =
3: match l with
4: |head::tail ->
5:let (x,y) = head
6:
7:let nextCoeffs = {
8: sumXiYi = coeffs.sumXiYi + x*y;
9: sumYi = coeffs.sumYi + y;
10: sumXi = coeffs.sumXi + x;
11: sumXiXi = coeffs.sumXiXi + x*x;}
12:
13: calcRegressionCoeffs tail nextCoeffs
14: |[]-> coeffs
15:
16:let t = calcRegressionCoeffs l {sumXiYi = 0.0; sumXi = 0.0; sumYi = 0.0; sumXiXi = 0.0;}
17:let length = (float) (List.length l)
18:let meanX = t.sumXi/length
19:let meanY = t.sumYi/length
20:let m = (t.sumXiYi - length*meanX*meanY) / (t.sumXiXi - length* meanX* meanX)
21:let c = meanY - m*meanX
22: {m = m; c=c;}
The head::tail syntax within the match statement is used to peal off the first element off the list and place the rest of the list in the tail. calcRegressionCoeffs is repeatedly executed until the code has eaten through all the elements until the list is empty [] at which point is is found by the second part of the match statement. In this version of the code the regressionCoeffs members can be immutable as a new variable (nextCoeffs) is created each time the code recursively eats through the list.
The code now appears very inefficient compared to the previous version that used a while loop. For instance it appears that the calcRegressionCoeffs method is executed for once for each element in the list and within each method call a temporary variable is created. However this is not the case as tail end recursion is a core idiom within functional languages and for this reason the compiler AUTOMATICALLY converts the code into a very fast while loop.
Using reflector and drilling into the compiled code we find the implementation of the regressionFast method
1: public static linearRegressionCoeffs regressionFast(List<Tuple<double, double>> l)
2: {
3: regressionCoeffs t = new regressionCoeffs(0.0, 0.0, 0.0, 0.0);
4: ListModule.iter<Tuple<double, double>>(new clo@31(t), l);
5:double length = ListModule.length<Tuple<double, double>>(l);
6:double meanX = t._sumXi / length;
7:double meanY = t._sumYi / length;
8:double b2 = (t._sumXiYi - ((length * meanX) * meanY)) / (t._sumXiXi - ((length * meanX) * meanX));
9:return new linearRegressionCoeffs(meanY - (b2 * meanX), b2);
10: }
11:
12:
The recursive code has been automatically converted into a call to the iter method which in turn calls the closure 15 class. The class itself is below:
1: [Serializable]
2: internal class clo@15 : OptimizedClosures.FastFunc2<List<Tuple<double, double>>, Regression.regressionCoeffs, Regression.regressionCoeffs>
3: {
4: // Methods
5:public clo@15();
6:public override Regression.regressionCoeffs Invoke(List<Tuple<double, double>> l, Regression.regressionCoeffs coeffs);
7: }
8:
9:
10: Expand Methods
11:
Notice that the compiler overrides the Invoke method.
1: public override Regression.regressionCoeffs Invoke(List<Tuple<double, double>> l, Regression.regressionCoeffs coeffs)
2: {
3:while (true)
4: {
5: List<Tuple<double, double>> that = l;
6:if (that is List<Tuple<double, double>>._Empty)
7: {
8: return coeffs;
9: }
10: List<Tuple<double, double>> list = ((List<Tuple<double, double>>._Cons) that).get___Tail();
11: Tuple<double, double> tuple2 = ((List<Tuple<double, double>>._Cons) that).get___Head();
12:double num = tuple2.Item2;
13:double num2 = tuple2.Item1;
14: Regression.regressionCoeffs coeffs = new Regression.regressionCoeffs(coeffs._sumXiXi + (num2 * num2), coeffs._sumXiYi + (num2 * num), coeffs._sumXi + num2, coeffs._sumYi + num);
15: coeffs = coeffs;
16: l = list;
17: }
18: }
19:
20:
Interestingly the Invoke method directly updates the local coeff instance. So within F# the regressionCoeffs type is immutable, however the compiler has created a mutable value that is updated. In summary the F# code that included recursion and temporary variables has been optimised away into a simple while loop and the temporary variable has replaced with a single instance.
Finally here is the test
1: [<Test>]
2: let verify_regressionfast_using_static_data =
3:let samples = [
4: (1.0,15.95944862);
5: (2.0,14.70482969);
6: (3.0,19.57532692);
7: (4.0,24.37143171);
8: (5.0,18.32077464);
9: (6.0,27.80173622);
10: (7.0,29.20721381);
11: (8.0,40.07369171);
12: (9.0,39.10475664);
13: (10.0,46.4686677);
14: (11.0,35.46231593);
15: (12.0,52.42103983);
16: (13.0,54.16002796);
17: (14.0,56.40865623);
18: (15.0,48.77890855);
19: (16.0,57.80104059);
20: (17.0,57.79730541);
21: (18.0,72.22923153);
22: (19.0,57.63875286);
23: (20.0,75.45859069);
24: (21.0,70.21595096)]
25:
26:let resultSimple = RegressionSimple samples
27: Assert.AreApproximatelyEqual (10.0, resultSimple.c, 3.0)
28: Assert.AreApproximatelyEqual (3.0, resultSimple.m, 0.2)
29:
30:let resultWithLoop = regressionWithLoop samples
31: Assert.AreApproximatelyEqual (10.0, resultWithLoop.c, 3.0)
32: Assert.AreApproximatelyEqual (3.0, resultWithLoop.m, 0.2)
33:
34:let resultFast = regressionFast samples
35: Assert.AreApproximatelyEqual (10.0, resultFast.c, 3.0)
36: Assert.AreApproximatelyEqual (3.0, resultFast.m, 0.2)
37:
Summary
F# allows the developer to write safe code using recursion and the compiler will optimise the code into a form similar to a hand crafted while loop.