David Ross's Blog Random thoughts of a coder

Tail End Recursion and F#

9. June 2009 09:47 by David

In this blog I will be demonstrating the use of Tail End Recursion in F#.  In functional languages, looping constructs such as while and for loops are rarely seen in code.  Instead code is written to act upon streams of data which can have either a finite or infinite number of elements.  LINQ follows the same model where it “pretends” that database records and XML elements are a stream of objects.

Functional languages process streams by “peeling” elements out of the lists one at a time using the head::tail syntax.  In this blog I will demonstrate how to write a simple program using both a recursive and a non-recursive approach.

Linear Regression Model

Assume we have collected some data and then plotted it onto a graph.  Within the scatter diagram it appears that the data follows a straight line.  A straight line is defined by the following simple formula: y = mx + c.

[Figure: scatter plot of the sample data]

Assuming that the sample above can be modelled by a simple line, we can assume that each sample is y(i) = m*x(i) + c + noise(i).  We want to estimate the values of m and c for the graph above.  To do this we need to make some assumptions about the noise generator:

  • Average noise value is zero
  • Noise is random and uncorrelated (noise samples are unrelated and are independent)

If the assumptions are valid then we can use the Linear Regression Model, which can be found here.  N is used in the formulas to indicate the number of samples.

m = (N*Σ(x*y) - Σx*Σy) / (N*Σ(x*x) - (Σx)^2)

c = (Σy - m*Σx) / N

 

We can load the samples into F# using a list of tuples of (x,y).

let samples = [
  (1.0,15.95944862);
  (2.0,14.70482969);
  (3.0,19.57532692);
  (4.0,24.37143171); …]
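As a quick sanity check of the regression formulas (a Python sketch rather than the post's F#; the function name is mine), fitting noise-free samples of y = 3x + 10 should recover m and c exactly:

```python
def linear_regression(samples):
    # Closed-form least-squares fit for y = m*x + c using the sums above.
    n = float(len(samples))
    sum_x = sum(x for x, _ in samples)
    sum_y = sum(y for _, y in samples)
    sum_xx = sum(x * x for x, _ in samples)
    sum_xy = sum(x * y for x, y in samples)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x * sum_x)
    c = (sum_y - m * sum_x) / n
    return m, c

samples = [(x, 3.0 * x + 10.0) for x in range(1, 22)]
m, c = linear_regression(samples)  # recovers m = 3.0, c = 10.0
```

With no noise term the estimates are exact; with noisy data they converge on the true line as N grows.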

 

Simple implementation

The simplest solution is to calculate the values by using built-in F# functions to find the:

  • sum of the x elements
  • sum of the y elements
  • sum of the x*x elements
  • sum of the x*y elements

and then use the results to compute the m and c values.

type linearRegressionCoeffs = {
  m:float;
  c:float;
}

let RegressionSimple l : linearRegressionCoeffs =
  // Create lists of x and y by themselves so we can easily calculate the sum of the x & y elements
  let (x, y) = List.unzip l
  let sumX = List.sum x
  let sumY = List.sum y

  // Calculate the sums of the x*y and x*x products
  let sumXY = List.sum_by (fun (x, y) -> x * y) l
  let sumXX = List.sum_by (fun (x, y) -> x * x) l

  // Calculate the m and c coefficients
  let n = (float)(List.length l)
  let m = (n*sumXY - (sumX*sumY))/(n*sumXX - sumX*sumX)
  let c = (sumY - m*sumX)/n
  {m = m; c = c;}

 

Optimising using a while loop

Obviously iterating over the list five times is extremely inefficient (four passes for the sums and a further pass in the unzip method).  In C# we would use a while or for loop to calculate the values, and we can follow the same pattern in F#.

type regressionCoeffs = {
  mutable sumXiXi:float;
  mutable sumXiYi:float;
  mutable sumXi:float;
  mutable sumYi:float;
}

let public regressionWithLoop l : linearRegressionCoeffs =
  let accumulator:regressionCoeffs = {sumXiXi = 0.0; sumXiYi = 0.0; sumXi = 0.0; sumYi = 0.0;}

  for sample in l do
    let (x,y) = sample
    accumulator.sumXiYi <- accumulator.sumXiYi + x*y
    accumulator.sumYi <- accumulator.sumYi + y
    accumulator.sumXi <- accumulator.sumXi + x
    accumulator.sumXiXi <- accumulator.sumXiXi + x*x

  let length = (float) (List.length l)
  let meanX = accumulator.sumXi/length
  let meanY = accumulator.sumYi/length
  let m = (accumulator.sumXiYi - length*meanX*meanY) / (accumulator.sumXiXi - length*meanX*meanX)
  let c = meanY - m*meanX
  {m = m; c = c;}

Unfortunately the solution above resorts to updating a temporary variable marked with the mutable keyword, which defeats one of the main benefits of using a safe language like F#.

Using Tail End Recursion

Tail end recursion is the process of a method (identified by the rec keyword) recursively calling itself as its very last operation, eating through the list and processing one element at a time.

let public regressionFast l : linearRegressionCoeffs =
  let rec calcRegressionCoeffs l (coeffs:regressionCoeffs) =
    match l with
    |head::tail ->
      let (x,y) = head
      let nextCoeffs = {
        sumXiYi = coeffs.sumXiYi + x*y;
        sumYi = coeffs.sumYi + y;
        sumXi = coeffs.sumXi + x;
        sumXiXi = coeffs.sumXiXi + x*x;}
      calcRegressionCoeffs tail nextCoeffs
    |[] -> coeffs

  let t = calcRegressionCoeffs l {sumXiYi = 0.0; sumXi = 0.0; sumYi = 0.0; sumXiXi = 0.0;}
  let length = (float) (List.length l)
  let meanX = t.sumXi/length
  let meanY = t.sumYi/length
  let m = (t.sumXiYi - length*meanX*meanY) / (t.sumXiXi - length*meanX*meanX)
  let c = meanY - m*meanX
  {m = m; c = c;}

The head::tail syntax within the match statement is used to peel the first element off the list and place the rest of the list in the tail.  calcRegressionCoeffs is repeatedly executed until the code has eaten through all the elements and the list is empty [], at which point it is matched by the second part of the match statement.  In this version of the code the regressionCoeffs members can be immutable, as a new value (nextCoeffs) is created each time the code recursively eats through the list.
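The accumulator-passing shape of calcRegressionCoeffs can be sketched in Python (the function name and tuple layout are mine; note that, unlike the F# compiler, Python does not eliminate tail calls, so this is only an illustration of the pattern):

```python
def regression_sums(samples, acc=(0.0, 0.0, 0.0, 0.0)):
    # The [] case of the match: an empty list returns the accumulator.
    if not samples:
        return acc
    # head :: tail — peel the first element off and keep the rest.
    (x, y), tail = samples[0], samples[1:]
    sx, sy, sxx, sxy = acc
    # Build the next accumulator and recurse; the recursive call is the
    # last operation, which is what makes the F# version a tail call.
    return regression_sums(tail, (sx + x, sy + y, sxx + x * x, sxy + x * y))
```

For example, `regression_sums([(1.0, 2.0), (2.0, 4.0)])` yields the sums (3.0, 6.0, 5.0, 10.0).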

The code now appears very inefficient compared to the previous version that used a while loop.  For instance it appears that the calcRegressionCoeffs method is executed once for each element in the list, and within each method call a temporary value is created.  However this is not the case: tail end recursion is a core idiom within functional languages and for this reason the compiler AUTOMATICALLY converts the code into a very fast while loop.

Using Reflector and drilling into the compiled code we find the implementation of the regressionFast method.

public static linearRegressionCoeffs regressionFast(List<Tuple<double, double>> l)
{
    regressionCoeffs t = new regressionCoeffs(0.0, 0.0, 0.0, 0.0);
    ListModule.iter<Tuple<double, double>>(new clo@31(t), l);
    double length = ListModule.length<Tuple<double, double>>(l);
    double meanX = t._sumXi / length;
    double meanY = t._sumYi / length;
    double b2 = (t._sumXiYi - ((length * meanX) * meanY)) / (t._sumXiXi - ((length * meanX) * meanX));
    return new linearRegressionCoeffs(meanY - (b2 * meanX), b2);
}

The recursive code has been automatically converted into a call to the iter method, which in turn calls the clo@15 closure class.  The class itself is below:

[Serializable]
internal class clo@15 : OptimizedClosures.FastFunc2<List<Tuple<double, double>>, Regression.regressionCoeffs, Regression.regressionCoeffs>
{
    // Methods
    public clo@15();
    public override Regression.regressionCoeffs Invoke(List<Tuple<double, double>> l, Regression.regressionCoeffs coeffs);
}

Notice that the compiler overrides the Invoke method.

public override Regression.regressionCoeffs Invoke(List<Tuple<double, double>> l, Regression.regressionCoeffs coeffs)
{
    while (true)
    {
        List<Tuple<double, double>> that = l;
        if (that is List<Tuple<double, double>>._Empty)
        {
            return coeffs;
        }
        List<Tuple<double, double>> list = ((List<Tuple<double, double>>._Cons) that).get___Tail();
        Tuple<double, double> tuple2 = ((List<Tuple<double, double>>._Cons) that).get___Head();
        double num = tuple2.Item2;
        double num2 = tuple2.Item1;
        Regression.regressionCoeffs nextCoeffs = new Regression.regressionCoeffs(coeffs._sumXiXi + (num2 * num2), coeffs._sumXiYi + (num2 * num), coeffs._sumXi + num2, coeffs._sumYi + num);
        coeffs = nextCoeffs;
        l = list;
    }
}

Interestingly the Invoke method directly updates the local coeffs instance.  So within F# the regressionCoeffs type is immutable, however the compiler has created a mutable value that is updated.  In summary, the F# code that included recursion and temporary values has been optimised into a simple while loop, and the temporary value has been replaced with a single instance.

Finally here is the test

[<Test>]
let verify_regressionfast_using_static_data =
  let samples = [
    (1.0,15.95944862);
    (2.0,14.70482969);
    (3.0,19.57532692);
    (4.0,24.37143171);
    (5.0,18.32077464);
    (6.0,27.80173622);
    (7.0,29.20721381);
    (8.0,40.07369171);
    (9.0,39.10475664);
    (10.0,46.4686677);
    (11.0,35.46231593);
    (12.0,52.42103983);
    (13.0,54.16002796);
    (14.0,56.40865623);
    (15.0,48.77890855);
    (16.0,57.80104059);
    (17.0,57.79730541);
    (18.0,72.22923153);
    (19.0,57.63875286);
    (20.0,75.45859069);
    (21.0,70.21595096)]

  let resultSimple = RegressionSimple samples
  Assert.AreApproximatelyEqual (10.0, resultSimple.c, 3.0)
  Assert.AreApproximatelyEqual (3.0, resultSimple.m, 0.2)

  let resultWithLoop = regressionWithLoop samples
  Assert.AreApproximatelyEqual (10.0, resultWithLoop.c, 3.0)
  Assert.AreApproximatelyEqual (3.0, resultWithLoop.m, 0.2)

  let resultFast = regressionFast samples
  Assert.AreApproximatelyEqual (10.0, resultFast.c, 3.0)
  Assert.AreApproximatelyEqual (3.0, resultFast.m, 0.2)

Summary

F# allows the developer to write safe code using recursion and the compiler will optimise the code into a form similar to a hand crafted while loop.

Visual F# in VS 2010

28. May 2009 23:55 by David

I have finally got around to installing VS 2010 Beta and debugging with F# ACTUALLY works!  Six months of having to add variables to the watch window, instead of being able to move my mouse over the variable declaration, was getting very tiring…  At the moment the F#/IDE integration story is very poor and makes it hard to justify using F# for simple problems that can be just as easily solved using C#.  That excuse thankfully no longer applies.

I loaded up a couple of the F# projects from work and they all compile/pass our unit tests which as you can imagine was a great relief.  It appears as though there has been a “big” cleanup around naming conventions within the core F# libraries and many of the methods that we were leveraging have now been marked as obsolete.    Thankfully Microsoft decided to “obsolete” the methods and not remove the old functions entirely.

Opensource .NET Exchange III Lineup

28. May 2009 23:13 by David

Gojko has announced the lineup for the 3rd Open Source .NET Exchange.

The speaker list is as follows:

  • Ian Cooper: A First Look at Boo
  • Dylan Beattie: Managing Websites with Web Platform Installer and msdeploy
  • Scott Cowan: Spark View Engine
  • David: Introduction to MPI.NET
  • Gojko Adzic: Acceptance testing in English with Concordion .NET
  • Sebastien Lambla: What OpenRasta does other frameworks can’t
  • Phil Trelford: F# Units of Measure

This time I will be introducing MPI.NET and covering many of the topics that I have been blogging about over the last few weeks.  I’ve decided to do all the examples/slides using C# as opposed to F# which I will continue to cover in my blog posts.

Personally I am looking forward to seeing the session on Boo and the improvement Sebastien has made to OpenRasta.

MPI.NET – Distributed Computations with the Message Passing Interface in F# – Part 2

17. May 2009 12:17 by David

In a previous post I described that it was possible to calculate PI using Monte Carlo analysis.

Using the same technique it is possible to price financial products such as insurance.  Customers pay a premium for health cover.  If they get sick the insurance company is then obligated to pay any medical costs that “may” occur.  This means that the customer has a fixed “known” upfront cost, meanwhile the insurance company’s possible costs range from almost nothing to that of an extremely expensive medical bill resulting from a surgical procedure.  It is possible to run thousands of “what if” scenarios and use the results to estimate the amount of capital that is needed to cover the costs for all of the customers at the organisation.

Another example is pricing an Option which is a simple version of Insurance and is used to lock in the price of a product that the company wants to buy or sell at a future date. 

Buying an option – going Long:

Buyer has no obligation to exercise the Option however the seller of the Option is legally obligated to provide the product at the price indicated in the contract.  Buyer pays a premium to the seller.

  • Call - Buyer assumes the price is going to rise and wants to lock in the price – Airline buys an option to lock in the price of fuel for its fleet as it fears that prices will rise in 6 months. If the price drops in six months the Airline buys Oil from the market at the cheaper price.
  • Put – Buyer assumes that the price is going to fall and wants to lock in the price – Oil manufacturer believes prices will be lower in 6 months as the economy is slowing and fears that oil prices will fall.  If the oil price increases the manufacturer sells the oil at the higher market price.

Selling an option – going Short:

Seller is obligated to provide the product at the price indicated in the contract if the buyer exercises the contract.  An exercised contract is a loss for the seller, in comparison, if the contract is not exercised the Seller makes a profit in the amount of the Premium.

  • Call – Seller assumes the price is going to fall
  • Put – Seller assumes that the price is going to rise

Simulating a European Option

The Financial Numerical Recipes in C++ web site includes a number of tutorials using C++ to calculate Bond Prices, Option Prices etc.  I have recently been porting the code snippets to F# to gain more familiarity with the language.

The C++ code to simulate the price of an Option is here

The first part of the simulation is to randomly generate what the price will be in the future.  The value has an equal probability of being higher or lower than the starting price.  The inputs to the simulation are:

  • Current Value (S)
  • Interest Rate (r) – Since we are simulating the price in the future we want to convert the price back into today's money
  • Time (time) – Duration to the contract's exercise date
  • Volatility (sigma) – The magnitude of the random movements, at each point in time, that the price is expected to have – A best guess based on historical data, and the most problematic and difficult part of pricing Options.
    • A low volatility implies that the final price WILL NOT have diverged far from the current value S
    • A high volatility implies that the final price WILL have diverged far from the current value S
  • Type of Walk – Stock prices are assumed to follow a “lognormal walk”, which means that each price movement is a percentage change
    • S1 = S * (1 + 0.1 * randomChoiceOf(1 or –1))
    • S2 = S1 * (1 + 0.1 * randomChoiceOf(1 or –1))

The C++ simulation uses a library to generate a random normal distribution, which in turn is used to create the lognormal walk.  Meanwhile the System.Random object in .NET provides random numbers where the generated values are spread uniformly across the range 0 to 1.  A random normal distribution instead generates values that follow a Gaussian distribution, with the mean being zero and the shape looking like a bell curve.  While it is easy to write a method that will generate a normal distribution, the excellent Math.NET project provides this capability already.

let logNormalRandom = new MathNet.Numerics.Distributions.NormalDistribution()
let next = logNormalRandom.NextDouble()
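The difference between the two generators can be seen in a quick Python comparison (random.gauss standing in here for Math.NET's normal distribution; the variable names are mine):

```python
import random
import statistics

rng = random.Random(1)

# System.Random-style values: spread evenly across [0, 1)
uniform = [rng.random() for _ in range(100000)]

# Normally distributed values: a bell curve centred on a mean of zero
normal = [rng.gauss(0.0, 1.0) for _ in range(100000)]

print(statistics.mean(uniform))  # close to 0.5
print(statistics.mean(normal))   # close to 0.0
```

The uniform samples average out to the middle of their range, while the Gaussian samples cluster around zero with standard deviation one.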

This leads to the following ported code:

let R = (r - (0.5 * Math.Pow(sigma, 2.0))) * time
let SD = sigma * Math.Sqrt(time)
let FuturePrice = S * Math.Exp(R + SD * logNormalRandom.NextDouble())

The code above returns what the future price will be for a particular simulation.  Following the explanation of Options above, the buyer only exercises the option if it will make a profit over buying the product directly from the market (In the Money), which programmatically is as follows:

let europe_call_payoff price exercise = Math.Max(0.0, price - exercise)
let europe_put_payoff price exercise = Math.Max(0.0, exercise - price)

The final code is here

#light

open System

// Financial Numerical Recipes in C++
// http://finance-old.bi.no/~bernt/gcc_prog/recipes/recipes/recipes.html
let europe_call_payoff price exercise = Math.Max(0.0, price - exercise)
let europe_put_payoff price exercise = Math.Max(0.0, exercise - price)

let option_price_call_european S X r sigma time payoff sims =
  let logNormalRandom = new MathNet.Numerics.Distributions.NormalDistribution()

  let R = (r - (0.5 * Math.Pow(sigma, 2.0))) * time
  let SD = sigma * Math.Sqrt(time)

  let option_price_simulation() =
    let S_T = S * Math.Exp(R + SD * logNormalRandom.NextDouble())
    payoff S_T X

  // Accumulate the payoff over the requested number of simulations
  let rec futureValueIter i value =
    match i with
    |0 -> value
    |_ -> futureValueIter (i-1) (option_price_simulation() + value)

  let futureValue = futureValueIter sims 0.0
  System.Math.Exp(-r * time) * (futureValue / (double)sims)
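The same Monte Carlo pricing loop can be sketched in Python for comparison (the function name is mine, and random.gauss stands in for the Math.NET normal distribution):

```python
import math
import random

def european_call_mc(S, X, r, sigma, time, sims, rng):
    # Risk-neutral drift and volatility terms, as in the F# version.
    R = (r - 0.5 * sigma ** 2) * time
    SD = sigma * math.sqrt(time)
    total = 0.0
    for _ in range(sims):
        # Simulate one future price and accumulate the call payoff.
        S_T = S * math.exp(R + SD * rng.gauss(0.0, 1.0))
        total += max(0.0, S_T - X)
    # Discount the average payoff back into today's money.
    return math.exp(-r * time) * total / sims

price = european_call_mc(100.0, 100.0, 0.1, 0.25, 1.0, 100000, random.Random(42))
```

With the same inputs as the test below (S = X = 100, r = 0.1, sigma = 0.25, one year), the estimate lands near the 14.995 value the test expects, within Monte Carlo noise.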

And the test

#light

open MbUnit.Framework
open OptionPricingModel

[<Test>]
let simulate_call_option() =
  let result = option_price_call_european 100.0 100.0 0.1 0.25 1.0 europe_call_payoff 500000
  Assert.AreApproximatelyEqual(14.995, result, 0.03)

Once again the simulation is “close” to the correct value, in this case within 3%.  The C++ code shows techniques to improve the accuracy of the simulation which I will cover in a future post, and at the same time host the simulation within MPI.NET.

MPI.NET – Distributed Computations with the Message Passing Interface in F# – Part 1

6. May 2009 18:39 by David

Supercomputing has officially reached the desktop: it is now possible to buy a Linux or Microsoft based cluster for a few thousand dollars.  Furthermore, compute clusters can easily be spun up in the cloud, where they can perform some work and then be switched off once the CPU intensive task has been completed.  One of the great benefits of this migration from big iron to clusters of affordable commodity hardware is that the software libraries that were designed to help scientists predict the weather or find the correlation between diseases and genes are now available for use in line of business applications.

Desktop versus Applications Server versus Compute cluster

The main differences between the three main computer architectures (Desktop, Application Server and Compute Cluster) are no longer based on hardware differences (vector computers such as those from Cray are slowly being replaced with Intel/AMD x64 machines) but on usage scenarios.

Computer Architecture / Usage
Desktop
  • General purpose computer
  • Multiple applications running concurrently
  • Optimised for GUI feedback and response
Application Server
  • Dedicated to run a particular program
  • Long uptime requirements
  • Optimised for network
Compute Cluster/Supercomputer
  • Batch processing – Cluster is dedicated to work on a problem
  • Typically low or no external input once problem starts
  • Single Program Multiple Data type problems
  • Often collaboration between nodes is more complex than problem that is being solved

Supercomputers are typically batch processing devices.  A university or government department that has spent millions of dollars on its cluster needs to ensure that the infrastructure is fully utilised throughout its lifetime (24x7x365).  This is typically achieved by scheduling jobs weeks in advance and making sure that there are no periods where the cluster is idle.  Often when work is not completed within its allocated scheduled period the processes are automatically killed so that the next job can execute.  Since the cluster is designed to run different pieces of software and there might be hundreds of servers involved, the concept of a “Single System Image” becomes important, where software that is “deployed” to the cluster is seamlessly deployed onto all the nodes within the cluster.

Data Sharing

There are two basic methods for sharing information between nodes within a cluster:

  • Shared Memory Model – Combine the memory of all of the machines in the cluster into a single logical unit so that processes on any of the machines are able to access the shared data on any of the other machines.
  • Message Passing Model – Use messaging to pass data between nodes.

Since the shared memory model is similar to how software runs on a local machine, it is very familiar to develop against.  Unfortunately, while accessing the data is transparent, the actual time to load data off the network is far slower than reading from local memory.  Furthermore, data contention can arise with different servers in the cluster trying to update the same logical memory location.  For this reason Message Passing has gained prominence and the Message Passing Interface protocol has become a standard across the supercomputing industry.

MPI.NET

MPI.NET allows the .NET developer to hook into the MPI implementation by Microsoft and thus can be used to have code running on a cluster.

MPI.NET Installation steps

  1. Install the Microsoft implementation of MPI - download
  2. Install the MPI.NET SDK – download

Converting the Monte Carlo F# PI example to use MPI.NET

The following is the original F# code to calculate PI:

let inUnitCircle (r:Random) =
  let y = 0.5 - r.NextDouble()
  let x = 0.5 - r.NextDouble()

  match y * y + x * x <= 0.25 with
    |true -> 1.0
    |_ -> 0.0

let calculate_pi_using_monte_carlo (numIterations:int) (r:Random) =
  let numInCircle = List.sum_by (fun f -> (inUnitCircle r)) [1 .. numIterations]
  4.0 * numInCircle / (float)numIterations

To calculate PI after 1000 tests have been executed, call calculate_pi_using_monte_carlo 1000 r.

To speed up the calculation process we want the F# code to run on our cluster.

Steps:

  1. Reference the MPI assembly from the GAC
  2. Pass the command line arguments to MPI
       1: let args = Sys.argv
       2:
       3: using (new MPI.Environment(ref args)) (fun environment ->
       4:     // Code here
       5: )
  3. When MPI starts it gives each node an ID – Usually Node 0 is used for communication and the other nodes are used for processing.
  4. We want to insert the call to calculate PI at line 4.  However, once each node has completed the calculation, the result needs to be passed back to Node 0 so that it can be combined with the other results at the same location
  5. Determine the Node ID
       1: using (new MPI.Environment(ref args)) (fun environment ->
       2:     let comm = Communicator.world
       3:     let nodeId = comm.Rank
       4: )
  6. Use the Node Id to seed the Random method
  7. Use the Reduce method to retrieve the results of each different cluster instance and return that value back to the client
       1: using (new MPI.Environment(ref args)) (fun environment ->
       2:     let comm = Communicator.world
       3:     let seed = DateTime.Now.AddDays((float)comm.Rank)
       4:     let r = new Random((int)seed.Ticks)
       5:     let pi:double = comm.Reduce(calculate_pi_using_monte_carlo 1000 r, Operation<double>.Add, 0) / (double)comm.Size
       6:     if (comm.Rank = 0) then
       7:       (
       8:         Console.WriteLine("Pi " + pi.ToString())
       9:       )
      10: )
  8. Execute the code
  9. "c:\Program Files\Microsoft HPC Pack 2008 SDK\Bin\mpiexec.exe" -n 15 PebbleSteps.MPI.exe
  10. MPI then spins up 15 processes, runs the F# application within each process and provides the environment settings so that MPI.NET can determine what each node's ID is.
  11. The program will finally display the calculated value of PI.
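The seed-per-rank and reduce-then-average logic above can be sketched without MPI at all.  The following Python sketch (function and variable names are mine) simulates 15 "nodes", each seeding its generator from its rank and producing a partial PI estimate, then combines them the way comm.Reduce with addition followed by division by comm.Size does:

```python
import math
import random

def node_estimate(rank, iterations=50000):
    # Each "node" seeds its generator from its rank, mirroring the
    # blog's use of comm.Rank so every process samples differently.
    rng = random.Random(rank)
    inside = sum(1 for _ in range(iterations)
                 if (0.5 - rng.random()) ** 2 + (0.5 - rng.random()) ** 2 <= 0.25)
    return 4.0 * inside / iterations

# Reduce with addition, then divide by the cluster size,
# like comm.Reduce(..., Operation<double>.Add, 0) / comm.Size.
size = 15
pi = sum(node_estimate(rank) for rank in range(size)) / size
```

Averaging the 15 independent estimates gives the same accuracy as one node running 15 times as many samples, which is exactly why the problem parallelises so well.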

Behavior Driven Development with NBehave

26. April 2009 23:06 by David

Last Wednesday I presented a talk on BDD for Skillsmatter.  The talk was a great opportunity to drill into my own assumptions about using TDD (Test Driven Development) and TFD (Test First Development). 

 

What is BDD

BDD is a fairly controversial subject as it promotes a Top Down approach to agile software development, as opposed to the bottom up approach that is described by Test First Development.  For example, TFD by its very nature advocates a very tight iterative approach of defining a method's output, implementing code that passes the test, and finally refactoring to clean up the developed code.


This approach is fantastic for developing well understood components.  Unfortunately it can become very difficult to build complex systems using TFD.  Often team members will write tests AFTER the core business logic has been written (Green, Red, Green or just Green).  Sometimes the problem space is not well understood, and by using TFD the development effort pushes towards writing code and not designing the system.  The debate also tends to center around the number of tests that have been written as opposed to how useful the developed software actually is.  Hence TFD can become biased towards low level coding practice as opposed to system wide design and elegance.

BDD tries to alleviate the situation by swapping the coding process to a classic Top Down or Step Wise Refinement approach.  Instead of trying to write production code from the start, BDD advocates iteratively defining the API and then, once the API is stable, writing the actual code that implements it.  The output of TFD is a suite of tests.  The output of BDD is a specification for a system's API and a set of contracts that validate that the API has been implemented.


BDD can be summarised as:

  • A formalised template for expressing User Stories
  • A methodology that promotes “security” to being a first class citizen in the analysis process
  • Promoting Domain Driven Design as one of the best methods for building enterprise applications
  • Advocating API validation via specification based tests as opposed to classic TFD

Key to the success of the BDD approach is the formalisation of the humble User Story.

 

As a [User/Role]
I want [Behaviour]
so that [I receive benefit]

 

NBehave in turn takes the BDD template and converts it into a Fluent API

var story = new Story("Create portfolio");
story.AsA("Operations team member")
    .IWant("To create a new portfolio on behalf of a portfolio manager")
    .SoThat("the portfolio manager can configure the portfolio and the front office can trade");

For each story the developer and analyst create a suite of scenarios that explore both common and edge cases.

 

Given [some initial context]
When [an event occurs]
Then [ensure some outcomes]

 

Once in NBehave this becomes

story.WithScenario("portfolio does not exist")
    .Given("portfolio name is $name", "Aggressive Fund")
    .When("portfolio does not exist in the database")
    .Then("new portfolio should be created");

At that point mocking allows the developer to iteratively define how the API should respond to external input.  Mocks/Stubs are used to allow the developer to experiment with the API.  Core components are mocked as opposed to using mocking to emulate dependencies.


In the following example NBehave is used to aid in the design of the IPortfolioRepository and IPortfolioService interfaces.  No production code is executed in the test.  Instead the test is used to allow the developer to decide:

  • What services should core systems provide
  • What properties/methods should core domain entities have
string portfolioName = "";
Portfolio p = null;
var portfolioRepository = MockRepository.GenerateStub<IPortfolioRepository>();
var portfolioService = MockRepository.GenerateStub<IPortfolioService>();

s.WithScenario("portfolio already exists")
    .Given("portfolio name is $name", "Aggressive Fund", n => {
        portfolioName = n;
        p = new Portfolio { Name = portfolioName };
        portfolioRepository.
            Stub(x => x.FindPortfolioByName(portfolioName)).Return(p);
        portfolioService.
            Stub(X => X.CreatePortfolio(p)).Throw(new ItemExistsException());
        })

    .When("portfolio already exists in database",
        () => Assert.IsNotNull(
            portfolioRepository.FindPortfolioByName(portfolioName))
        )

    .Then("new portfolio create should fail",
        () => Assert.Throws<ItemExistsException>(() => portfolioService.CreatePortfolio(p))
        );

 

Steps to generate an API using NBehave

  1. Go through the User story and find all the domain objects
  2. Create a class for each domain object but don’t add state
  3. Go through the User story and find all the services that are required (Repositories etc)
  4. Create an interface for each service but don’t add any methods
  5. Slowly implement the story
  6. Add domain object properties as required
  7. Add methods signatures as required
  8. Use Stubs to explore Inputs/Outputs to the methods


Skillsmatter NBehave talk

29. March 2009 11:51 by David

I will be doing a talk on Behavior Driven Development using NBehave at Skills Matter on April 15.  As yet the registration page doesn't look as though it's been put up.  However Gojko, as always, has put a link on UK .NET.

http://ukdotnet.ning.com/events/inthebrain-opensource-net-1

I will post an outline of the talk next week when I have fleshed it out further…

Calculating PI using the Monte Carlo method using F#

29. March 2009 11:09 by David

Earlier in my career I spent a number of years designing electronic circuits using circuit simulation software.  One of the most important steps within the design process is to determine how robust the circuit will be to different scenarios, such as temperature changes or the spread in electrical characteristics of physical components (ie 100 ohms +/- 5%).  One useful design technique is Monte Carlo analysis, where all of the simulation parameters such as component values and temperature are randomly changed by a small value and the circuit simulation is run.  The process repeats thousands of times.  The designer is able to use the results to hone in on any areas within the design that are unstable and either change the design or use more expensive components that have a tighter tolerance.

Monte Carlo is used:

  • In finance to price complex products such as options
  • In mathematics to perform numerical analysis

Monte Carlo can be used to calculate PI

We can calculate PI by knowing how to calculate the area of a square and a circle, as follows:

  1. To calculate the area of a Square {S = Width*Width}
  2. To calculate the area of a Circle {C = PI*Radius*Radius}
  3. If we place a circle inside a square with Width = 1, then Diameter = 1 and Radius = 0.5
  4. Therefore S = 1
  5. Therefore C = PI * 0.5 * 0.5 = PI * 0.25 = PI/4
  6. The ratio of the areas of the Circle and the Square is Ratio = C/S = (PI/4)/1 = PI/4
  7. Hence PI = Ratio * 4
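The arithmetic in the steps above can be checked directly in F# (a throwaway sketch of my own, not part of the original derivation):

    let radius = 0.5
    let squareArea = 1.0 * 1.0                        // S = Width * Width = 1
    let circleArea = System.Math.PI * radius * radius // C = PI * 0.25
    let ratio = circleArea / squareArea               // = PI/4
    let piEstimate = ratio * 4.0                      // recovers PI
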

So now we can find PI by knowing there is a relationship between the area of the circle and a square.  Using Monte Carlo Analysis we randomly pick points within the square and check if they also fall into the circle.

  1. Randomly select a Point in the Square P = (X, Y), where X and Y are each between -0.5 and 0.5 (the square centred on the origin, as in the code below)
  2. Using the Pythagorean theorem, P is in the circle if X*X + Y*Y <= R*R, i.e. X*X + Y*Y <= 0.25
  3. Ratio = Points in Circle / All Points

Here is the code

    let r = new Random()

    let inUnitCircle _ =
        // random point in a unit square centred on the origin
        let y = 0.5 - r.NextDouble()
        let x = 0.5 - r.NextDouble()
        // inside the inscribed circle when x^2 + y^2 <= R^2 = 0.25
        match y * y + x * x <= 0.25 with
        | true -> 1
        | _ -> 0

    let calculate_pi_using_monte_carlo (numIterations:int) =
        let numInCircle = List.sum_by inUnitCircle [1 .. numIterations]
        4.0 * double numInCircle / double numIterations

The unit test utilises the very useful AreApproximatelyEqual method from mbUnit.  As the number of iterations (random samples) increases, the calculated PI moves closer and closer to that of Math.PI.

    [<Test>]
    let row() =
        let rowVal = [(100, 0.2); (1000, 0.07); (3000, 0.03)]
        List.iter (fun (i, d) -> Assert.AreApproximatelyEqual(Math.PI, calculate_pi_using_monte_carlo i, d)) rowVal
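The tolerances in rowVal roughly track the usual 1/sqrt(N) convergence of a Monte Carlo estimate.  As a rough sketch (my own addition; stdError is a hypothetical helper, not from the post): each sample is a Bernoulli trial with p = PI/4, so the estimate 4 * (hits/N) has standard error 4 * sqrt(p * (1 - p) / N).

    // standard error of the PI estimate after n samples
    let stdError (n:int) =
        let p = System.Math.PI / 4.0
        4.0 * sqrt (p * (1.0 - p) / float n)
    // stdError 100 is roughly 0.16, so a tolerance of 0.2 leaves some headroom
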

Unfortunately I was unable to use the [<Row([|(100, 0.2); (1000,0.07); (3000,0.03)|])>] syntax to utilise the Row attribute within F#, since the compiler seems to be unable to create arrays of type object.  I would be very interested if anyone has been able to get it to work.

Introduction to PostSharp and AOP slides and code

25. January 2009 12:30 by David in

On Thursday I participated in the Open Source .NET Exchange held by Skills Matter

It was a great night.  The venue was full and I and the other speakers received lots of positive feedback (although my microphone was not on correctly so I ended up shouting…).  As you can imagine, during a 15 minute speech, you can only scratch the surface of a topic.  However, with the talks being so diverse, I think most people, including myself, learnt a lot.

I especially liked Mike’s talk on the repository pattern.  Comparing and contrasting different implementations of a complex pattern/technique is a powerful method to understand the inevitable hidden intricacies.  His advice about being wary of using a single generic repository interface or base class for all repositories was very timely.  In some code I recently reviewed, the developer was using NHibernate to populate a read-only report.  Unfortunately the project-wide repository exposed save and delete methods.  Clearly this throws the read-only nature of the data out the window.  Following Mike’s advice, our team will move away from trying to be overly clever with the design of our repositories and instead write ones that clearly identify the base aggregates and expose the expected behavior of the root object at run time.

Interestingly I spent more time discussing the features of Active MQ with David and a group of people interested in using NMS than PostSharp.  David and I worked on a project where we used the Publish-Subscribe capabilities of Active MQ to push data to clients running Flash/ActionScript.  Using Push technology, as opposed to pulling from a database, reduces load and increases scalability.  We are using the same technique at a large finance company to push market prices directly to client machines running WPF.  In the .NET community it is common for MSMQ to be used as the messaging infrastructure.  This choice heavily limits the design.

MSMQ:

  • does not provide Publish-Subscribe capability – Although it can be emulated with NServiceBus/Mass Transit
  • does not allow you to tag messages with metadata which can then be used for intelligent filtering/routing
  • has a 4 MB limit on message size

On the PostSharp front there was a lot of interest in using PostSharp to “break the build”, since it’s such a quick win for teams that are using NHibernate.  I will try to blog about this technique in a little bit more detail soon.

 

Slides and Source

As promised here are the slides and the source code that were used during the presentation. 

To use the code download and install PostSharp using the default settings.  This will configure MS Build to automatically invoke PostSharp whenever the PostSharp.Laos.dll assembly is referenced.

Test Driven Development - Tools and Techniques

18. December 2008 00:26 by David in

Earlier tonight Chris and I presented our talk on Test Driven Development.  I have to say I was rather impressed at the turn out since it’s only a week away from Christmas and we still managed around 40 people.  As promised here are the slides from the presentation and all of the code/files that were used during the talk.

Test_Infected_Presentation.ppt (544.50 kb) 

SkillsMatter.TestInfected.rar (6.15 mb) 

Running Fitnesse

  1. Uncompress the SkillsMatter.TestInfected.rar file into C:\SkillsMatter\TDDTalk
  2. Run C:\SkillsMatter\TDDTalk\TDD\fitnesse\run.bat - This will start the Fitnesse web server on port 8080
  3. load http://localhost:8080/MidlandsFoodsFirstCut
  4. Press the Suite link and the tests should execute and all fail - This demonstrates the deliverable from a Business Analyst
  5. Load http://localhost:8080/MidlandsFoods 
  6. Press the Suite link and the tests should execute and most will pass
Debugging Fitnesse
  1. Uncompress the SkillsMatter.TestInfected.rar file into C:\SkillsMatter\TDDTalk
  2. Run C:\SkillsMatter\TDDTalk\TDD\fitnesse\run.bat - This will start the Fitnesse web server on port 8080
  3. Load C:\SkillsMatter\TDDTalk\TDD\JackPlaysSnap\JackPlaysSnap.sln
  4. Set the Fitnesse project as the StartUp Project
  5. Under project properties
    1. Set Start external program to C:\SkillsMatter\TDDTalk\TDD\fitnesse\dotnet\TestRunner.exe
    2. Set the Command line arguments to localhost 8080 MidlandsFoods.GiantWinsJackNeverSnaps
    3. Set the Working directory to C:\SkillsMatter\TDDTalk\TDD\fitnesse\
  6. Set a breakpoint in JackPlaysSnapFitness.cs
  7. Run the project under debug mode
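For reference, steps 4 to 7 simply wire Visual Studio up to the call you could make by hand from a command prompt (assuming TestRunner.exe takes the host, port and suite name exactly as configured in step 5 above):

    cd C:\SkillsMatter\TDDTalk\TDD\fitnesse
    dotnet\TestRunner.exe localhost 8080 MidlandsFoods.GiantWinsJackNeverSnaps

Running it this way, without the debugger attached, is a quick check that the Fitnesse wiring itself is correct before you start stepping through JackPlaysSnapFitness.cs.
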