**Calculating Probabilities in C#**

Wednesday, March 19, 2014

As a retired mathematician, it was pretty exciting for me to have the chance to do some (rudimentary) mathematical programming recently. In our new and improved search algorithm on HireSpace.com (see my blog on this here!) we take measures of quality, e.g. customer feedback, into account when ranking search results. Since we have a few different metrics with different ranges of values, we needed to normalise them first. We then wanted to assess the significance of different values.

Keeping things simple, we decided to assume all measures of quality are normally distributed (see definition on Wikipedia). I created a static class called `ProbabilityUtils` with everything you need to work out probabilities for normally distributed variables. This class is up on the Hire Space GitHub account here, but here’s a walkthrough of what it contains and how it’s used.

Getting the mean of a list of values is easy: you just use LINQ’s built-in `.Average()` method. The first custom method we need, then, is one to work out the standard deviation:

    public static double StandardDeviation(IEnumerable<double> values)
    {
        double sd = 0;
        var data = values as double[] ?? values.ToArray();

        // The sample standard deviation divides by n - 1, so it needs at least two values.
        if (data.Length > 1)
        {
            double avg = data.Average();
            double sum = data.Sum(d => Math.Pow(d - avg, 2));
            sd = Math.Sqrt(sum / (data.Length - 1));
        }

        return sd;
    }
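For reference, the method above computes the sample standard deviation, which divides by $n - 1$ rather than $n$. Written out in conventional notation (the symbols are not from the original class), with $\bar{x}$ the mean of the $n$ values:

$$
s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}
$$

As a quick check by hand, the values 2, 4, 4, 4, 5, 5, 7, 9 have mean 5 and squared deviations summing to 32, so $s = \sqrt{32/7} \approx 2.14$.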

Now to convert our normal distribution to a standard normal distribution we need a method to work out the Z value.

    public static double Z(double score, double average, double standardDeviation)
    {
        // Guard against division by zero when every value is identical.
        if (standardDeviation == 0) return 0;
        return (score - average) / standardDeviation;
    }

The actual probability density function, for a standard normal distribution:

    public static double StandardNormalPdf(double x)
    {
        var exponent = -1 * (0.5 * Math.Pow(x, 2));
        var numerator = Math.Pow(Math.E, exponent);
        var denominator = Math.Sqrt(2 * Math.PI);
        return numerator / denominator;
    }
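Written as a formula, the density evaluated above is the standard normal curve. It is shown here in conventional notation for reference (the symbol $\varphi$ does not appear in the original class):

$$
\varphi(x) = \frac{1}{\sqrt{2\pi}} \, e^{-x^{2}/2}
$$

At the mean, $\varphi(0) = 1/\sqrt{2\pi} \approx 0.3989$, which is the peak of the bell curve.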

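We need a method to work out the definite integral of a unary function between values a and b. To do this we use Simpson’s 3/8 rule (see definition on Wikipedia), applied once across the whole interval. The rule is exact for polynomials up to degree three; for the bell curve it is an approximation that is most accurate when a and b are close together.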

    public delegate double Function(double x);

    public static double Integral(Function f, double a, double b)
    {
        // Simpson's 3/8 rule: sample f at a, b and the two interior third-points.
        double multiplier = (b - a) / 8;
        double sum = multiplier * (f(a) + (3 * f(((2 * a) + b) / 3)) + (3 * f((a + (2 * b)) / 3)) + f(b));
        return sum;
    }
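A single application of the 3/8 rule samples the function at only four points, so it loses accuracy as the interval widens. One common refinement, which is not part of the original class, is the composite form: split the interval into equal panels and apply the same rule to each. The sketch below is only an illustration; the names `IntegrationSketch`, `CompositeIntegral` and `panels` are assumptions made for this example, and `Func<double, double>` stands in for the `Function` delegate.

```csharp
using System;

public static class IntegrationSketch
{
    // Single-interval Simpson's 3/8 rule, equivalent to the method above.
    public static double Integral(Func<double, double> f, double a, double b)
    {
        double multiplier = (b - a) / 8;
        return multiplier * (f(a) + 3 * f((2 * a + b) / 3) + 3 * f((a + 2 * b) / 3) + f(b));
    }

    // Hypothetical helper: apply the rule on `panels` equal sub-intervals and add the results.
    public static double CompositeIntegral(Func<double, double> f, double a, double b, int panels = 16)
    {
        if (panels < 1) throw new ArgumentOutOfRangeException(nameof(panels));

        double width = (b - a) / panels;
        double total = 0;
        for (int i = 0; i < panels; i++)
        {
            double left = a + i * width;
            total += Integral(f, left, left + width);
        }
        return total;
    }
}
```

For the standard normal density between 0 and 3, the single-interval rule gives roughly 0.484 against a true value of about 0.4987; the composite version should close most of that gap. For small z-values such as 0.5 the difference is negligible.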

Finally, a method to calculate the probability of getting a value less than x, given a standard normal distribution. Since a normal distribution is symmetric, exactly half of the probability falls below the mean, which is 0 for a standard normal distribution. So this method simply takes the integral from 0 to x and adds 0.5. For negative x the integral is negative, so the result drops below 0.5 as expected.

    public static double ProbabilityLessThanX(double x)
    {
        var integral = Integral(StandardNormalPdf, 0, x);
        return integral + 0.5;
    }

Ok, now we have all the tools we need to calculate some probabilities! Here’s a step by step example of how to use them:

1. Given a list of values (call this variable `values`), work out the mean and standard deviation using the above methods.

var mean = values.Average(); var sd = ProbabilityUtils.StandardDeviation(values);

2. Say this results in a mean of 4 and a standard deviation of 2, and we want to know how to assess a venue with a score of 5. We’ll then call the method to work out the z-value:

var z = ProbabilityUtils.Z(5, mean, sd);

3. Now plug this into our probability method to get p, the probability of getting a value less than 5.

var p = ProbabilityUtils.ProbabilityLessThanX(z);

In this example z is (5 - 4) / 2 = 0.5, and the probability is approximately 0.69. In other words, under the normality assumption about 69% of venues will get a score less than 5.
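The three steps can be chained into one call. The sketch below is a condensed copy of the methods from this walkthrough, not the code from the repository: the class name `ProbabilityDemo` and the helper `ProbabilityOfScoringBelow` are hypothetical, and the standard deviation guard requires at least two values. The scores 2, 4 and 6 are made up so that the mean is 4 and the sample standard deviation is 2, matching the example above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ProbabilityDemo
{
    public static double StandardDeviation(IEnumerable<double> values)
    {
        var data = values.ToArray();
        if (data.Length < 2) return 0; // the n - 1 divisor needs at least two values
        double avg = data.Average();
        return Math.Sqrt(data.Sum(d => Math.Pow(d - avg, 2)) / (data.Length - 1));
    }

    public static double Z(double score, double average, double standardDeviation) =>
        standardDeviation == 0 ? 0 : (score - average) / standardDeviation;

    public static double StandardNormalPdf(double x) =>
        Math.Exp(-0.5 * x * x) / Math.Sqrt(2 * Math.PI);

    // Single-interval Simpson's 3/8 rule.
    public static double Integral(Func<double, double> f, double a, double b) =>
        (b - a) / 8 * (f(a) + 3 * f((2 * a + b) / 3) + 3 * f((a + 2 * b) / 3) + f(b));

    public static double ProbabilityLessThanX(double x) =>
        Integral(StandardNormalPdf, 0, x) + 0.5;

    // Hypothetical helper: steps 1 to 3 in a single call.
    public static double ProbabilityOfScoringBelow(IReadOnlyCollection<double> values, double score)
    {
        double mean = values.Average();
        double sd = StandardDeviation(values);
        return ProbabilityLessThanX(Z(score, mean, sd));
    }
}
```

Calling `ProbabilityDemo.ProbabilityOfScoringBelow(new[] { 2.0, 4.0, 6.0 }, 5)` should return a value close to 0.69.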

We’re still playing around with how to weight search rankings on hirespace.com, but having these tools is a good first step!

Hello,

thanks for this great tutorial, but I have a question.

I calculated the probability as follows:

var values = GetValues(); // Gets a list of 10 double values

var mean = values.Average();

var sd = ProbabilityUtils.StandardDeviation(values);

var z = ProbabilityUtils.Z(values.Last(), mean, sd);

1st variant (as described here):

var p1 = ProbabilityUtils.ProbabilityLessThanX(z);

2nd variant (as described on your GitHub site):

var p2 = ProbabilityUtils.ProbabilityLessThanX(values.Last(), mean, sd);

The result p1 differs greatly from p2. Could you please explain which is right and which is wrong? Or, if both variants are valid, when should each one be used?

Thanks in advance!

Sebastian.

You’re right, the second function has a bug! It should just calculate the z value and pass it straight through to the former function. Thanks for pointing it out.
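Based on that description, the corrected three-argument overload might look like the sketch below. This is a guess at its shape, not the code from the repository: the class name `OverloadSketch` and the parameter names are assumptions, and the supporting methods are condensed copies of the ones in the post.

```csharp
using System;

public static class OverloadSketch
{
    public static double Z(double score, double average, double standardDeviation) =>
        standardDeviation == 0 ? 0 : (score - average) / standardDeviation;

    public static double StandardNormalPdf(double x) =>
        Math.Exp(-0.5 * x * x) / Math.Sqrt(2 * Math.PI);

    // Single-interval Simpson's 3/8 rule.
    public static double Integral(Func<double, double> f, double a, double b) =>
        (b - a) / 8 * (f(a) + 3 * f((2 * a + b) / 3) + 3 * f((a + 2 * b) / 3) + f(b));

    // Takes a z-value, i.e. a score that has already been standardised.
    public static double ProbabilityLessThanX(double z) =>
        Integral(StandardNormalPdf, 0, z) + 0.5;

    // Takes a raw score: standardise it, then delegate to the one-argument method.
    public static double ProbabilityLessThanX(double x, double mean, double standardDeviation) =>
        ProbabilityLessThanX(Z(x, mean, standardDeviation));
}
```

With this shape, both calls from the question return the same number, because the second simply computes the z-value that the first was given directly.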

Hello! I’m a C# programmer and I am wondering if there is a way to return the probability that a given array of numbers is normally distributed. I found your blog while looking for an answer. I feel like the tools are staring me in the face, but I just can’t make the connection. Perhaps you could help? Thank you!

Hi Patrick, there are a few ways to judge whether a series is normally distributed. This is a helpful overview: https://statsthewayilikeit.com/about/is-my-data-normally-distributed/

A good formal test is the Shapiro-Wilk test: https://en.wikipedia.org/wiki/Shapiro%E2%80%93Wilk_test

The easiest way to test this programmatically would be to use a statistical programming language like R, which has these kinds of functions built in!

hi.

Give me the source code, please