Monday, February 26, 2007

you do the math - no, wait, I've done it for you

Some months ago we were sitting around at work having a conversation about The National Lottery, probably bemoaning our failure to become millionaires the preceding weekend, again. My colleague Andy pointed out that the previous week had seen three numbers in the thirties (or the twenties, or forties, I can't remember) and christened this phenomenon "clumping", for want of a better word. Since then he's made our lives a constant misery by pointing out instances of "clumping" in each week's lottery numbers, as if this proves some sort of theory.

Strangely (or, in fact, clearly not) it's not a theory which has yet enabled him to predict the numbers before the draw is made - which is the only lottery-based trick worth doing - only to go "aha, clumping" afterwards, to everyone else's general annoyance.

All of which moved me to wonder what the probability of such an occurrence is. Clearly it's inevitable that one of the "decades" (as I'll call them from here on, even though one of them, the first one, 1-9, only contains nine numbers) will have at least two numbers drawn from it, as there are 6 numbers to be drawn, and only 5 decades. But what about three?

It's actually quite a complicated problem to state in mathematical terms; in addition, despite having a mathematics degree, I was always rubbish at probability calculations, which is why the Three Door Problem (known to the Americans as the Monty Hall problem, for reasons too tedious to go into here) still makes my brain hurt, even though I know the answer. My tip, if you're having trouble, is the same as tip 2 at the bottom of the page: imagine there are many many more doors and he opens all but one of them.

Anyway, I did what constipated mathematicians have done since time immemorial, i.e. sat down and worked it out with a pencil, boom boom. It pans out something like this:

Pick a random decade (a ten-number one for the moment). The probability of picking three numbers straight out of the hat/lottery machine/whatever which are from this decade is:

10/49 x 9/48 x 8/47 - i.e. once you've chosen the first one there are 48 balls left from which to choose one of the nine remaining numbers from your chosen decade, and so on and so forth. Let's call this number A.

But, of course, you don't have to pick the numbers straight away - you could pick a non-qualifying number in between. In fact you could end up not hitting the third number until the sixth ball. Better to deal with each of these cases individually. First, picking the third number as the fourth ball:

39/49 x 10/48 x 9/47 x 8/46 - i.e. one of the 39 non-qualifying numbers, then the three qualifying ones. Let's call this number B. The vital thing to realise here is that there's more than one way of doing this - the qualifying balls could be balls 1, 2 and 4, or 1, 3 and 4, or 2, 3 and 4 - and each ordering has the same probability B, since it's the same numbers multiplied together in a different order. Basically you need to ensure that the last qualifying ball is the last one you pick (i.e. it can't be 1, 2 and 3 as we've already covered that one), and then it's just a case of picking 2 balls from 3 slots, which can be done in 3 ways. I won't list the combinations for 5 and 6 balls as it all gets a bit tedious; if you want to check my maths have a look around this area here.
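For anyone who'd rather not take the counting on trust, here is a minimal Python sketch that lists the orderings by brute force. The function name `orderings` is made up for illustration. It pins the last qualifying ball to ball n and picks two of the earlier slots for the other two, which should give 3 ways for four balls and the 6 and 10 quoted further down for five and six.

```python
from itertools import combinations

def orderings(n):
    """Positions (1-based) of the three qualifying balls among the first n
    drawn, with the third qualifying ball landing exactly on ball n."""
    # Choose 2 of the first n-1 slots for the earlier qualifying balls;
    # the last qualifying ball is pinned to slot n.
    return [pair + (n,) for pair in combinations(range(1, n), 2)]

for n in range(3, 7):
    # Print the ball count, the number of orderings, and the orderings.
    print(n, len(orderings(n)), orderings(n))
```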

Now, picking two qualifying and two non-qualifying balls, and then the third qualifying ball as the fifth ball (hopefully you're getting the idea by now):

39/49 x 38/48 x 10/47 x 9/46 x 8/45 - there are six different ways of doing this. Let's call this number C.

And finally picking the last qualifying ball as the sixth and last ball drawn:

39/49 x 38/48 x 37/47 x 10/46 x 9/45 x 8/44 - there are no fewer than ten different ways of doing this. Let's call this number D.

So, using the numbers above and the numbers of different orderings available, the probability of getting at least three numbers in our chosen decade is: A + 3B + 6C + 10D - I won't bore you with the intermediate working-out, but the answer comes to 0.09027. Let's call this number E.
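If you do want the intermediate working-out, the sum can be reproduced with exact fractions. This is a sketch rather than anything official: the helper `ordered_draw` and the cross-check `E_check` are my own additions. The cross-check counts six-ball draws directly (k balls from the chosen ten, the rest from the other 39) and should agree with A + 3B + 6C + 10D.

```python
from fractions import Fraction
from math import comb

def ordered_draw(*pairs):
    """Multiply a run of (numerator, denominator) pairs exactly."""
    result = Fraction(1)
    for num, den in pairs:
        result *= Fraction(num, den)
    return result

# One specific ordering for each case described in the text.
A = ordered_draw((10, 49), (9, 48), (8, 47))
B = ordered_draw((39, 49), (10, 48), (9, 47), (8, 46))
C = ordered_draw((39, 49), (38, 48), (10, 47), (9, 46), (8, 45))
D = ordered_draw((39, 49), (38, 48), (37, 47), (10, 46), (9, 45), (8, 44))

# Weight each case by its number of orderings: 1, 3, 6 and 10.
E = A + 3 * B + 6 * C + 10 * D

# Cross-check (my assumption of an equivalent route): count the draws with
# k = 3, 4, 5 or 6 balls from the chosen ten-number decade.
E_check = Fraction(
    sum(comb(10, k) * comb(39, 6 - k) for k in range(3, 7)),
    comb(49, 6),
)

print(float(E))       # the decimal value of E
print(E == E_check)   # whether the two routes agree exactly
```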

But of course that's the answer for a single decade (we're nearly there now, I promise!), and there are five of them: four with ten numbers and one with nine. The figure for the ten-number decades is just 4E. As for the nine-number decade, you substitute 9,8,7 for 10,9,8 in all the calculations above, which the more astute of you will spot means the answer will be roughly 7/10 of the value for the ten-number decades (roughly, because the 39, 38 and 37 nudge up to 40, 39 and 38 as well). So the overall figure, which we shall arbitrarily call P, is near enough 4.7E, or, expressed as a percentage to two decimal places, which should be good enough for anyone (drum roll please):

42.43%
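The 4.7E shortcut can be compared with an exact count. The sketch below is a cross-check of my own, not the pencil method above: it assumes the decades are 1-9, 10-19, 20-29, 30-39 and 40-49, and the names `DECADE_SIZES` and `clumping_probability` are invented for the purpose. It walks through every way of splitting six balls across the five decades and adds up the draws in which some decade gets at least three.

```python
from fractions import Fraction
from itertools import product
from math import comb

# Assumed decades: 1-9, 10-19, 20-29, 30-39, 40-49.
DECADE_SIZES = [9, 10, 10, 10, 10]
BALLS_DRAWN = 6
TOTAL_DRAWS = comb(sum(DECADE_SIZES), BALLS_DRAWN)  # 49 choose 6

def clumping_probability(threshold=3):
    """Exact chance that at least one decade supplies `threshold` or more
    of the six main numbers."""
    favourable = 0
    # Each tuple says how many balls come from each of the five decades.
    for counts in product(range(BALLS_DRAWN + 1), repeat=len(DECADE_SIZES)):
        if sum(counts) != BALLS_DRAWN or max(counts) < threshold:
            continue
        ways = 1
        for size, k in zip(DECADE_SIZES, counts):
            ways *= comb(size, k)  # ways to pick k numbers from this decade
        favourable += ways
    return Fraction(favourable, TOTAL_DRAWS)

p = clumping_probability()
print(p, float(p))
```

Run as written, this gives 5855916 favourable draws out of 13983816, or about 41.88%. That is a touch under the 4.7E figure, because the nine-number decade is not exactly 0.7E and because adding the decades together counts a draw with two clumps of three twice.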


By a strange but rather gratifying coincidence, Andy tells me three of the last seven main draws have featured "clumping". That would correspond to a probability of 42.86%, which is pretty damn close, particularly for such a small sample. QED. Quite Easily Done. Thank you and goodnight.


everlands said...

Before anybody asks, it's a different Andy [again], not me!

Bloody common name.

electrichalibut said...

Er, yeah, sorry. Should have made that clearer after last time.

everlands said...

But not the same different Andy, as far as we know. If you know what I mean.