Random distributions, part 2
Feb. 8th, 2007 11:56 am
Explaining the options from yesterday's poll:
There are plenty of sources of 'randomness' in life, but not all randomness is the same. If I asked you to measure the height of the next policeman you meet, you'd expect to get something between five and seven feet tall, and most likely around the middle of that range. If I asked how many people are in my kitchen right now, you'd expect there to be no more than half a dozen, and most probably none at all at any given moment.
So mathematicians talk about 'random distributions'. A random distribution is a way of describing the range of possible outcomes and how likely each of them is. One of the most important challenges in science is looking at a handful of 'random numbers' taken from some distribution, and trying to figure out the overall shape of the distribution that produced them.
Mean, Standard Deviation, and Variance
Probably the most important characteristic of a distribution is the mean (average), often represented with a μ (mu). A waiter who makes a total of $500 in tips over two weeks' work is getting an average of $50 a night, and for long-term budgeting that's all he needs to know.
But the average doesn't tell the whole story. If that waiter needs $50 by tomorrow morning to pay his bills, he's not guaranteed to get it, because that $50-a-night is only an average; on any given night he might make $45, he might make $68. To deal with problems like that, we need to look at the standard deviation ('σ', sigma), which describes how widely the results diverge from the average.
If the waiter finds that the standard deviation in his tips is $10, he can expect that about two-thirds of the time* his tips will fall between $40 and $60 (μ-σ to μ+σ). And 95% of the time, they'll fall between $30 and $70 (μ-2σ to μ+2σ).
For various technical reasons, mathematicians often prefer to talk about the variance, which is simply the square of the standard deviation (σ^2). In this particular example, the waiter's variance is 100 square dollars.
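To make those three quantities concrete, here's a minimal Python sketch; the nightly figures are invented, chosen to total $500 over ten shifts:

```python
import statistics

# Invented nightly tips over a fortnight of ten shifts (totalling $500).
tips = [45, 52, 68, 38, 50, 61, 47, 55, 42, 42]

mu = statistics.mean(tips)      # the average: total tips / number of nights
sigma = statistics.stdev(tips)  # sample standard deviation
variance = sigma ** 2           # variance is the square of the standard deviation

print(f"mean = ${mu:.2f} per night")
print(f"standard deviation = ${sigma:.2f}")
print(f"variance = {variance:.2f} square dollars")
```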
Discrete and continuous distributions
Whatever the number of people in my kitchen is, it'll be an integer (whole number). You can have two people, or three, but you can't have two-and-a-half. This is what's known as a discrete distribution - the possible outcomes are discrete, separate things, and between them lie impossibilities. If you have only a finite number of possible outcomes, or at least countably many, that's discrete. (Anything limited to integer outcomes is countable.)
But what about height? A policeman could be exactly 5'8" or 5'9", but he could just as well be any height in between those two. This is what we call a continuous distribution.
In our discrete example, outcomes have probabilities greater than zero. The chance of having exactly 0 people in my kitchen at any given time might be 96%; maybe a 2% chance of having 1 person, 1% of having 2, 0.5% of having 3, and so on - the probability of there being exactly five people might be small, but it's bigger than zero.
But what are the odds that the policeman of your choice, at any given moment, is exactly six feet tall? There may well be moments when he is exactly 6'0", on his way between "just under" and "just over", but they don't last for any measurable amount of time; the probability that he'll be exactly 6'0" at the moment you measure him is nil.
Still, it seems intuitive that 6'0" is a more likely height for a policeman than 7'0", and it would be nice to have a mathematical way to represent this. The trick is to look at ranges. What fraction of policemen are between 5'11" and 6'1"? Maybe one in twenty. Between 6'11" and 7'1"? Maybe one in two thousand. So a small range around 6'0" is a hundred times more likely than the same-size range around 7'0". This lets us talk about a probability density function - it represents not the chance of getting exactly that height (which is zero), but the chance of getting close to it.
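Here's a rough Python sketch of that range trick. The curve is an assumption made purely for illustration (heights roughly normal, mean 70 inches, standard deviation 3 inches); the point is that a range's probability is the area under the density curve between its endpoints:

```python
import math

def normal_cdf(x, mu, sigma):
    # Cumulative probability of a normal distribution, via the error function.
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Invented parameters: heights roughly normal, mean 70" (5'10"), sigma 3".
mu, sigma = 70.0, 3.0

# The probability of landing in a range is the area under the density there.
p_near_6ft = normal_cdf(73, mu, sigma) - normal_cdf(71, mu, sigma)  # 5'11" to 6'1"
p_near_7ft = normal_cdf(85, mu, sigma) - normal_cdf(83, mu, sigma)  # 6'11" to 7'1"

print(f"P(71in to 73in) = {p_near_6ft:.4f}")
print(f"P(83in to 85in) = {p_near_7ft:.2e}")
```

The exact figures depend entirely on the invented parameters, but however you set them, the small range near the middle dwarfs the same-size range out in the tail.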
If you look at the poll closely, you'll see that some of the formulas use an upper-case P for probability and others use a lower-case p; the former are discrete distributions with non-zero probabilities, the latter are continuous distributions represented by density functions. With one sort-of-exception that I'll get to.
With that aside, on to...
The distributions
P(k) = 1/n for integer k between 1 and n, 0 otherwise.
Ever played with dice? Then you've worked with this one: the discrete uniform distribution. If we try, say, n=6, this translates to "a one-in-six chance of getting any whole number between 1 and 6".
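A quick simulation (standard-library Python) shows the observed frequencies settling toward 1/n:

```python
import random
from collections import Counter

# Roll a fair six-sided die many times; each face should turn up about 1/6 of the time.
n, rolls = 6, 60_000
counts = Counter(random.randint(1, n) for _ in range(rolls))

for k in range(1, n + 1):
    print(f"P({k}) ~ {counts[k] / rolls:.3f}   (exact: {1 / n:.3f})")
```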
P(k) = (nCk)*(p^k)*(1-p)^(n-k)
This is the binomial distribution. If you're playing Warhammer and want to know how many 6s you'll get when you roll five dice, or you have ten employees at your company and you want to know how many will be off sick tomorrow, this is the formula you'd use.**
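Here's the formula as runnable Python, applied to the five-dice example (five rolls, each with a 1-in-6 chance of showing a 6):

```python
from math import comb

def binom_pmf(k, n, p):
    # P(k) = (n choose k) * p^k * (1-p)^(n-k)
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Five Warhammer dice, each with probability 1/6 of showing a 6.
n, p = 5, 1 / 6
for k in range(n + 1):
    print(f"P({k} sixes) = {binom_pmf(k, n, p):.4f}")
```

For the sick-employee question you'd just swap in n = 10 and whatever per-person probability of illness you believe.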
P(k) = exp(-λ)*(λ^k)/k!
My favourite: the Poisson distribution. If you manufacture a million ball-bearings and you want to know how many are defective, if you run a shop and want to know how many customers you'll get in the next hour, or if you have a thousand employees and you want to know how many will be off sick tomorrow, the Poisson distribution is your friend.
The extra-nice thing about the Poisson distribution is that the variance is equal to the mean (and so the standard deviation is the square root of the mean). If the waiter I described earlier (non-Poisson distribution) tells me that he made $500 in tips last fortnight, I have no clue how variable that is - I can't tell him whether he's guaranteed to make bus fare tomorrow. But if last week's batch of a million bearings produced 100 defectives, I can be 99% certain that next week's batch will have more than 70 and less than 130 (μ-3σ to μ+3σ) as long as the process hasn't changed.
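Here's a sketch of the bearing example in Python, evaluating the formula directly and checking how much probability actually sits within μ ± 3σ:

```python
from math import exp, factorial, sqrt

def poisson_pmf(k, lam):
    # P(k) = exp(-λ) * λ^k / k!
    return exp(-lam) * lam ** k / factorial(k)

lam = 100          # average of 100 defective bearings per batch
sigma = sqrt(lam)  # variance equals the mean, so σ = √μ = 10
low, high = lam - 3 * sigma, lam + 3 * sigma

p_inside = sum(poisson_pmf(k, lam) for k in range(int(low), int(high) + 1))
print(f"μ ± 3σ covers {low:.0f} to {high:.0f} defectives")
print(f"P(inside that range) = {p_inside:.4f}")
```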
p(x) = exp(-(x-μ)^2/(2*σ^2))/(σ*sqrt(2*pi))
Probably the best-known probability distribution, the normal distribution (aka 'Gaussian' or 'bell curve'). If you measure adults' heights, you'll get something like a normal curve.
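And here's the density itself as Python, evaluated at a few heights; the mean and standard deviation are invented for illustration:

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    # p(x) = exp(-(x-μ)^2 / (2σ^2)) / (σ √(2π))
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

# Invented parameters: adult heights with mean 67" and standard deviation 4".
for x in (59, 63, 67, 71, 75):
    print(f"density at {x} inches = {normal_pdf(x, 67, 4):.4f}")
```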
p(x;α,β) = x^(α-1)*(1-x)^(β-1)/B(α,β)
The beta distribution, a cousin to the binomial. To be honest, I haven't used this one a lot myself, but a biologist friend of mine does; it's useful when you're trying to estimate a binomial's unknown success probability from observed data.
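A sketch of the density in Python. The parameters are an invented illustration: Beta(4, 8) is the posterior you'd get for a binomial's unknown success probability after seeing 3 successes in 10 trials, starting from a uniform prior:

```python
from math import gamma

def beta_pdf(x, a, b):
    # B(α, β) = Γ(α)Γ(β) / Γ(α+β) is the normalising constant.
    norm = gamma(a) * gamma(b) / gamma(a + b)
    return x ** (a - 1) * (1 - x) ** (b - 1) / norm

# Invented example: posterior for p after 3 successes and 7 failures.
for x in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"density at p = {x}: {beta_pdf(x, 4, 8):.3f}")
```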
p(x) = δ(x-μ)
The Dirac delta function***. This is a sneaky way of representing what's really a discrete probability function representing a constant ("P(x)=1 for x=μ, 0 otherwise" - which really just says "x=μ with 100% probability") as if it were a probability density function. A lot of physics is based on density functions, so it can be very useful to have a way of dressing up other functions as density ones.
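One way to see how the delta behaves is as the limit of ever-narrower normal densities. This sketch (my own illustration) numerically integrates a test function f against narrowing Gaussians centred at μ; the result converges to f(μ), which is exactly the delta's defining 'sifting' property:

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

def f(x):
    return x ** 2 + 1  # an arbitrary test function

mu = 2.0
for sigma in (1.0, 0.1, 0.01):
    dx = sigma / 100
    # Integrate f(x) * pdf(x) over μ ± 50σ with a simple Riemann sum.
    total = sum(f(mu + i * dx) * normal_pdf(mu + i * dx, mu, sigma) * dx
                for i in range(-5000, 5001))
    print(f"sigma = {sigma}: integral ~ {total:.4f}   (f(mu) = {f(mu)})")
```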
*This is an approximation, which works very well for a lot of 'typical' random distributions but not for all of them. For an unusually-shaped distribution, these figures might be less reliable; for something more bullet-proof, Chebyshev's inequality guarantees that at least 1 - 1/k^2 of outcomes fall within k standard deviations of the mean: the tips will fall between $30 and $70 at least 3/4 of the time, between $20 and $80 at least 8/9 of the time, between $10 and $90 at least 15/16 of the time, and so on.
**It wouldn't be an exact model of illness, since employees getting sick aren't entirely independent of one another, but it's a reasonable place to start.
***Technically, this is not actually a 'function' as mathematicians define them.