lederhosen: (Default)
[personal profile] lederhosen
One of the standard techniques throughout science (well, a whole family of techniques really) is 'fitting'.

The way it goes is like this: we have something that we consider important (cost, blood pressure, whatever) that's affected by a whole bunch of other things (how far we drive, how many people we employ, how much the patient gets of what drugs - we call these "predictive variables").

The relationship is complex enough that we can't just predict it from first principles, so instead we take a bunch of observations in which we record both the predictive variables and the results. Then we try to come up with a mathematical model that matches our observations.

For instance: suppose I go to the shops and buy myself an apple and an orange. The receipt isn't itemised, but the total cost is $1.50. Next day I go back, and buy myself an apple and two oranges, and it costs me $2.00.

From this, you might reasonably conclude that an apple costs $1.00 and an orange costs 50 cents. If I were buying a dozen different things every trip, you'd need more observations before you could figure out prices (at least one for each predictive variable we're considering). You would probably also want to use a computer to work the numbers, but there are plenty of packages that will do that - it's really quite easy to do, which is part of why it's such a popular technique.
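
For anyone who wants to see the numbers worked, here is a minimal sketch of that two-trip calculation. It treats the receipts as two equations in two unknowns and solves them with Cramer's rule; the function name `solve_two_prices` and the tuple layout are just illustrative choices, not anything a particular stats package uses.

```python
from fractions import Fraction

def solve_two_prices(trip_a, trip_b):
    """Solve for (apple_price, orange_price) from two shopping trips.

    Each trip is a tuple (apples, oranges, total_cost).
    """
    a1, o1, t1 = trip_a
    a2, o2, t2 = trip_b
    # Determinant of the 2x2 system; zero means the trips carry the same information.
    det = a1 * o2 - a2 * o1
    if det == 0:
        raise ValueError("these two trips cannot separate the two prices")
    apple = Fraction(t1 * o2 - t2 * o1) / det
    orange = Fraction(a1 * t2 - a2 * t1) / det
    return apple, orange

# The two receipts from the example: apple + orange, then apple + two oranges.
first_trip = (1, 1, Fraction("1.50"))
second_trip = (1, 2, Fraction("2.00"))

apple, orange = solve_two_prices(first_trip, second_trip)
print(float(apple), float(orange))  # 1.0 0.5
```

Exact fractions are used instead of floats so that the cents come out without rounding noise.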

Unfortunately, like many easy things, it's a little too good to be true. The catch is that while what we want to do is predict the future, what we're really doing here is describing the past. Up to a certain point, that's useful - we can expect the future will behave something like the past. But it's possible to overdescribe the past...

On Monday, I go to a new greengrocer and buy myself an apple and ten oranges, for a total of $6.00. The next day I buy myself an apple and *nine* oranges... and it costs me $6.10. What do you make of this?

Under the 'fitting' approach, there's only one way to interpret this: apples cost $7.00, and oranges cost minus ten cents each. WTF?
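
To see where those numbers come from, here is a small sketch of the same fit done by subtraction. Since both trips include exactly one apple, the apples cancel and the entire change in the bill gets blamed on the change in orange count. The helper name `fit_by_difference` is made up for this illustration.

```python
from fractions import Fraction

def fit_by_difference(trip_a, trip_b):
    """Fit per-item prices from two trips that bought the same number of apples.

    Each trip is (apples, oranges, total). Returns (apple_price, orange_price).
    """
    apples_a, oranges_a, total_a = trip_a
    apples_b, oranges_b, total_b = trip_b
    if apples_a != apples_b or apples_a == 0 or oranges_a == oranges_b:
        raise ValueError("need equal, non-zero apple counts and different orange counts")
    # The apples cancel, so the whole change in the bill is attributed to oranges.
    orange = (total_b - total_a) / Fraction(oranges_b - oranges_a)
    # Whatever is left of the first bill after the oranges must be the apples.
    apple = (total_a - oranges_a * orange) / apples_a
    return apple, orange

monday = (1, 10, Fraction("6.00"))
tuesday = (1, 9, Fraction("6.10"))

apple, orange = fit_by_difference(monday, tuesday)
print(float(apple), float(orange))  # 7.0 -0.1
```

One fewer orange and a bill that went up by ten cents leaves the fit no choice but a negative orange price, and the apple then has to absorb the difference.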

In fact, the explanation is much more reasonable: apples still cost about $1.00 each, and oranges cost about 50 cents. But this relationship isn't exact - oranges are actually sold by weight, and the ones I bought on Tuesday were a little heavier, so they cost a little more - about 57 cents each.
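
A rough way to check that reading is to assume the nominal prices ($1.00 and 50 cents) and see how much of each bill they leave unexplained. The names `APPLE`, `ORANGE` and `unexplained` below are hypothetical, chosen only for this sketch.

```python
from fractions import Fraction

# Nominal per-item prices under the "reasonable" explanation, in dollars.
APPLE = Fraction("1.00")
ORANGE = Fraction("0.50")

# (apples, oranges, total paid) at the new greengrocer.
trips = {
    "Monday": (1, 10, Fraction("6.00")),
    "Tuesday": (1, 9, Fraction("6.10")),
}

def unexplained(apples, oranges, total):
    """Part of the bill that the nominal prices do not account for."""
    return total - (apples * APPLE + oranges * ORANGE)

for day, trip in trips.items():
    print(day, float(unexplained(*trip)))

# Per-orange price on Tuesday if the apple really did cost $1.00.
tuesday_orange = (trips["Tuesday"][2] - APPLE) / trips["Tuesday"][1]
print(float(tuesday_orange))
```

Monday's bill is fully accounted for, while Tuesday's leaves 60 cents over, spread across nine slightly heavier oranges. That leftover is exactly the sort of thing an exact fit has no room for.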

The problem here is that we forgot that the relationship between predictive and output variables is not exact - we don't know all the factors that affect our output variable. The more predictive variables we have, the easier it is to mistakenly credit them with effects that are really due to factors outside our knowledge. (You can see a LOT of this happening in electioneering or sports coverage - see Edward Tufte's debunking of 'bellwether districts'.) Past a certain point, over-fitting becomes a sophisticated form of superstition - one to which scientists are especially susceptible.
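
Here is a toy illustration of that last point, under loudly stated assumptions: the data is entirely made up, the five "predictive variables" are random numbers, and the output is pure noise that has nothing to do with them. The solver is a plain Gauss-Jordan elimination written out so the sketch needs no extra packages; a real analysis would use a proper fitting library.

```python
import random

def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    # Work on an augmented copy so the caller's data is left untouched.
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("observations do not pin down a unique fit")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Clear this column from every other row.
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [x - factor * p for x, p in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5

# Five irrelevant predictive variables per observation (made-up data).
predictors = [[rng.uniform(0, 10) for _ in range(n)] for _ in range(n)]
# The output is pure noise: it does not depend on the predictors at all.
outputs = [rng.gauss(0, 1) for _ in range(n)]

weights = solve_linear_system(predictors, outputs)
fitted = [sum(w * x for w, x in zip(weights, row)) for row in predictors]
worst_miss = max(abs(f - y) for f, y in zip(fitted, outputs))

print(weights)     # "effects" credited to variables that have none
print(worst_miss)  # effectively zero: the past is described perfectly
```

With as many variables as observations, the fit matches every past observation essentially exactly, even though there is nothing real to find. The weights it reports are superstition in the sense described above.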

(Jotting this down because I suspect I'm going to have to deal with these concepts at some length in the near future...)

Date: 2009-05-14 11:14 am (UTC)
From: [identity profile] nefaria.livejournal.com
> oranges cost minus ten cents each.

**blinkfluff** I'm coming to your grocer's and buying 10 million oranges!

Date: 2009-05-14 12:43 pm (UTC)
From: [identity profile] lederhosen.livejournal.com
Sadly, they only have negative oranges left.

Date: 2009-05-14 03:56 pm (UTC)
From: [identity profile] culfinriel.livejournal.com
This is an awesome post - I wish everyone playing in research and medicine understood this.

Also, if you could just make food have negative calories, you'd be golden.

Date: 2009-05-14 10:51 pm (UTC)
From: [identity profile] lederhosen.livejournal.com

Although the trick there is to make it interesting...

Date: 2009-05-17 07:13 am (UTC)
From: [identity profile] enegim.livejournal.com
Thank you! This was helpful.
(Now, any suggestions how to deal with people like the woman at my table at a reception last week? She was arguing strenuously that children should not receive certain vaccinations because those "cause autism"; when I said that in my understanding the reputable research indicated otherwise, she sniffed, "Oh, well, research...")
Edited Date: 2009-05-17 07:13 am (UTC)

Date: 2009-05-17 12:10 pm (UTC)
From: [identity profile] lederhosen.livejournal.com
Alas, no. I don't think logical argument gets very far there :-(

Date: 2009-05-17 12:15 pm (UTC)
From: [identity profile] enegim.livejournal.com
I was afraid of that. The context was, of course, that she had friends whose son was diagnosed as autistic a few months after being vaccinated. For some reason, it seems very difficult to convince people that individual anecdotes are not proof, and that correlation is not causation. :(

Date: 2009-05-18 09:06 am (UTC)
From: [identity profile] lederhosen.livejournal.com
It took me a long time to learn that rational thought really isn't the natural state of things. People can usually approximate it fairly well, but this is one of those exceptions.

