**lederhosen**

Been meaning to post this one for a while (I didn't forget and post it already, did I?) It caught my attention because the end result is very similar to some of the stuff that came up in my work last year, though for somewhat different reasons.

Strong profiling is not mathematically optimal for discovering rare malfeasors.

The result, in a nutshell: suppose you're looking for Bad People via random screening, and you know that certain types of people (nationality, age, gender, whatever) are ten times more likely to be malfeasors than other people.

The natural response would be to screen the first type of person ten times as often ('strong profiling')... but it turns out that this is actually no more efficient than screening everybody at random with the same probability. When malfeasors do fit the stereotype, strong profiling lets you find them slightly faster; when they don't, strong profiling takes a lot longer to find them, because your efforts are concentrated too much on the wrong people. It turns out that the optimal solution here (ignoring moral considerations and the possibility that people will change behaviour in response to your strategy) is somewhere in between the two - 'weak profiling'.

(I should add that this result is for one specific formulation, as described in the paper; it could easily be argued that some scenarios don't fit this particular mathematical model. But it's still interesting as an illustration that putting too much faith in your information can be a bad idea.)
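For anyone who wants to poke at the numbers, here's a rough sketch of the arithmetic as I understand the formulation. The assumptions are mine: there is exactly one malfeasor, person i is the culprit with prior probability `p[i]`, and every screening independently picks person i with probability `q[i]` (no memory of who has already been screened). Under those assumptions the average number of screenings needed is the sum of `p[i] / q[i]`. The population sizes and the function name are made up for illustration.

```python
import math

def expected_screenings(priors, weights):
    """Average number of memoryless screenings until the one malfeasor is hit.

    priors[i]  -- probability that person i is the malfeasor (sums to 1)
    weights[i] -- relative screening rate for person i (normalised below)
    """
    total = sum(weights)
    return sum(p / (w / total) for p, w in zip(priors, weights))

# Hypothetical population: 100 'profiled' people, each ten times as likely
# to be the malfeasor as each of the other 900.
raw = [10.0] * 100 + [1.0] * 900
raw_total = sum(raw)
priors = [r / raw_total for r in raw]

uniform = expected_screenings(priors, [1.0] * len(priors))         # no profiling
strong = expected_screenings(priors, priors)                       # rate proportional to prior
weak = expected_screenings(priors, [math.sqrt(p) for p in priors]) # rate proportional to sqrt(prior)

print(f"uniform: {uniform:.1f}")
print(f"strong:  {strong:.1f}")
print(f"weak:    {weak:.1f}")
```

With these made-up numbers, uniform screening and strong profiling both average 1000 screenings, while the square-root weighting comes out at about 779.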

## no subject

Date: 2009-04-02 02:09 pm (UTC)

nefaria.livejournal.com

However, I would argue that in the specific example case, airplane terrorism, the chance that perpetrators would fit the profiles is extremely high. The square-root weak screening would be closer to profile screening than random screening in this case. Also, I assume they keep lists of people who meet the profile but come out squeaky clean, so there wouldn't be endless retests of those who are not guilty.

To keep the math pure, I think he neglected some of the real-world factors that would affect the search.

## no subject

Date: 2009-04-03 06:54 am (UTC)

lederhosen.livejournal.com

> Also, I assume they keep lists of people who meet the profile but come out squeaky clean, so there wouldn't be endless retests of those who are not guilty.

I'd be surprised, honestly. Lists like that tend to be problematic (names are not unique identifiers - I'm a little bitter about this at the moment, but that's another post) - the current no-fly list has generated plenty of embarrassments already. Even if you had good and usable records, it's dangerous to assume that somebody who screened clear last time around is safe this time (especially if they ever find that out). Anecdotally, I have several friends who get screened every time they fly.

So I don't think the 'memoryless' assumption is too much of a problem here. IMHO, the bigger problem with the terrorism parallel is that the question they're answering is "how many searches does it take to find all of a known number of terrorists?" when the RL question is probably more along the lines of "given N searches, how should we allocate them to find as many terrorists as possible?"

I haven't done the maths, but I suspect the latter question is a bit more favourable to strong profiling (and if N is small enough, possibly over-strong).
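One way to check that hunch numerically, as a sketch rather than anything from the paper: keep the same simplifying assumptions (a single malfeasor, person i is the culprit with prior `p[i]`, each search independently picks person i with probability `q[i]`), and ask for the chance of catching them within a fixed budget of N searches, which is the sum of `p[i] * (1 - (1 - q[i])**N)`. The population, the budget of 50 and the 'over-strong' scheme (search only the profiled group) are made-up illustrations.

```python
import math

def catch_probability(priors, weights, n_searches):
    """Chance of hitting the one malfeasor at least once in n_searches draws.

    priors[i]  -- probability that person i is the malfeasor (sums to 1)
    weights[i] -- relative search rate for person i (normalised below)
    """
    total = sum(weights)
    return sum(
        p * (1.0 - (1.0 - w / total) ** n_searches)
        for p, w in zip(priors, weights)
    )

# Hypothetical population: 100 'profiled' people, each ten times as likely
# to be the malfeasor as each of the other 900.
raw = [10.0] * 100 + [1.0] * 900
raw_total = sum(raw)
priors = [r / raw_total for r in raw]
n = 50  # small search budget (assumption)

uniform = catch_probability(priors, [1.0] * len(priors), n)
weak = catch_probability(priors, [math.sqrt(p) for p in priors], n)
strong = catch_probability(priors, priors, n)
# 'Over-strong': spend the whole budget on the profiled group.
over_strong = catch_probability(priors, [1.0] * 100 + [0.0] * 900, n)

for name, value in [("uniform", uniform), ("weak", weak),
                    ("strong", strong), ("over-strong", over_strong)]:
    print(f"{name:12s} {value:.3f}")
```

For this toy setup with a small budget, the ordering comes out as over-strong, then strong, then weak, then uniform. A much larger N changes the picture, since searching only the profiled group can never catch a malfeasor outside it.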

As with a lot of mathematical models, I think it's better treated as a demonstration of *possible* problems than hard proof of the best way to do things. (And as acknowledged in the paper, there are also ethical considerations and second-order effects at work.)