Apr. 4th, 2009


Apr. 4th, 2009 09:28 am
lederhosen: (Default)
Last week: co-worker needed to join two datasets (staff experience to calls made by staff). Unfortunately the variable that SHOULD be the match variable (employee number) isn't 100% reliable - some people have numbers beginning '7595...' on one file and '7959...' on the other, that sort of thing. The other possible matching variable is name, but that's not 100% reliable either - there are people who are 'Anne' on one file and 'Patricia' on the other (legal name vs preferred name), variant spellings, name changes due to marriage and so on. We also have two employees with the exact same name.

So I set up a two-stage join: match by ID number, find unmatched records, match those by name, recombine the two files, etc etc. Messy and complicated, took quite some time to write and debug.

Then the other day, while poking around looking for something else, I discovered that I could have done it much more elegantly in a couple of lines. I'm using SAS EG, which has a point-and-click interface that allows SQL joins on equality etc; while I know how to write those in SQL for myself, I hadn't realised that I could also write a few things that weren't in the point-and-click options*. The trick is to write a fuzzy-logic join:

where (
(a.emp_id=b.emp_id)*3 >= &match_tolerance)

In practice, probably slightly fiddlier than that, but that's the basic idea. Wish I'd known earlier that I could do that.

Oh well, probably not the last time I'll need to do that, and I'm sure some of my co-workers can use this trick too. I'm trying to encourage them to shift to SQL joins instead of the SAS merge operation, because it seems to cause problems for anything other than one-to-one joins.

*My knowledge of SQL and SAS code is 'what I've picked up along the job'; while I know quite a few tricks, without formal training it's easy to miss important basics, as here.


Apr. 4th, 2009 09:17 pm
lederhosen: (Default)
Went to the climbing gym with Rey, [livejournal.com profile] silverblue, [livejournal.com profile] brandtotter, [livejournal.com profile] dilph, and non-LJ-person. Lots of fun! Hands and arms need a lot more strength (and my left hand needs to regrow a bit of skin), but definitely going back.

Dog-Or has discovered a new fun game that goes like this:

Dog: Hey, could you please open the door for me?
Me/Rey/[livejournal.com profile] lokicarbis: ...oh, okay. [door opens]
Dog: I changed my mind. I don't want to go out.

[approximately thirty seconds pass, long enough for all of us to sit down again]

Dog: Hey, could you please open the door for me?

[rinse, lather, repeat]

I'm beginning to wonder if somebody's switched our dog for a cat...


lederhosen: (Default)

April 2017

2345 678

Most Popular Tags

Page Summary

Style Credit

Expand Cut Tags

No cut tags
Page generated Apr. 25th, 2017 10:39 am
Powered by Dreamwidth Studios