lederhosen: (Default)
[personal profile] lederhosen
Prompted by a now-deleted post on a snark* community complaining that 'grok' is a fictional term and is not proper language, regardless of whether it's in the OED...

Since everybody else seems to have strongly-held views on the subject of New Words, I thought I'd have a go at coming up with a compromise that will annoy everyone equally. And just to make sure of that, I'll start by calling on mathematics - specifically, information theory - to dictate a few rules of language. The math-phobic can skip the cut; it's just there to justify some of the principles presented after it.



Information theory is the branch of mathematics related to quantifying and conveying information accurately and efficiently.

To understand the underlying principles of information theory - indeed, to grok it - imagine you're commanding a naval fleet back in the days of semaphore: You need to be able to communicate with the other ships in your fleet, and the only way you have to do this is by waving coloured flags at them. What does this involve?

1. You need to be able to communicate urgent concepts quickly. If it takes ten minutes to say "The Frenchies are coming up behind us, have your men ready for battle!", you can kiss your command goodbye.

2. You need to be able to communicate *common* concepts quickly. If it takes ten minutes to say "Everything is in order", it's not likely to cause any immediate disasters, but your semaphorist's arms will get very tired, and that's ten minutes when he could've been doing something useful like scrubbing the decks.

2.1 You need to be able to communicate urgent-and-common concepts *very* quickly. From here on, I'm just going to lump urgency and frequency together as 'importance', noting that this doesn't quite match the standard meaning of that word - some things that are important in one sense or another may not be important from a language-construction angle.

3. You need to be able to communicate even 'unimportant' concepts - "Happy birthday Captain Hornblower, also do you happen to have any spare lime juice? Ours has leaked." But speed isn't as important here. (This is, BTW, an example of where my usage of 'important' doesn't match the common one - prevention of scurvy is very important by the regular meaning, but it's not urgent.)

4. You need to be able to communicate unambiguously. Confusing "Hard a-port!" with "Hard a-starboard!" is not a good thing.

IRL, semaphorists use different positions and motions as well as different colours & patterns to convey meaning, but let's simplify things by just considering colour, and suppose each flag-wave takes one second.

Obviously, the more colours you have, the faster you can send messages. If you have just two flags, black and white, you can send at most 2046 different messages in ten seconds. Some of them don't take the full ten seconds either - there are two single-flag messages that can be sent in just one second, four two-flag messages, and so on. (This is why it's not just 2^10 = 1024 messages. In practice, you might want some sort of 'end of message' code, or a separate flag reserved solely for that purpose, but I'll ignore that issue for now.)

If you add just one more flag, say red, you can send 88572 different messages in that same time. (Unfortunately, most of these extra messages are at the long end, so this isn't quite as good as the roughly 43-fold jump in message count suggests.)
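For anyone who wants to check those counts, here's a quick sketch in Python. The function name is mine, not anything standard; it just adds up the number of possible flag sequences of each length from one flag up to the limit, and it ignores the 'end of message' issue in the same way the text does.

```python
def count_messages(colours: int, max_flags: int) -> int:
    """Count the distinct flag sequences of length 1 up to max_flags."""
    return sum(colours ** length for length in range(1, max_flags + 1))


two_flags = count_messages(2, 10)    # black and white only
three_flags = count_messages(3, 10)  # black, white, and red
print(two_flags, three_flags)
```

The same function shows where the messages pile up: count_messages(3, 9) covers everything that fits in nine seconds, and the difference from the ten-second total is the number of messages that need the full ten.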

By adding more and more flags, you can get faster and faster communications. But this will only take you so far - the more flags you add, the easier it is to mix them up and garble the message. (See requirement 4.) Once you've got as many flags as you can support without an intolerable amount of confusion, you need to look at ways to make the best use of the flags you've got. For now, let's just stick with red, black, and white.

At this point, a pure information theorist would take all possible flag combinations, rank them in order of length, and allocate them each a corresponding message, starting with the shortest combinations for the most important. "Hard a-port!", "Hard a-starboard!", and "Fire!" are all very urgent and quite common, so we give them each a one-flag code: red for 'fire', black for 'port', white for 'starboard'.

At the other end of the scale, we decide that there are precisely 88571 messages more imoprtant than "Happy birthday Captain Hornblower, also do you happen to have any spare lime juice? Ours has leaked." So we assign this the last of the ten-flag codes: WWWWWWWWWW. Meanwhile, there are eight Captain Sparrows in the same fleet, so "Happy birthday Captain Sparrow, also do you happen to have any spare lime juice? Ours has leaked" is the 11071st most important message in the queue; this qualifies it for a nine-flag code, which works out at RRBWRRBWR. (Probably better if you *don't* check my base-3 calculation there, since it's probably buggy.)
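The rank-to-code allocation can be sketched in a few lines of Python. This is one possible reading of the scheme rather than a definitive one: I'm assuming that codes are ordered by length first, and that within each length they count upward in base 3 with red = 0, black = 1, white = 2 (which matches red, black, white being the first three codes). The name code_for_rank is made up for the example.

```python
FLAGS = "RBW"  # assumed digit order: red = 0, black = 1, white = 2


def code_for_rank(rank: int) -> str:
    """Return the flag sequence for the rank-th most important message (1-based)."""
    if rank < 1:
        raise ValueError("rank must be at least 1")
    length = 1
    block = len(FLAGS)  # how many codes exist at the current length
    offset = rank - 1   # 0-based position, counted from the first code of this length
    while offset >= block:
        offset -= block  # skip past every code of the current length
        length += 1
        block *= len(FLAGS)
    # Write the remaining offset as a base-3 number, padded to `length` digits.
    digits = []
    for _ in range(length):
        offset, digit = divmod(offset, len(FLAGS))
        digits.append(FLAGS[digit])
    return "".join(reversed(digits))


for rank in (1, 2, 3, 11071, 88572):
    print(rank, code_for_rank(rank))
```

Under those assumptions, rank 11071 comes out as RRBWRRBWR and rank 88572 as WWWWWWWWWW, the same codes given above.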

In real life, though, there are two problems with this approach. One is that it leaves no margin for error. Because every code corresponds to a message, a single mistake can lead to disaster - if my semaphorist has had a bad day and sends RRBWBRBWR instead of RRBWRRBWR, he might be telling the other captains "We must flee, throw all your ammunition overboard to lighten your load, and Mr. Turner is a nancing elf-boy." And two months later everybody dies of scurvy.

The other (somewhat related) problem is that while it allows messages to be very efficiently *carried*, it's cumbersome at both ends. Each ship has to carry around a code-book with nearly a hundred thousand different sequences; presumably your semaphorists will soon learn the most important ones - which are also the shortest - but often they'll have to look through the book before they can figure out what you're saying. Note that WWWWWWWWWW and RRBWRRBWR carry very similar messages, but don't look anything alike; this system would be great for reliable computers with plenty of storage space working over a slow connection, but it just isn't human-friendly.

As long as every possible code stands for a legitimate message, the first problem is inevitable; you can only eliminate it by making your code less compact, and not using every possible combination, so garbled messages can be identified as illegitimate. (You could get the recipient to send the message back so you can compare it against the original - but that at least doubles the time involved in the communication, before you get started on figuring out whether it was the original message or the return message that got garbled.)
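One simple way to "not use every possible combination" is to tack a check flag onto the end of each message. The sketch below is an illustration of the general idea, not a scheme any navy actually used: it assumes the flags are numbered red = 0, black = 1, white = 2, and picks the extra flag so that the values always add up to a multiple of 3. The function names are invented for the example.

```python
FLAGS = "RBW"  # assumed numbering: red = 0, black = 1, white = 2


def add_check_flag(code: str) -> str:
    """Append one flag so that the flag values sum to a multiple of 3."""
    total = sum(FLAGS.index(flag) for flag in code)
    return code + FLAGS[(-total) % 3]


def looks_valid(received: str) -> bool:
    """Return True if the received sequence passes the sum check."""
    return sum(FLAGS.index(flag) for flag in received) % 3 == 0


sent = add_check_flag("RRBWRRBWR")
garbled = "RRBWBRBWR" + sent[-1]  # fifth flag waved wrongly, check flag intact
print(sent, looks_valid(sent), looks_valid(garbled))
```

Changing any single flag shifts the sum by 1 or 2, so one wrong flag is always caught. The price is one extra second per message, and the check can't say which flag was wrong, nor will it notice two errors that happen to cancel out.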

A standard solution to the second one is to break messages up into smaller concepts - for instance, RWRRB might always stand for 'lime juice'. This makes it more human-friendly, and less compact, which goes a long way to addressing the first problem; it has the added benefit that it makes it easier to repair a garbled message by context. (Above, you might have noticed that I typoed 'important' as 'imoprtant'; that was a genuine mistake, but I left it in because it illustrates this point nicely.)
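Here's a rough sketch of how a smaller, sparser vocabulary makes repair possible. Only RWRRB = 'lime juice' comes from the text; the other two entries and all the function names are invented for the example, and I'm assuming every concept gets exactly five flags. Because the legitimate codes are far apart, a received sequence that isn't in the book can be matched to whichever entry it differs from in the fewest positions.

```python
# Hypothetical five-flag vocabulary. Only the 'lime juice' entry is from the
# text; the other two are made up for illustration.
VOCABULARY = {
    "RWRRB": "lime juice",
    "BBWBR": "happy birthday",
    "WRBWW": "ammunition",
}


def distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def repair(received: str) -> str:
    """Return the meaning of the vocabulary entry closest to what was received."""
    best = min(VOCABULARY, key=lambda code: distance(code, received))
    return VOCABULARY[best]


print(repair("RWBRB"))  # 'lime juice' with the third flag waved wrongly
```

This only works because most five-flag sequences mean nothing at all: three entries out of a possible 243 leaves plenty of room for a garbled message to land somewhere obviously wrong.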

There's a caution there for those looking to make the English language too efficient - many of the existing 'inefficiencies' and 'wastage' in the language actually serve as a form of error-detection and correction.



lederhosen's principles of vocabulary. Note that some of these conflict, and have to be weighed against one another, except for rule 9 which is non-negotiable.

1. Important (in particular, common) concepts should have compact expressions. The more it's used, the shorter it ought to be.

2. As our world and context changes, so does the importance of various concepts. It's no longer as important to be able to say "bear" in a hurry as it used to be; "computer", OTOH, has become ubiquitous.

3. To keep language effective, it needs to be able to change to reflect these facts. Where there's a need for a new word, or a shortening of an existing word, we should be willing to accept such novelties.

4. If a lot of people adopt a neologism, this is evidence that such a word was needed, and can be taken as grounds for its acceptance.

5. Exception to #4: if there's already a perfectly good & compact word for this purpose, use that one instead. Neologisms should be created due to need, not ignorance and laziness.

6. Exception to #4: Where possible, neologisms should be user-friendly. As far as possible, this means following existing patterns of language. Adapting existing English is great; borrowing from other languages is good. Words derived from Latin etc. are more likely to be readily understood and accepted than words made up from scratch.

7. Exception to #6: Sometimes, insistence on following existing patterns may get in the way of #1 and #3. Latin constructions tend to become fairly long; as such, they're admirably suited for necessary but uncommon pieces of vocabulary - for instance, many academic terms - but less so for things like "blog".

8. User-friendliness also means avoiding ambiguity. English already has more than enough homophones, thank you very much.

9. Numbers are not letters and should not be used phonetically, EVER, with a possible exception for Sinead O'Connor when covering Prince.

I quite like 'grok' because it satisfies almost all of the above principles. It offers a compact and unambiguous word for an important nuance that isn't adequately conveyed by any other short form - 'understand' and 'comprehend' are longer, and as with 'know' they lack the connotations of fully absorbing and coming to terms with the concept. (Indeed, the fact that it's hard to explain 'grok' except by example is a proof that the niche exists.) The only one it doesn't satisfy is relationship to pre-existing language, and I think the others greatly outweigh this.

*Carrollites will no doubt appreciate the irony.

Date: 2005-04-10 05:03 am (UTC)
From: [identity profile] lokicarbis.livejournal.com
Just curious - by 'annoy all equally' did you in fact mean 'not at all'? I found your reasoning solid and agreeable.

But then, I like grok too...

Date: 2005-04-10 07:44 am (UTC)
From: [identity profile] lokicarbis.livejournal.com
Now, if it had been one of the more awkward neologisms, like 'sumbanall'...

Date: 2005-04-10 08:30 am (UTC)
From: [identity profile] lederhosen.livejournal.com
Tongue half in cheek there. The other half is that just about all the opinions I hear expressed on this fall firmly in one of two extreme camps: "language is fluid, so there's no wrong way to use it - it means whatever I choose it to mean" and "language is sacrosanct, any changes are an abomination, except maybe Latin and Greek derivatives, which should always follow the rules of those languages". From past experience, I'm far enough from either of those two camps to annoy both of them.

I'm sure there *are* plenty of reasonable people in the middle, just that the extremists seem to be the only ones who say much about it.

Date: 2005-04-10 09:01 am (UTC)
From: [identity profile] lokicarbis.livejournal.com
I'm sure there *are* plenty of reasonable people in the middle, just that the extremists seem to be the only ones who say much about it.

I hear that.

Date: 2005-04-10 06:09 am (UTC)
From: [identity profile] notasquirrel.livejournal.com
wel mie gudniz. thetz extreemlee...

*headskritch*

thet iz toe sae, thetz veree, um... *pawwave*... u kno.

*flick*

i meen, ite jeste...wel...iz, riate?

*flick*

*flick*

*sigh*

o, y donte u jeste giv mee a kashewe ann weel kal ite gud?

Date: 2005-04-10 08:31 am (UTC)
From: [identity profile] lederhosen.livejournal.com
But does calling it 'good' make it a good cashew?

Only one way to find out, I suppose. Have a cashew.

Date: 2005-04-11 01:03 am (UTC)
From: [identity profile] notasquirrel.livejournal.com
But does calling it 'good' make it a good cashew?

{thoughtfultailswish} ime note shur, bute iffe i hadd a bigg enuf sampul grupe i kud tel u iffe ite wuz gud inn generul.


Only one way to find out, I suppose. Have a cashew.

{irritatedchitter} wel donte nock yorselfe oute.

Date: 2005-04-12 04:32 am (UTC)
From: [identity profile] notasquirrel.livejournal.com
How big is enough?

{happychitter} wel jeste bak upe thee truk ann weel fiande oute!

Date: 2005-04-12 04:38 am (UTC)
From: [identity profile] lederhosen.livejournal.com
*beep* *beep* *beep* *beep*

Date: 2005-04-12 08:21 pm (UTC)
From: [identity profile] notasquirrel.livejournal.com
*beep* *beep* *beep* *beep*

{ecstaticskwerllysqueal} yaay SIENCE!!

Date: 2005-04-10 06:27 am (UTC)
From: [identity profile] panacea1.livejournal.com
You've been watching PoTC again, haven't you?

Date: 2005-04-10 08:30 am (UTC)
From: [identity profile] lederhosen.livejournal.com
No, but I was stuck for examples so I reached for it ;-)

Date: 2005-04-10 03:53 pm (UTC)
From: [identity profile] djfiggy.livejournal.com
I think most of the York University linguistics department might fall in the "language is fluid" camp. Good to see a bit of exposition that doesn't make me feel like half my face was chewed off by a bunch of rhesus monkeys.

Date: 2005-04-11 12:43 am (UTC)
From: [identity profile] turnberryknkn.livejournal.com
(grin) That is very cool.

And I am reminded of James Nicoll's famed quote: The problem with defending the purity of the English language is that English is about as pure as a cribhouse whore. We don't just borrow words; on occasion, English has pursued other languages down alleyways to beat them unconscious and rifle their pockets for new vocabulary. ;-)

Date: 2005-04-11 04:51 am (UTC)
From: [identity profile] lederhosen.livejournal.com
I had james_nicoll's line in mind when I titled the post :-)

Date: 2005-04-11 03:04 am (UTC)
From: [identity profile] shadow-5tails.livejournal.com
You've been reading those groups again...

It may be easier and less painful to just go straight to the beating-your-head-on-the-desk stage, you know. Just for the record.

*grin*

Date: 2005-04-11 04:51 am (UTC)
From: [identity profile] lederhosen.livejournal.com
Well, I at least got a good snark out of it :-)

Date: 2005-07-06 06:46 am (UTC)
From: [identity profile] winterkoninkje.livejournal.com
As for points #1, #2, and #8: they capture one of--what I'd classify as--the prime conflicts in phonology: "ease" vs "fidelity". The idea is that we have some representation of a pronunciation for a given word in our heads and, on the one hand, we want to pronounce it in the easiest way we can (hence contractions, slurring, dropping out certain consonants, adding in certain vowels), but on the other hand, we want to pronounce it as closely to our mental representation as possible to minimize ambiguity and preserve distinctness. This is why over time languages evolve towards succinct particles (prepositions, auxiliary verbs,...) even to the point of glomming them onto the word they modify or dropping them entirely, and then redevelop words those particles used to encompass and start the whole thing over.

As for #4, #5, #6, and your defense of "grok": there's another theory in linguistics that stipulates that all words in a language must have distinct meanings (with meaning used very loosely here). So if we introduce a new word that fills the same general semantic space as others--whether a neologism, foreign import, or whatever--then that word will either perish because it isn't needed ("understand", "comprehend", and "know" are good enough) or similarly cause another to perish and replace it, or alternately its meaning--and the meaning of other words in the same semantic space--will shift until they reach a new balance with every word in its niche.

That said, I think "grok" and "snark" are perfectly acceptable words. According to my spellchecker, the former is a legitimate word but the latter isn't (much to Carrollites' chagrin no doubt ;)
