Monday, August 06, 2012

Contextual spelling

tl;dr simplicity can hide depth

Some people expect that, as they discover more, they get closer to finished. But exploration can instead take you further from where you started.

Got a Mac*? Try this...

I open a new TextEdit document. I type "contact sant". On my machine, the word "contact" is highlighted as a spelling problem. I note that if I change "sant" to "Sant" or "san", "contact" is no longer highlighted. Perhaps one gives more context, one less. If so, I've bracketed a sweet spot, which is good to know. I might come back to this.

Because I'm in TextEdit, I suspect I'm actually using Apple's Cocoa text system. The same problem shows up in MacJournal. However, it does not show up in Mail or Evernote. I'll not follow this particular path of enquiry for now. Maybe someone else can inform me.

That's two paths ignored. What I want to do is to dig deeper into spelling. If I flip open the Edit:Spelling:"Spelling and Grammar" panel, the alternatives to "contact" offered are "kontakt" and "kontant". If I change "sant" to "santos", I'm offered** "contacta", "contacte", "contacto", "contacté" and "contactó".

I note that a drop-down box says "Automatic by Language", and to my mostly-monoglot eye, the first suggestions look more Germanic, the second more Latin. Though neither TextEdit nor MacJournal allows me to set the language of a text fragment, it's clear that the suggested spellings are from two different non-English dictionaries***, and that the choice of one excludes others.

Could it be that Cocoa text decides what language a text fragment is in before it goes off to get spelling suggestions? Is it really deciding from four letters?
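To make that guess concrete, here is a toy sketch in Python of a "decide the language first, then spell-check against only that language" checker. Everything in it is my assumption: the three tiny word lists, the names `WORDS`, `guess_language` and `check`, and above all the rule for picking a language, which I made up so that it mirrors what I saw. I have no idea whether Cocoa does anything like this.

```python
# Toy word lists; real dictionaries are far larger (contents are illustrative).
WORDS = {
    "en": {"contact", "the", "saint"},
    "sv": {"sant", "kontakt", "kontant"},
    "es": {"santos", "contacto", "contacta"},
}


def guess_language(words, default="en"):
    """Made-up rule (an assumption, not Apple's method): the first word that
    is unknown to the default language but known to another language decides
    the language of the whole fragment."""
    for word in words:
        if word in WORDS[default]:
            continue
        for lang, vocab in WORDS.items():
            if lang != default and word in vocab:
                return lang
    return default


def check(text, default="en"):
    """Return (guessed language, words flagged as misspelled in that language)."""
    words = text.split()
    lang = guess_language(words, default)
    # Lookups are case-sensitive, so "Sant" does not match Swedish "sant".
    flagged = [w for w in words if w not in WORDS[lang]]
    return lang, flagged


print(check("contact sant"))    # Swedish is chosen, so "contact" is flagged
print(check("contact san"))     # stays English, so "san" is flagged instead
print(check("contact santos"))  # Spanish is chosen, so "contact" is flagged
```

With a rule like this, one short word is enough to tip the whole fragment into another language, and a perfectly good English word then gets marked as the error.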

Over to you.


* I'm still on Snow Leopard, v 10.6.8. You may not be. You'll know better than I.
** actually, nothing changes immediately. I need to close and re-open the "Spelling and Grammar" panel to see the new suggestions. Yeah, I'm not exploring that, either.
*** so that's three in total, smart-alec.


  1. Anonymous 3:43 pm

    "Is it really deciding from four letters?"

    I'd say when those 4 letters spell a word found in a dictionary, then it's a perfectly reasonable assessment.

    Look up "sant" in Swedish. "kontakt" is also a Swedish word so I don't really see an issue in how the system is handling the 'problem'.

    Also, a quick Google search turned up several solutions on how to restrict the language libraries.

  2. Hello Anonymous, thank you for your comment*. As it happens, I don't see an issue with it either - if I were able to express my tone of voice unambiguously, you'd hear the phrase "Is it really deciding from four letters?" with a tone of curiosity, not incredulity or frustration. I'm rather impressed that the underlying technology does this.

    For me, this is interesting primarily because it exposes (what appears to be) deeper and more complex functionality than I initially and ignorantly expected. To circle back to the sentiment at the top of this short post, an unexpected behaviour can open up new layers of functionality. If I'd been shown a spelling problem on "sant" (which I expected) rather than on "contact" (which I didn't expect), I would have remained ignorant.

    Thanks for the link to the interesting instructions to restrict languages available for spell checking. I won't be doing that. Call me a wuss if you like, but my typing's rotten and I need my spellchecking working reliably. I can deal with interesting false positives, but not with complete radflajhlp.