An unexpected risk of using reCAPTCHA

We are all familiar with CAPTCHAs, which web sites use to make sure that a user is human (and, hopefully, to hinder spammers).

Sometimes these images are computer generated, but often they come from reCAPTCHA, a service offered by Google that sources images from old books. The service supplies websites with images of words that optical character recognition software has been unable to read. The websites present these images for humans to decipher as CAPTCHA words as part of their normal validation process, then return the results to the reCAPTCHA service, which passes them on to the digitization projects. This sounds like a noble cause.

The old books that Google uses are unlikely to contain any offensive language. However, the above graphic shows an interesting potential for confusion. This is an actual reCAPTCHA that I found in a Twitter post where someone berated a web site for having an offensive CAPTCHA. To most modern English speakers the first word looks like "goatfucker", which certainly wouldn't be a word I'd use in polite company.

As I noted above, however, the source of these words is old books... so why would an old book have such an offensive word in it?

In fact, the fifth letter of the word is a Long S (ſ), not a lower-case F (f). This means that the word is "goatſucker", which today would be written as "goatsucker". A goatsucker is a medium-sized nocturnal bird with long wings, short legs and a very short bill. A photo of this innocuous bird appears to the right.

The Long S, according to Wikipedia, stopped being used in printed English by about the 1820s. Those of us who have seen old documents in history class at school or in museums might recognise it from such famous documents as the U.S. Declaration of Independence and the Magna Carta.

There are probably other potentially confusing words, such as "suckerfish".

The big question in my mind, though, is what reCAPTCHA expects us to type: an 'f' or an 's'? I assume they don't want us to have to figure out how to enter an 'ſ'.
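I don't know how reCAPTCHA actually checks answers, but here is a minimal sketch of how a checker could treat a Long S as a plain 's'. The function names `normalize_answer` and `answers_match` are my own invention for illustration. The sketch relies on the fact that Unicode NFKC normalization maps 'ſ' (U+017F) to 's'.

```python
import unicodedata

# Hypothetical helper: NOT reCAPTCHA's real logic, just an illustration.
def normalize_answer(text: str) -> str:
    """Fold compatibility characters (such as the Long S) into modern forms."""
    # NFKC maps U+017F (Long S) to a plain 's'; casefold() ignores letter case.
    return unicodedata.normalize("NFKC", text).casefold().strip()

# Hypothetical helper: compare what the user typed against the expected word.
def answers_match(typed: str, expected: str) -> bool:
    return normalize_answer(typed) == normalize_answer(expected)

if __name__ == "__main__":
    expected = "goat\u017fucker"  # the word as printed, with a Long S
    print(answers_match("goatsucker", expected))  # modern spelling with 's'
    print(answers_match("goatfucker", expected))  # misreading with 'f'
```

Under this assumption, typing 's' would be accepted and typing 'f' would be rejected, because 'f' and 'ſ' are different letters that merely look alike.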

========================================================

Epilogue: A few days after writing this blog entry, I got the following reCAPTCHA on another site (see image on left). I tried typing in "some" for the first word and it worked. I wasn't able to test what would have happened if I had typed "fome" instead.
