In November 1999, Slashdot, a website for technology-related news ("for nerds", according to its own description), released a poll asking users to vote for the best computer science college. The poll recorded each user's IP address, with the intention of preventing the same user from voting more than once. However, Carnegie Mellon University found a way around this and started stuffing the ballot with thousands of votes. Only one day later, MIT responded by creating its own bot, and the poll turned into a contest between programs. In the end, MIT finished with 21,156 votes and Carnegie Mellon with 21,032, whilst no other school passed the 1,000 mark.

The above is one of the more interesting cases where a bot was used to emulate human behaviour. In most cases, however, bots are used to spam forums or comment sections on blogs, usually in an attempt to improve the search engine ranking of a site or product. The same technique is used to register accounts or submit contact forms, leaving website administrators with spam-filled mailboxes. According to Akismet (an anti-spam service), over 18 million comments a day are actually spam.

A CAPTCHA, short for Completely Automated Public Turing test to tell Computers and Humans Apart, aims to solve these problems with a test that humans can pass with ease, whilst computers cannot. This may seem fairly simple; however, it is becoming increasingly challenging to find something that humans can do and computers cannot. The result is a wide array of CAPTCHAs, some easy and others a real headache even for human users to solve.

Almost everyone on the internet must have come across at least a couple of CAPTCHAs, the most common being the one where a user must recognize a word or pair of words and type them into a textbox. In such a CAPTCHA, the correct answer (or a checksum/hash of it) is stored on the server side, whilst the user types their answer into a textbox. Upon submission, the user's input is sent to the server and compared against the stored value. Although this might sound pretty obvious, you would be surprised how many CAPTCHAs used to expose the correct value on the client side too, for example as the alt attribute of the image. Needless to say, that made the spam bot's life very easy.
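As a rough illustration of the server-side check described above, here is a minimal Python sketch. It is not how any particular CAPTCHA library works: the `session` dictionary, the function names and the choice of a salted SHA-256 hash are all assumptions made for the example. The point is simply that only a hash of the answer is kept on the server, and nothing about the answer is sent to the browser.

```python
import hashlib
import hmac
import secrets


def _digest(answer: str, salt: bytes) -> str:
    # Normalise case and whitespace so "  Hello   World " matches "hello world".
    normalised = " ".join(answer.split()).lower()
    return hashlib.sha256(salt + normalised.encode("utf-8")).hexdigest()


def store_challenge(session: dict, answer: str) -> None:
    # Hypothetical helper: keep only a salted hash of the answer, server-side.
    salt = secrets.token_bytes(16)
    session["captcha"] = {"salt": salt, "digest": _digest(answer, salt)}


def check_answer(session: dict, user_input: str) -> bool:
    # Remove the challenge so that each CAPTCHA can be attempted only once.
    challenge = session.pop("captcha", None)
    if challenge is None:
        return False
    candidate = _digest(user_input, challenge["salt"])
    # Constant-time comparison of the two hex digests.
    return hmac.compare_digest(candidate, challenge["digest"])
```

In a real site the `session` would be whatever server-side session store the web framework provides; a plain dictionary is used here only to keep the example self-contained.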

The reCAPTCHA project (by Google) is possibly the most used worldwide, with over 200 million daily uses. However, it is estimated that one takes an average of 10 seconds to complete and, according to Casey Henry, this could result in a conversion loss of around 3.2%. So ultimately one must ask the question: "Is it worth losing more than 3% of conversions to save oneself the hassle of filtering through a few spam e-mails?"

A study in 2010 evaluated human performance in solving CAPTCHAs, and the results were not at all favourable. This is mainly because CAPTCHAs have become more and more complicated in an attempt to block out increasingly clever bots. The text recognition reCAPTCHA uses scanned words from books which OCR (optical character recognition) technology has failed to interpret. This means that a bot should be unable to recognize the words within the image; however, it also means that the text can be difficult for a human to read. As OCR technology keeps improving, it will become increasingly challenging to create text that is human-readable yet not computer-readable.

reCAPTCHA

An interesting option, frequently used by Solve Media, is to ask logical questions which should be straightforward for a human but impossible for a computer to understand. Questions could vary from mathematical formulae to text manipulation. Such CAPTCHAs could include questions such as:

  • box is to boxes as fox is to...
  • "PURINA Dog Chow", describe this brand with any word
  • Write the number 20 in words

Solve Media

It should be immediately apparent that whilst these are solvable, they do require a bit of thought, and a user might not bother trying to solve them. Apart from this, a non-English speaker could have difficulty understanding the question, possibly resulting in a loss of conversions on a multinational website.

Another super-cool method for distinguishing between humans and computers is Facebook's friend recognition CAPTCHA. If you have logged onto Facebook from a different PC or location, it is likely that you have seen the screen where you are asked to name your friend from a couple of photos in which he/she has been tagged. Apart from being almost 'fun' to complete, this CAPTCHA makes use of Facebook's existing resources and data, so it depends on that data being correct, i.e. on you not having tagged a bottle of beer or a beautiful sunset as your friend.

And so we have concluded that... CAPTCHAs are not the solution. Tim Kadlec stated that "Spam is not the user's problem; it is the problem of the business that is providing the website. It is arrogant and lazy to try and push the problem onto a website's visitors." Here at Incredible Web we believe this to be true and try to follow it as much as possible by using a honeypot. As the name suggests, a honeypot is a trap that catches spam bots by having them fall for the bait.

At Incredible Web we implement a honeypot in the form of a hidden input field on a form (such as on the contact form; go ahead, check the source HTML). A human user opening the contact form will not see the honeypot field and will therefore leave it empty whilst completing the remaining fields. A typical bot, on the contrary, does not interpret CSS and will fill in all the textboxes, including the honeypot field. Once the form is submitted to the server, we check whether the honeypot field is populated; if it is, the submission most likely came from a spam bot, otherwise we treat it as human. So far this technique is serving us well, both on our website and on the websites of our clients (we include it by default in all forms): spam is reduced to almost nothing, without impacting the human user's experience.
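The server-side half of the honeypot can be sketched in a few lines of Python. This is only an illustration of the idea and not our production code: the field name `website`, the function names and the returned strings are all made up for the example, and the submitted form is assumed to arrive as a simple mapping of field names to values.

```python
from typing import Mapping

# Hypothetical name for the hidden honeypot input. A plausible-looking name
# makes it more tempting for a bot to fill in.
HONEYPOT_FIELD = "website"


def is_probable_bot(form: Mapping[str, str]) -> bool:
    # Humans never see the hidden field, so it should be missing or empty.
    return bool(form.get(HONEYPOT_FIELD, "").strip())


def handle_contact_form(form: Mapping[str, str]) -> str:
    if is_probable_bot(form):
        # Quietly drop the submission; the bot gets no hint that it was caught.
        return "discarded"
    # A real handler would send the e-mail or save the message here.
    return "accepted"
```

For example, `handle_contact_form({"name": "Ann", "message": "Hi", "website": ""})` is accepted, whilst the same submission with `"website": "http://spam.example"` is discarded.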

Let me know what you think: which CAPTCHAs do you favour, and what was the most insane CAPTCHA you have ever seen?

Thanks for reading,

Kevin