4chan’s Captcha

Well, if you live on the internet like I do, then you probably know that M<3<3t, apparently fed up with the spam bots that had basically eaten his site alive, decided it would be best to go ahead and implement a captcha. The service he chose is probably one of the best known on the planet, and they’ve been doing it for quite a while now. That service, of course, is reCAPTCHA, which took an idea and ran with it. The premise is simple: you want something that a human can read but a bot can’t. So what better source material than text that even some of the best optical character recognition (OCR) software can’t figure out? This is done by scanning a couple thousand books into a central database, figuring out which words the OCR can’t read, and then letting a human user figure them out.

To verify accuracy, a known word is paired with an unknown word. The upside is that you get human verification, and you create a new known word if enough people all respond with the same answer. This also helps to slowly transcribe all the books in the universe from badly scanned text to some form of digital input. 4chan implemented this captcha system no less than three weeks ago [late July, early August], and to pretend that the massive influx of users trying to add posts isn’t affecting the system is, I think, delusional at best.
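For the curious, the known-word-plus-unknown-word trick described above can be sketched in a few lines of Python. This is only my rough illustration, not reCAPTCHA's actual code: the `WordPool` class, the image ids, and the `PROMOTE_AFTER` threshold of 3 are all made up for the example, and the real service's thresholds and matching rules are not something I know.

```python
from collections import Counter, defaultdict

# Hypothetical threshold: how many matching answers promote an unknown word.
PROMOTE_AFTER = 3


class WordPool:
    def __init__(self, known):
        # Maps an image id to its verified transcription.
        self.known = dict(known)
        # Maps an unknown image id to a tally of human guesses.
        self.votes = defaultdict(Counter)

    def submit(self, known_id, known_answer, unknown_id, unknown_answer):
        """Return True if the user passes the challenge."""
        # Only the known (control) word decides pass or fail.
        if known_answer.strip().lower() != self.known[known_id].lower():
            return False
        # The user looks human, so count their guess for the unknown word.
        guess = unknown_answer.strip().lower()
        self.votes[unknown_id][guess] += 1
        # Promote the unknown word once enough people agree on one answer.
        if self.votes[unknown_id][guess] >= PROMOTE_AFTER:
            self.known[unknown_id] = guess
            del self.votes[unknown_id]
        return True


pool = WordPool({"img1": "morning"})
for _ in range(PROMOTE_AFTER):
    pool.submit("img1", "morning", "img2", "overlooks")
print(pool.known.get("img2"))  # the agreed-upon guess is now a known word
```

Notice that the unknown word is never checked against anything except other people's answers, which is exactly why agreement among enough users is all it takes to settle it.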

Now, admittedly, for the most part the system is doing exactly what it was supposed to: keeping spam bots out and human posters in. This has dramatically decreased the amount of spam populating the site, but it’s also had an odd side effect. When the captcha was put in place, it was all rather simple stuff that was pretty easy to make out. Three weeks later… well, here, I’ll just show you a picture:

reCAPTCHA

First one to tell me what the black blob is gets a shoutout in my next blog post

Now, there are still a few readable ones in there: hit the nice refresh button and after a while you will find something that is actually human readable. But if I had to take a guess, I would say that reCAPTCHA is actually being affected by the hundreds of thousands of posts [and thus requests on their servers… over at Google] that are made every single day across the multiple boards.

Let’s not forget, either, that this is the same crowd that surprised reCAPTCHA before by completely bypassing it to game Time’s online top 100 poll. There is also the small fact that most of 4chan is now well aware that, with enough work and persistence, they could in theory trick the system into accepting whatever enough of 4chan agrees on as the unknown word. I for one can’t wait until all of those documents begin reading like posts from /b/, and the Library of Congress is not only a cultural monument in digital form, but truly a landmark to behold for future generations.

That aside, I don’t think 4chan cares enough to actually game reCAPTCHA, but I would be interested in what that poor database is having to drag up now that so many more requests are being made to their servers. I mean, there is no pretending that reCAPTCHA doesn’t have big customers, and more importantly, sites that are large enough to actually dip into the word pool, but most large sites are using their own internal captcha system. Alexa claims 4chan is site 616 in terms of most visited on the web, but you’ll find me hard pressed to believe that its traffic numbers don’t beat Vimeo, which sits happily somewhere in the 100s globally. Well, admittedly I don’t use Vimeo enough to appreciate just how much traffic they are doing these days, and I’m pretty out of touch with the average human being when it comes to what people actually look at while on the web [which, if Alexa is to be trusted, is a LOT of porn].

But yes, I’m curious whether 4chan will actually have any long-term impact on reCAPTCHA, or if they’ll just start uploading a ton of new books to keep up. There have been a LOT of foreign characters showing up nowadays, which could make things a bit difficult for most American users, who couldn’t tell you the difference between an umlaut and kanji. I’m not saying that they haven’t shown up before, but their frequency has increased.

Just interesting.


2 Responses to 4chan’s Captcha

  1. karim24 says:

    What a useless Twaddle.
