
A Few Million Monkeys Randomly Recreate Shakespeare | Jesse Anderson

Jeff A’s Comment: I wonder how much this cost.

from www.jesse-anderson.com

Posted on Sep 23, 2011 in Blog

Friends, Romans, countrymen, lend me your ears;
I come to recreate Shakespeare, not to praise him.
- Monkey Julius Caesar

Today (2011-09-23) at 2:30 PST the monkeys successfully randomly recreated A Lover’s Complaint. This is the first time a work of Shakespeare has actually been randomly reproduced.  Furthermore, this is the largest work ever randomly reproduced. It is one small step for a monkey, one giant leap for virtual primates everywhere.

The monkeys will continue typing away until every work of Shakespeare is randomly created.  Until then, you can continue to view the monkeys’ progress on that page.  I am making the raw data available to anyone who wants it.  Please use the Contact page to ask for the URL. If you have a Hadoop cluster that I could run the monkeys project on, please contact me as well.

This project started on August 21, 2011. Since then, more than 5 trillion character groups have been randomly generated and checked, out of roughly 5.5 trillion possible combinations.
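As a rough check on that figure, here is a minimal sketch of the arithmetic. It assumes each character group is nine lowercase letters from a to z, as described in the Technical Explanation; the variable names are mine, not from the project's code.

```python
# Assumption: a character group is 9 letters, each drawn from a-z.
ALPHABET_SIZE = 26
GROUP_LENGTH = 9

# Number of distinct 9-letter groups a monkey could type.
total_groups = ALPHABET_SIZE ** GROUP_LENGTH

print(f"{total_groups:,}")                    # 5,429,503,678,976
print(f"{total_groups / 1e12:.2f} trillion")  # 5.43 trillion
```

Under that assumption the total comes to about 5.43 trillion, which matches the rounded 5.5 trillion figure.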

So far, the project has appeared on Slashdot, Fox News, Engadget, Japanese Engadget, and Solidot.  If you would like to do a story, please contact me via the Contact page.

The Inspiration

This project comes from one of my favorite Simpsons episodes, which has a scene where Mr. Burns brings Homer to his mansion (YouTube Video). One of his rooms has a thousand monkeys at a thousand typewriters. One of the monkeys writes a slightly incorrect line from Charles Dickens: “It was the best of times, it was the blurst of times.”  The joke is a play on the Infinite Monkey Theorem, the idea that a million monkeys sitting at a million typewriters will eventually produce Shakespeare.  And that is what I did.  I created millions of virtual monkeys, first on Amazon EC2 and then on my home computer, and put them at virtual typewriters.

Less Technical Explanation

Instead of having real monkeys typing on keyboards, I have virtual, computerized monkeys that output random gibberish. This is supposed to mimic a monkey randomly mashing the keys on a keyboard. The computer program I wrote compares that monkey’s gibberish to every work of Shakespeare to see if it actually matches a small portion of what Shakespeare wrote. If it does match, the portion of gibberish that matched Shakespeare is marked with green in the images below to show it was found by a monkey. The table below shows the exact number of characters and percentage the monkeys have found in Shakespeare. The parts of Shakespeare that have not been found are colored white. This process is repeated over and over until the monkeys have created every work of Shakespeare through random gibberish.

Technical Explanation

For this project, I used Hadoop, Amazon EC2, and Ubuntu Linux.  Since I don’t have real monkeys, I have to create fake Amazonian Map Monkeys.  The Map Monkeys create random ASCII data between a and z.  They use Sean Luke’s Mersenne Twister to make sure I have fast, random, well-behaved monkeys.  Once a monkey’s output is mapped, it is passed to the reducer, which runs the characters through a Bloom filter membership test.  If the monkey output passes the membership test, it is checked against the Shakespearean works using a string comparison.  If that passes, a genius monkey has written 9 characters of Shakespeare.  The source material is all of Shakespeare’s works as taken from Project Gutenberg.
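To make the two-stage check concrete, here is a small Python sketch of the idea: a cheap Bloom filter membership test first, then an exact comparison. It is not the project's Hadoop code. The class, the function names, the filter sizes, and the text normalization (lowercase, letters only) are all assumptions for illustration. Python's random module happens to be based on the Mersenne Twister as well, though it is not Sean Luke's implementation.

```python
import hashlib
import random

GROUP_LENGTH = 9  # characters per group, as stated in the post


class BloomFilter:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive several bit positions by salting a single hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("ascii")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def normalize(text):
    # Assumption: only the letters a-z count, compared in lowercase.
    return "".join(ch for ch in text.lower() if "a" <= ch <= "z")


def character_groups(text, length=GROUP_LENGTH):
    """Every run of `length` consecutive letters in the normalized text."""
    letters = normalize(text)
    return {letters[i:i + length] for i in range(len(letters) - length + 1)}


def monkey_types(rng, length=GROUP_LENGTH):
    """One virtual monkey mashing `length` random keys between a and z."""
    return "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(length))


# Stand-in source text; the real project used all of Shakespeare's works.
source = "Friends, Romans, countrymen, lend me your ears"
targets = character_groups(source)

bloom = BloomFilter(num_bits=8192, num_hashes=4)
for group in targets:
    bloom.add(group)


def check(candidate):
    # Cheap Bloom filter test first; exact lookup only if it passes.
    return candidate in bloom and candidate in targets


rng = random.Random(2011)
attempt = monkey_types(rng)
print(attempt, check(attempt))
print("friendsro", check("friendsro"))
```

The point of the filter is that almost every random group fails the cheap test, so the more expensive exact comparison runs only for the rare candidates that might be real matches.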

The monkeys’ data from Amazon’s cloud is updated on this site every 30 minutes.  The images below show green for every character group that was found and white for those that are still missing.  The image output is kind of like the animations for defrag utilities.  As the monkeys progress through the works, more and more character groups will be found and show green.
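One way to picture how such an image and its percentage could be derived is sketched below. This is a guess at the bookkeeping, not the site's actual rendering code: the function name, the per-character mask, and the example offsets are all hypothetical.

```python
def coverage(text_length, found_offsets, group_length=9):
    """Return a per-character found/missing mask and the percent found.

    found_offsets are the (hypothetical) starting positions of character
    groups that a monkey has matched in the work.
    """
    found = [False] * text_length
    for start in found_offsets:
        for i in range(start, min(start + group_length, text_length)):
            found[i] = True
    percent = 100.0 * sum(found) / text_length if text_length else 0.0
    return found, percent


# A 20-character work with groups found at offsets 0 and 5.
mask, percent = coverage(text_length=20, found_offsets=[0, 5])
print("".join("#" if hit else "." for hit in mask))  # ##############......
print(f"{percent:.1f}% found")                       # 70.0% found
```

Here `#` plays the role of green and `.` the role of white. Overlapping groups are counted once per character, so the percentage never exceeds 100.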

This chart shows the total number of character groups as more and more iterations of the checks are run.

This chart shows percent complete as more and more iterations are run for each story.

For the curious, the computer I ran the monkeys on is a Core 2 Duo 2.66 GHz with 4 GB RAM running Ubuntu 10.10 64-bit.

A Few Words To Try and Prevent The Usual Comments

I realize there are different interpretations of this saying/theorem, and I have already done 2 of them.  I understand the definition of infinite and of the infinite monkey theorem, and I realize that this project does not have infinite resources.  This project was funded and written by me and was not supported by any grant money or federal money.  No monkeys were harmed during the making of this code.  This project is my attempt to find a creative way to attain an answer without infinite resources.  It is a fun side project.  If you still feel angry or slighted or feel the need to set me straight, please read this sign: