Saturday, October 30, 2010

Let's Outsource Reading

Dear Friends,

I love to read.  You love to read.  Why not?  A novel is a great experience.  You live in it.  You feel it.  It becomes you.  You become its author or characters or mood.  The words in the book become the words you come to define your life with, and the very nature of your life changes because of the semantic meaning embedded in everything you see or do.

Sometimes, though, we are forced to read against our will.  We are made to extract meaning from texts we have no intention of loving.  Worse yet, even if we love the books we are obligated to read, how are we realistically expected to emerge from a book's reading with some objective idea of its plot, its language and its moral when its plot is endlessly assaulted by our subjective memories of life and of media representing life, its language assaulted by our idea of language, and its moral assaulted by our generally-firmly-established moral compasses?

Any of us in high school or college knows the pain of being forced to read novels we would never have chosen.  Worse still, many teachers in the "liberal arts" don't grade based on execution but on the size of the intersection of their ideology and the student's.  Now, if you can cull a few key phrases from your teacher, you can interpret a book according to their ideology without ever having to abandon your own!

Those of us in the world of non-matriculated adults wish we could feign a knowledge of books to impress coworkers, possible romantic mates, etc.  The opportunity cost of a book is roughly a week's salary, nearly $1,000 if you are the average American.  Would you like 40 hours and/or $1,000 back?

There are also those of you who love to read but for whom reading isn't enough.  You read a book five times and then you read all the existing criticism of it, yet you still wish you could decipher the nexus of this book more.  Well, my friend, I can help you apply the same exactitude Mechanical Engineers or Investment Bankers apply to their respective crafts, with a cornucopia of tools you never imagined could be applied to literature.

A quick demonstration:  Let's say you had to read "A Tale of Two Cities" by Charles Dickens.  Why, you just paste the URL of a website containing the book's text into my prebuilt function like so:

Please tell me the website containing the text you wish to analyze:
http://www.gutenberg.org/files/98/98-h/98-h.htm
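
For the curious, the prompt above could be backed by something like the following. This is only a minimal sketch using Python's standard library, not the code of the starter kit itself; the names `html_to_text`, `tokenize` and `load_tokens` are made up for illustration, and the tokenizer is a deliberately simple assumption (letters, with internal apostrophes and hyphens kept).

```python
import re
import urllib.request
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text of an HTML page, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_text(html):
    """Strip the tags from an HTML string and return the visible text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def tokenize(text):
    """Split text into words (hypothetical rule: letters plus inner ' or -)."""
    return re.findall(r"[A-Za-z]+(?:['-][A-Za-z]+)*", text)


def load_tokens(url):
    """Download a page and return its words as a list."""
    with urllib.request.urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    return tokenize(html_to_text(html))


# Usage (needs a network connection):
# tokens = load_tokens("http://www.gutenberg.org/files/98/98-h/98-h.htm")
```

Everything else in the demonstration works on a list of words like the one `load_tokens` returns.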


Having pasted in the address of Project Gutenberg's free copy of the book, you get back a series of facts from which one could easily build arguments for an essay, with perhaps just the assistance of the description on the back of the book.  There is no doubt that the aid of just the introduction or CliffsNotes would be more than enough to proceed from this output to an incredible and insightful essay at even the graduate level.

A few interesting characteristics:

The first output is that of collocations, or simply the two-word combinations that turn up together most often in the book.  This is a great tool for figuring out the main characters and, combined with a little Google, can really help you decipher character development in a cinch.  Also, some of the non-proper-noun combos can be extremely telling of the plot.

The 100 most common two word sets in the book are: 
Building collocations list
Mr. Lorry; Miss Pross; Madame Defarge; Mr. Cruncher; Saint Antoine; Doctor Manette; said Mr.; Charles Darnay; young lady; Mr. Stryver; Mr. Lorry.; Miss Manette; Jacques Three; Sydney Carton; Young Jerry; Miss Pross.; Old Bailey; said Miss; Gutenberg-tm electronic; n't know; one another; honest tradesman; electronic works; Archive Foundation; Mr. Darnay; Mr. Barsad; Monsieur Defarge; Temple Bar; golden hair; thousand seven; Mrs. Cruncher; seven hundred; Jarvis Lorry; Monsieur Gabelle; Mr. Carton; long time; Mr. Attorney-General; Good day; last night; Doctor Manette.; Madame Defarge.; electronic work; set forth; blue cap; Tellson 's.; North Tower

The limit to this function is your hunger for two-word combos.  I have set it to 100 for demonstration purposes.  You could set it to 1 or 1000.
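
To make the idea concrete, here is a rough sketch of a two-word counter. It is a simplification and not the routine that produced the list above: it ranks pairs by raw count after skipping a small, hand-picked stopword list (an assumption of this sketch), whereas libraries such as NLTK, whose `Text.collocations()` method does this job, rank pairs with a statistical association measure. The name `top_bigrams` and its parameters are hypothetical.

```python
from collections import Counter

# A tiny, hand-picked stopword list (an assumption for this sketch).
STOPWORDS = {
    "the", "a", "an", "and", "of", "to", "in", "it", "was", "is", "that",
    "he", "she", "his", "her", "with", "for", "as", "at", "on", "had",
    "i", "you", "not", "be", "but",
}


def top_bigrams(tokens, n=100, min_count=2):
    """Return up to n adjacent word pairs, most frequent first."""
    pairs = Counter()
    for first, second in zip(tokens, tokens[1:]):
        # Skip pairs that contain a stopword such as "of the".
        if first.lower() in STOPWORDS or second.lower() in STOPWORDS:
            continue
        pairs[(first, second)] += 1
    # Keep only pairs seen at least min_count times.
    return [
        " ".join(pair)
        for pair, count in pairs.most_common(n)
        if count >= min_count
    ]
```

Changing `n` is what "set it to 1 or 1000" amounts to here.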

Let's move on to some other significant output:


Words used in similar contexts to monstrous are 


Building word-context index...
big blameless certainly complete early easily exactly familiar far
final freely good heavily here ignorant impracticable late litter
little long


Here you can toss a word in and see all the other words in the text that appear in similar contexts, that is, words that follow or are followed by the same words as yours.  This tool can be very useful for breaking down an author's style, or perhaps even analyzing the speeches of a political leader or CEO.  Applying it to Pride and Prejudice, for example, we find that for Jane Austen the word 'monstrous' had the same sense as 'very' or 'extremely'.  Please note that 'monstrous' was a demonstration pick and you can use any word of your choice.
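
One way to picture "similar contexts" is the sketch below. It assumes a context is simply the pair (previous word, next word) and scores every other word by how many such pairs it shares with yours. That is one plausible reading of the idea rather than the exact scoring behind the output above (NLTK's `Text.similar()` is the library version), and `similar_words` is a made-up name.

```python
from collections import Counter, defaultdict


def similar_words(tokens, target, n=20):
    """Rank words by how many (previous, next) contexts they share with target."""
    words = [t.lower() for t in tokens]
    target = target.lower()

    # Map each word to the set of (previous word, next word) pairs around it.
    contexts = defaultdict(set)
    for prev, word, nxt in zip(words, words[1:], words[2:]):
        contexts[word].add((prev, nxt))

    target_contexts = contexts.get(target, set())
    scores = Counter()
    for word, seen in contexts.items():
        if word == target:
            continue
        shared = len(seen & target_contexts)
        if shared:
            scores[word] = shared
    return [word for word, _ in scores.most_common(n)]
```

For instance, `similar_words(tokens, "monstrous")` returns the words that turn up between the same neighbours as 'monstrous' somewhere in the text.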


The lexical diversity score of the book is:  11  

The lexical diversity is the total number of words divided by the number of unique words in the text; the smaller the number, the greater the variety used by the author.  Nearly no books have scores under 8 or 9 (it would be hard to get a book out saying "the" just once), so you can consider that range of "Slightly Below Ten" as extremely diverse.  Other great ways to apply it are to compare one author to others from their period, or the book in question to the author's other works (stick to texts of similar length, since longer texts repeat more words and so score higher).  Perhaps the "young" books of the author are filled with more lexical diversity, as his/her primary concern was dazzling the world with incredible and perplexing descriptions.  Perhaps the "old" books of the author exhibit low lexical diversity because plot, character development and meaning have grown to take center stage in the work, and words are now used like motifs in a symphony, repeating throughout the text so as to create a definitive belief system without ever explicitly stating it.  These claims are big, but with quantitatively sound lexical diversity measures you can at least dare to make them without your teacher responding with his "instant-write-off-ability" powers.
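
The score itself is a one-liner. A minimal sketch follows, assuming words are compared case-insensitively (so "The" and "the" count as one word); that choice and the name `lexical_diversity` belong to this sketch, not necessarily to the program that printed the 11 above.

```python
def lexical_diversity(tokens):
    """Total number of words divided by the number of distinct words."""
    words = [t.lower() for t in tokens]  # assumption: ignore capitalisation
    if not words:
        raise ValueError("cannot score an empty text")
    return len(words) / len(set(words))
```

A text in which no word ever repeats scores exactly 1; every repetition pushes the score up.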

Now imagine you're reading "Great Expectations", but you once saw your teacher at the video store with an Obama shirt on and a copy of "Steal This Movie: The Life of Abbie Hoffman" in his hand; walking by his desk once, you saw a thick tome of Karl Marx's "Kapital"; and his hybrid auto has a Dennis Kucinich bumper sticker.  Let's be real.  Unless he is an incredibly detached and objective person (fairly rare in this world), writing a paper that interprets GreatEx from the perspective of Adam Smith's Invisible Hand and Reagan's Supply-Side Economics is not going to get you an A.

Instead you can develop a thesis dealing with Dickens' Marxist side manifesting itself through uses of the words "poverty" and "justice" throughout the work.  But how to know them all without reading it?  Boom:


The word justice occurs in these contexts:
Building index...
Displaying 8 of 8 matches:
ock at my Lord Chief justice himself , and pulled
ge of the Lord Chief justice in the Court of King
s , the Tribunals of justice , and all society ( 
believe it. I do you justice ; I believe it. " Hi
 love of Heaven , of justice , of generosity , of
er of death , to his justice , honour , and good 
 love of Heaven , of justice , of generosity , of
 mind to impeach the justice of the Republic. She


The word poverty occurs in these contexts:
Displaying 7 of 7 matches:
ed the air , even if poverty and deprivation had 
abandoned her in her poverty for evermore , with 
ave repented them in poverty and obscurity often 
ations as their bare poverty yielded , from their
 in their children , poverty , nakedness , hunger
ss of twenty , whose poverty and obscurity could 
n the south country. poverty parted us , and she 




Naturally, you can modify the number of words surrounding your search-word, asking for just a handful of them to get a few stylistic details, or thousands at a time to get the whole page of the excerpt featuring the search-word in question.
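
A keyword-in-context search of this kind can be sketched in a few lines. This is an illustration, not the routine behind the listings above (NLTK's `Text.concordance()` is the library version); the function name and the `width` and `limit` parameters are hypothetical, with `width` being the number of characters shown on each side of the search-word.

```python
def concordance(tokens, word, width=25, limit=25):
    """Return up to `limit` lines showing `word` with `width` characters of context."""
    target = word.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() != target:  # whole-word, case-insensitive match
            continue
        # width tokens on each side always cover at least width characters.
        left = " ".join(tokens[max(0, i - width):i])[-width:]
        right = " ".join(tokens[i + 1:i + 1 + width])[:width]
        lines.append(f"{left:>{width}} {token} {right}")
        if len(lines) == limit:
            break
    return lines
```

Raise `width` to pull in whole paragraphs around each hit, or lower it for just the neighbouring words. Because the match is on whole words, a search for "justice" will not return "injustice", and vice versa.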

What about all those sources you need to quote?  Building a critical background for your essay is as easy as applying the same search method to the journal articles you've thrown in your bibliography.  Do a quick run-through on "poverty"/"justice" in a handful of articles and you'll have the framework of a critical literary essay worthy of Harper's without ever having read either the book or the criticism you're quoting!

In the starter kit program I have provided these few key operations you can perform on a text.  I know hundreds more.  Their output is not simply in the form of sentences and word lists but also graphs and charts, or useful functions like search.  You can plot the fifty most used words in a text, do pie charts of word ratios, or comb through thousands of pages with an algorithm much faster than the 'command f' you've been living by.
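
As a taste of the charting side, here is a small sketch that counts the most used words and draws them as a plain-text bar chart, so it runs without any plotting library. The names `top_words` and `text_bar_chart` are made up for illustration, and a real plot would more likely hand the same counts to a charting package.

```python
from collections import Counter


def top_words(tokens, n=50):
    """Return the n most frequent words as (word, count) pairs."""
    return Counter(t.lower() for t in tokens).most_common(n)


def text_bar_chart(counts, bar_width=40):
    """Render (word, count) pairs as rows of '#' scaled to the largest count."""
    if not counts:
        return ""
    largest = max(count for _, count in counts)
    label_width = max(len(word) for word, _ in counts)
    rows = []
    for word, count in counts:
        bar = "#" * max(1, round(bar_width * count / largest))
        rows.append(f"{word:<{label_width}} {bar} {count}")
    return "\n".join(rows)


# Usage: print(text_bar_chart(top_words(tokens, 50)))
```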

This program, with the output above (legitimately enough for writing an essay), is on sale for only eight dollars.  I can also give you hour-long training/support sessions for $50 each.  We can do as many such sessions as you desire, or as many as my knowledge of Natural Language Processing can fill.  Or you can just pick some of the aforementioned other methods and I will add them to the code for you for just four dollars each (price negotiable once the number of other methods passes 10).  Your choice.

We let computers free us of childish basic arithmetic, algebra, geometry and statistics years ago, and we proceeded to an incredible degree of understanding of the maths and sciences.  Who would really believe, then, that we can't eliminate some of the more shallow reading functions and come away with more profound knowledge of texts?