Friday, November 12, 2010

The Top Secret Economic Boom of Generation Z


On March 29, 1999, the Dow Jones Industrial Average closed above 10,000 for the first time. In late August and early September 2010, the Dow crossed 10,000 half a dozen times. The Nasdaq, meanwhile, still sits at less than half its March 2000 peak. The latest Census figures find that 1 in 7 Americans, the most since the 1960s, are now living in poverty. Not only dull manufacturing jobs but a horde of IT and customer service jobs have been shipped overseas.

My generation shouldn't have a rosy view of the economy. We came of employment age when the dot-com bubble burst and graduated from college in the midst of the subprime-mortgage-turned-credit-turned-financial-turned-economic crisis. As Napster gave way to Soulseek, then LimeWire, then iTunes and BitTorrent, we all realized we were the first generation since the era of ragtime that would not have "rock" stars, just professional musicians barely living off the long-tail economy. Waiters, some nurses, doormen, stockbrokers, human resources departments and a whole host of other professional and not-so-professional positions will be held by robots within 12 years. Many of us have five or six digits' worth of student debt. Most of us live with our parents. Most of our parents' only significant savings are in the form of home ownership, and hordes of them are being foreclosed on. Those who haven't seen foreclosure have seen their net worth and credit limits shrivel. The list of Dust-Bowl-sounding gloom-and-doom items is endless.

And yet we feel like millionaires. All of us. As long as we can reach the critical salary mass to afford a fast computer, broadband, a mobile computing device and an unlimited data plan. Why? Because all we ever wanted was information, and now information is free. Don't get me wrong, we still have a couple of stragglers whose greatest wishes include SUVs and fine wines and other things that are impossible (at least today) to transmit via 0s and 1s. Pound for pound, though, we are a people whose interest is media, and increasingly so.

Remember the late 1990s? Remember the money you spent on media? I had a friend (a high school student working at Pizza Hut, mind you) who bought about $10,000 worth of CDs from Columbia House in 1998. I remember that in 9th grade I planned on taking two summer jobs, one day and one night, with about a half-hour of leisure budgeted daily, for the sole purpose of buying the ultimate collection of DVDs. A hot CD in 1999, like Juvenile's 400 Degreez or that Santana record that got played to death (think Rob "Matchbox 20" Thomas on "Smooth" and that "Maria Maria" song), sold for about $18.99 or more at your local mall. The graver robbery was the singles: $5 to $7.99 to get the hit, the remix of the hit and the probable next single (provided the test groups gave it the go-ahead). Movies, forget about it! I lost my cousin's VHS copy of "Jerry Maguire" in 1997 and my dad made me pay over half my life savings, $22.97, to replace it. Renting a movie from Blockbuster for 3 days cost nearly $6.

Flash-forward to 2010. I've got stations like Pandora and Last.fm that use music algorithms to please my most particular tastes, whether the variables are mood, era, genre, tempo or instrumentation. I even start black-hat Pandora accounts with the sole purpose of bending the station into playing the songs I want. Via a few well-placed bookmarks and thumbs-ups, I have a station that plays just The Strokes' three records and assorted singles, about $57.97 worth of value in 1998 dollars. File-sharing takes inconceivably varied forms: I can peer-to-peer with best friends, go to sites, swap flash drives, share over a common Wi-Fi network or even just bump smartphones. High-speed internet means pirates can get their hands on whatever movies they want, and get them fast. It also means the people on the street who used to sell $10 copies of crummy hand-held-camera films now sell $5 high-def Blu-ray discs and carry portable players so you can check the quality before purchase. A $20 Netflix subscription lets you stream endless content in 30-odd languages and order two physical discs at a time from a nearly universal catalogue.

If the quantitative changes in media are significant, it's really shocking to sit back and feel the qualitative ones as well. YouTube means anyone can put up content anytime, anywhere. File-sharing means artists like Mos Def and The Strokes can be some of the defining artists of the early 2000s without ever really getting heard on the radio. The Bed Intruder phenomenon was an incredible rebuff to the record industry: the greatest hit of summer 2010 was a local news clip auto-tuned by the Gregory Brothers, along with the ensuing ocean of multi-genre covers of it. The funniest things of our generation aren't sitcoms or standup or variety shows but YouTube classics like Greatest Cry Ever and David After Dentist.

As monumental as all that is, it's really just information we used to store in physical objects like tapes and CDs, made digital. More interestingly, now even processes are becoming digital, or at least partially so, and are therefore getting cheaper as well. Dating, for instance, remains a process in which it is virtually impossible to avoid all costs. Yet hope is not lost. Dating services, with their broad connectivity and matching algorithms, serve to eliminate those frequent dating experiments with no ROI. Facebook, Foursquare, Google Latitude, Yelp's Monocle and other location-based services help create the kind of coordination and environmental knowledge that makes dining, partying or getting a haircut a much more successful endeavor.

My personal favorite is OpenCourseWare. Since I was about 11, I've seen the tragic look on people's faces when they find out they can't attend a particular university. I saw a kid cry when his mom told him he couldn't go to MIT, and I saw a coworker at a Fortune 10 banking institution crushed when he received word that neither of the Ivy League MBA programs he'd applied to had accepted him. Those who settle for universities on their budget constraint line while seriously yearning for a superior system can develop misanthropic habits and miss out on majors that were poorly explained to them. At best, they spend a lot of money just for a diploma and sign up for a much higher lifetime opportunity cost in networking. I don't think I even have to talk about high schools.
OpenCourseWare remedies so much of this. On August 24th, CNET reported iTunes U's 300 millionth download. That's about as many university lectures downloaded as there are American citizens. I've personally felt the power of this. I majored in Spanish Linguistics and Economics in college, thinking that knowing linguistics and economics while also speaking Spanish was the best I could do given my life, my budget, my time constraints and those of college programs. I went to the City University of New York A) because it's the most esteemed public university in America (Gordon Gekko, Andy Grove and Colin Powell went there) and B) because in most other states people (like employers) get the weird impression that you went to NYU when you say City University of New York, and it's not lying if your response is always, "NYU is a great school."

Needless to say, I felt cheated. I felt abused by market forces. I felt like I got felt up by Adam Smith's invisible hand. I wanted to be a computer science major at Stanford. I wanted to study philosophy at Harvard. I wanted to study linguistics at MIT. And I wanted to do them all at once. Because I worked instead of taking out a loan, my schedule wouldn't even permit me to study engineering at CUNY. I was jealous of the rich, and my envy hurt me. I kept my bitterness inside, as I quickly noticed that bringing up the correlation between class and success is not appreciated in conversation, especially amongst the successful. I soon learned I wasn't alone, though. I heard my suspicions echoed in offices and bars and public places of all kinds. I realized most people were dissatisfied with their college education, and the few who had nothing to complain about regarding their school generally filled that void in conversation by ruminating on their tremendous debt.

Then one day Google saved my educational life. I was watching YouTube, cruising through some programming tutorials (a recent discovery of mine), jumping from C++ to Python and from primitive operators to lists. They were great short tutorials that taught me the syntax of programming languages, but not great sources for contemplating the spectacular range of their possibilities. Seeing a three-minute video on implementing a simple list, and learning to type it correctly so that you get the data structure, is nothing like grasping the truly incredible consequences lists can have on your life. What I really needed was a conceptual tour of the key ideas of software building. One day I was led to the page of one of the true content producers of YouTube, a guy named thenewboston, who has around 955 tutorials (as of 9/22/10) in multiple languages, most under ten minutes: perfect for people of median intelligence, and great for above-average folks new to programming who know how to fast-forward tactfully and recognize when the guy is just mumbling incoherently. Glancing momentarily to the right, in my YouTube suggestions I caught sight of a blackboard, a man and the letters MIT. The mouse attacked for me.

I discovered MIT's Intro to Computer Science, one of the most complete offerings in OpenCourseWare. This MIT course includes 24 lecture videos, two complete texts, selections from others, assignments and exams. People are now even forming informal study groups so they can take the class socially. Although much OpenCourseWare is this complete, just watching the videos never hurts either. The most charming, effective teacher I've ever known is Mehran Sahami, who teaches Programming Methodology at Stanford; I recommend his videos to people of all ages, and by force of his tremendous personality they are as entertaining to watch as a sitcom about learning Java. I've taken computer science, mathematics and economics courses at Stanford, Harvard, MIT, CMU and UC Berkeley. I sat through graduation speeches by both Steve "the Ninja" Jobs and Bill "the Borg" Gates.
The democratizing element of OpenCourseWare is a great social thing in the industrialized world, but when it hits the third world, things will never be the same again. When broadband and tablets become cheap enough for distribution throughout the poorest nations (within 3-5 years), endogenous growth in those countries, driven by the share of their populations working in STEM fields, is going to explode. Countries that never so much as had a research facility will suddenly be teeming with experts in physics, information sciences, mathematics and engineering. It is a point of national pride for Americans that we've innovated nearly everything (with a little help from Europe) over the past 200 years. What most people don't realize is that losing our technological edge is not entirely negative. Imagine you're diagnosed with lung cancer: will you care that the cure was invented in China? You lose your leg in an accident: is it worth worrying that its bionic replacement is the product of Indian engineers?

A frequently heard slogan of the free and open-source software movement is that its software is "free as in free speech, not free as in free beer." I beg to differ. As automation progressively takes care of all our animal needs, and bioinformatics our health, people will continue to create meaningful content without it ever occurring to them that they should monetize it. This utopian day may be very far away, but as consumers that doesn't really matter to us, as long as people are willing to distribute content for ad revenue from businesses. It's a beautiful thought that someday people would create content to inspire, inform and educate other people. It's a beautiful life we live today, in 2010, because our inspiration, information and education are largely subsidized by businesses chasing hyper-targeted ad revenue enabled by companies like Google and Facebook.

I close with a crude back-of-the-napkin calculation. In 1998 dollars I'm making hundreds of thousands of dollars just living. Not everyone can be as lucky as me. If some people had such a large stipend, they might spend it on other things like wine and women. I, luckily, would spend it all on content anyway: content as data, content as information and content as knowledge. The first two are nearly always free and the third is getting there. On the other hand, I don't pirate much at all; I just make do with manipulating the legal distribution channels, so I'm sure there are some regular Joes, some store clerks and some people on government assistance out there pulling in a yearly media salary in the millions.

I run through 6-8 films a week. At $5.68 in 1998 Blockbuster dollars, that's roughly $2,000 of income a year. As I almost only watch documentaries and foreign films, and would certainly have to resort to buying (and in some cases spending significant time searching for) some of them, I apply a premium of 1.25, bringing it to $2,500 in 1998 video-content dollars. Music albums I love. With age I've outgrown some of the ritualistic attachments American teenagers develop with pop music. When programming, writing, studying or researching, I like to find an album that is just pleasing enough to play from start to finish. Since I do these four activities for many of my waking hours, I'm consuming at least 4 albums a day, counting the outlier music-filled weekends and the music I listen to with friends, which was probably obtained for free too (unless by those saints who actually buy songs on iTunes). Some are certainly popular enough to cost the $18.99 of a popular 1998 album, some unpopular enough to cost less, and some so obscure that in a physical-object market like that of compact discs they would cost a lot more, so we'll set the price of a CD at $17 in 1998 dollars. Assuming music were as simple as listening to albums from start to finish, I'd have had to budget $24,820 (4 albums a day for 365 days at $17) strictly for the purchase of albums. Of course, when I feel like I need to be inspired and just look a song up on YouTube, when I leverage the fact that Android FroYo on my HTC EVO 4G has Flash, so I can go to any number of music-vendor sites that let you listen to whole songs for free without purchase, when I stream a new single by some artist I love or hate just to sample it, and when I consider that Pandora is often turning me on to songs I'd never have discovered with 1998 radio, I'm effectively buying about 10 1998 singles a day. That's another $23,500. The 2010 GDP per capita of the United States is about $46,400.
The mean American earner would be stuck with a deficit of roughly $4,400 ($2,500 + $24,820 + $23,500 = $50,820, against $46,400) after buying as much 1998-priced media as I do. He'd be hard-pressed to have a good time and get a good education. I'm not. I don't keep accurate enough books to tell you how much money I've blown in bars and restaurants, but I live in Manhattan, and I'm sure that with Foursquare, Facebook, Twitter, Google Maps, Yelp, a smartphone and the assurance that everyone I know has one too, I'm saving myself at least 8 spend-a-lot-of-money-yet-have-no-fun nights, which in Manhattan often cost at least $100 before you realize you have to take a cab home (another 20 bucks). So right there you have it: sitting in front of a screen, in leisure activities alone, I earn the entire GDP per capita and then some.
Saved educational expenditures probably more than double my content income, as I doubt I have to remind you that taking 12 classes at 5 elite universities this year probably dwarfs all the opportunity-cost dollars I just laid out in the last paragraph...
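For readers who want to check the napkin, here is a minimal Python sketch of the media-salary arithmetic above. Two inputs are my assumptions, not figures stated in the post: 7 films a week (the midpoint of 6-8) and $6.44 per single, which is simply back-solved from the $23,500 figure (10 singles a day is 3,650 a year) and sits inside the $5 to $7.99 range quoted earlier. Because the sketch skips the post's rounding (the films line comes out near $2,584 rather than $2,500), the total lands slightly above $50,820.

```python
# Back-of-the-napkin "media salary" in 1998 dollars.
# ASSUMPTIONS (not stated in the post): 7 films a week, $6.44 per single.
FILM_RENTAL_1998 = 5.68    # 3-day Blockbuster rental
FILM_PREMIUM = 1.25        # markup for hard-to-find documentaries/foreign films
ALBUM_PRICE_1998 = 17.00   # blended CD price
SINGLE_PRICE_1998 = 6.44   # assumed; back-solved from the post's $23,500
GDP_PER_CAPITA_2010 = 46_400

def media_salary(films_per_week=7, albums_per_day=4, singles_per_day=10):
    """Yearly value, in 1998 dollars, of each kind of media consumed."""
    films = films_per_week * 52 * FILM_RENTAL_1998 * FILM_PREMIUM
    albums = albums_per_day * 365 * ALBUM_PRICE_1998
    singles = singles_per_day * 365 * SINGLE_PRICE_1998
    return {"films": round(films, 2),
            "albums": round(albums, 2),
            "singles": round(singles, 2)}

salary = media_salary()
total = sum(salary.values())
for category, value in salary.items():
    print(f"{category:>8}: ${value:>10,.2f}")
print(f"   total: ${total:>10,.2f}")
print(f" deficit: ${total - GDP_PER_CAPITA_2010:>10,.2f} vs. 2010 GDP per capita")
```

Changing the keyword arguments (say, `media_salary(films_per_week=6)`) shows how sensitive the conclusion is to each habit; the albums line alone already accounts for about half of the GDP-per-capita figure.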

Great Natural Language Processing Videos

Practical Applications of NLP

Speech Recognition and Retrieval

Incremental Bayesian Networks

The Next Generation in Neural Networks

Using Evolution to Design Artificial Intelligence

Bay Area NLTK User Group

Google Wave and NLP

Google Python Class: Regular Expressions

Machine Learning and Machine Translation

Machine Learning @ Stanford

Knowledge-Based Information Retrieval with Wikipedia

Saturday, October 30, 2010

Let's Outsource Reading

Dear Friends,

I love to read. You love to read. Why not? A novel is a great experience. You live in it. You feel it. It becomes you. You become its author or characters or mood. The words in the book become the words with which you come to define your life, and the very nature of your life changes because of the semantic meaning embedded in everything you see or do.

Sometimes, though, we are forced to read against our will. We are made to extract meaning from texts we have no intention of loving. Worse yet, even if we love the books we are obligated to read, how are we realistically expected to come away with an objective idea of a book's plot, language and moral when its plot is endlessly assaulted by our subjective memories of life and of media representing life, its language by our idea of language, and its moral by our generally firmly established moral compasses?

Any of us in high school or college know the pain of being forced to read novels we would never have chosen. Worse still, many teachers in the "liberal arts" don't grade based on execution but on the size of the intersection between their ideology and the student's. Now, if you can cull a few key phrases from your teacher, you can interpret a book according to their ideology without ever having to abandon your own!

Those of us in the world of non-matriculated adults wish we could feign a knowledge of books to impress coworkers, possible romantic partners, etc. The opportunity cost of a book is roughly a week's salary, nearly $1,000 if you are the average American. Would you like 40 hours and/or $1,000?

There are also those of you who love to read but for whom reading isn't enough. You read a book five times and then you read all the existing criticism of it, yet you still wish you could decipher the nexus of this book more. Well, my friend, I can help you apply the same exactitude that mechanical engineers or investment bankers apply to their respective crafts, with a cornucopia of tools you never imagined could be applied to literature.

A quick demonstration: let's say you had to read "A Tale of Two Cities" by Charles Dickens. Why, you just paste the URL of a website containing the book's text into my prebuilt function like so:

Please tell me the website containing the text you wish to analyze:
http://www.gutenberg.org/files/98/98-h/98-h.htm
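Under the hood, the first step is just turning that address into a list of words. Here is a minimal sketch of that step using only Python's standard library; the names `load_tokens`, `html_to_tokens` and `TextExtractor` are hypothetical, not taken from the program above, and the simple regex tokenizer is an assumption that will split text somewhat differently from NLTK's `word_tokenize` (which, for instance, produces the separate `n't` token seen in the output further down).

```python
import re
import urllib.request
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML page, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)

def html_to_tokens(html):
    """Strip the markup and split the text into word and punctuation tokens."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    # Words (keeping internal apostrophes, as in "wasn't") or single punctuation marks.
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)

def load_tokens(url):
    """Download a page, decode it, and tokenize its visible text."""
    with urllib.request.urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return html_to_tokens(html)
```

Calling `load_tokens("http://www.gutenberg.org/files/98/98-h/98-h.htm")` should return the novel's words and punctuation along with Project Gutenberg's license text, which is presumably why pairs like "Gutenberg-tm electronic" creep into the collocations below.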


Having been given the address of Project Gutenberg's free copy of the book, the program outputs a series of facts with which one could easily build the arguments for an essay, perhaps with just the assistance of the book's description on the back cover. There is no doubt that the aid of just the introduction or CliffsNotes would be more than enough to turn this output into an incredible and insightful essay, even at the graduate level.

A few interesting characteristics:

The first output is that of collocations: two-word combinations that occur together unusually often in the book. This is a great tool for figuring out the main characters, and combined with a little Googling it can really help you decipher character development in a cinch. Some of the non-proper-noun combos can also be extremely telling of the plot.

The 100 most common two word sets in the book are: 
Building collocations list
Mr. Lorry; Miss Pross; Madame Defarge; Mr. Cruncher; Saint Antoine; Doctor Manette; said Mr.; Charles Darnay; young lady; Mr. Stryver; Mr. Lorry.; Miss Manette; Jacques Three; Sydney Carton; Young Jerry; Miss Pross.; Old Bailey; said Miss; Gutenberg-tm electronic; n't know; one another; honest tradesman; electronic works; Archive Foundation; Mr. Darnay; Mr. Barsad; Monsieur Defarge; Temple Bar; golden hair; thousand seven; Mrs. Cruncher; seven hundred; Jarvis Lorry; Monsieur Gabelle; Mr. Carton; long time; Mr. Attorney-General; Good day; last night; Doctor Manette.; Madame Defarge.; electronic work; set forth; blue cap; Tellson 's.; North Tower

The limit to this function is your hunger for two-word combos.  I have set it to 100 for demonstration purposes.  You could set it to 1 or 1000.
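Here is a minimal stdlib sketch of a limit-controlled pair counter along those lines. It ranks adjacent word pairs by raw frequency and skips pairs containing punctuation or a stopword; the `top_bigrams` name and the short `STOPWORDS` set are my own assumptions for illustration. NLTK's own `Text.collocations(num=...)` goes further, dropping rare pairs and scoring the rest with a likelihood ratio, so its list will not match this one exactly.

```python
from collections import Counter

# Hypothetical, deliberately short stopword list (NLTK uses a much longer one).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "at",
             "by", "it", "is", "was", "he", "she", "his", "her", "that",
             "with", "for", "as", "had", "not", "but", "i", "you", "be"}

def top_bigrams(tokens, num=100, min_count=2):
    """Return up to `num` adjacent word pairs, most frequent first.

    Pairs containing a stopword or a non-alphabetic token are skipped,
    and pairs seen fewer than `min_count` times are dropped.
    """
    def keep(word):
        return word.isalpha() and word.lower() not in STOPWORDS

    pairs = Counter(
        (a, b) for a, b in zip(tokens, tokens[1:]) if keep(a) and keep(b)
    )
    return [pair for pair, n in pairs.most_common(num) if n >= min_count]

tokens = "Mr Lorry said Miss Pross and Mr Lorry saw Miss Pross by the Old Bailey".split()
print(top_bigrams(tokens, num=10))
```

Turning `num` up or down is the "hunger" dial from the paragraph above; `min_count` keeps one-off pairs from padding the list.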

 Let's move on to some other significant output:


Words used in similar contexts to monstrous are 


Building word-context index...
big blameless certainly complete early easily exactly familiar far
final freely good heavily here ignorant impracticable late litter
little long


Here you can toss a word in and see all the other words in the text that appear in similar contexts, that is, between the same or similar neighboring words. This tool can be very useful for breaking down an author's style, or perhaps even analyzing the speeches of a political leader or CEO. Applying it to Pride and Prejudice, for example, we find that for Jane Austen the word 'monstrous' had the same sense as 'very' or 'extremely'. Please note that 'monstrous' was a demonstration pick; you can use any word of your choice.
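A rough stdlib sketch of the idea, assuming a word's "context" is just its immediate left and right neighbours (which is also how NLTK's `Text.similar()` frames it): collect every (left, right) pair around each word, then rank other words by how many of the target word's pairs they share. The `similar_words` name is hypothetical, and NLTK's scoring differs in detail, so treat this as an illustration rather than a reimplementation.

```python
from collections import Counter, defaultdict

def similar_words(tokens, word, num=20):
    """Words that share (left neighbour, right neighbour) contexts with `word`.

    Candidates are ranked by how many distinct contexts they share with it.
    """
    words = [t.lower() for t in tokens]
    contexts = defaultdict(set)  # word -> {(left, right), ...}
    for i in range(1, len(words) - 1):
        contexts[words[i]].add((words[i - 1], words[i + 1]))

    target = word.lower()
    target_contexts = contexts.get(target, set())
    scores = Counter()
    for other, other_contexts in contexts.items():
        if other != target:
            shared = len(target_contexts & other_contexts)
            if shared:
                scores[other] = shared
    return [w for w, _ in scores.most_common(num)]

sample = "a monstrous fish swam . a very fish swam . a big fish swam . a red dog ran .".split()
print(similar_words(sample, "monstrous"))
```

In the toy sample, "very" and "big" both sit between "a" and "fish" just like "monstrous", while "red" does not; on a full novel the same mechanism surfaces the intensifier-like words shown above.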


The lexical diversity score of the book is:  11  

The lexical diversity is the total number of words divided by the number of unique words in the text; the smaller the number, the greater the variety of words used by the author. Nearly no books have scores under 8 or 9 (it would be hard to get a book out saying "the" just once), so you can consider that "slightly below ten" range as extremely diverse. Other great ways to apply it are to compare one author to others from their period, or to compare the book in question with the author's other works. Perhaps the "young" books of the author are filled with more lexical diversity, as his or her primary concern was dazzling the world with incredible and perplexing descriptions. Perhaps the "old" books exhibit low lexical diversity because plot, character development and meaning have grown to take center stage, and words are now used like motifs in a symphony, repeating throughout the text so as to create a definitive belief system without ever explicitly stating it. These claims are big, but with quantitatively sound lexical diversity measures you can at least dare to make them without your teacher responding with his "instant-write-off-ability" powers.
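The measure itself is a one-liner. This sketch follows the post's definition (total words divided by distinct words); the regex tokenizer, the lowercasing and the optional `limit` parameter are my assumptions, and note that many references use the inverse ratio, where higher means more varied.

```python
import re

def lexical_diversity(text, limit=None):
    """Total words / distinct words, as defined in the post (lower = more varied).

    `limit`, if given, truncates the text to its first `limit` words so that
    books of different lengths can be compared on equal-sized samples.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if limit is not None:
        tokens = tokens[:limit]
    return len(tokens) / len(set(tokens)) if tokens else 0.0

print(lexical_diversity("It was the best of times, it was the worst of times"))
```

One caution when comparing books: this ratio tends to rise as a text gets longer, since common words keep repeating while new words show up more slowly. Comparing equal-sized samples (for example, `limit=50_000` for every book) is fairer than comparing whole books of very different lengths.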

Now imagine you're reading "Great Expectations," but you once saw your teacher at the video store wearing an Obama shirt, with a copy of "Steal This Movie: The Life of Abbie Hoffman" in his hand; walking by his desk, you once saw a thick tome of Karl Marx's "Das Kapital"; and his hybrid has a Dennis Kucinich bumper sticker. Let's be real. Unless he is an incredibly detached and objective person (fairly rare in this world), writing a paper that interprets GreatEx from the perspective of Adam Smith's invisible hand and Reagan's supply-side economics is not going to get you an A.

Instead you can develop a thesis dealing with Dickens' Marxist side manifesting itself through uses of the words "poverty" and "justice" throughout the work.  But how to know them all without reading it?  Boom:


The word justice occurs in these contexts:
Building index...
Displaying 8 of 8 matches:
ock at my Lord Chief justice himself , and pulled
ge of the Lord Chief justice in the Court of King
s , the Tribunals of justice , and all society ( 
believe it. I do you justice ; I believe it. " Hi
 love of Heaven , of justice , of generosity , of
er of death , to his justice , honour , and good 
 love of Heaven , of justice , of generosity , of
 mind to impeach the justice of the Republic. She


the word poverty occurs in these contexts: 
Displaying 7 of 7 matches:
ed the air , even if poverty and deprivation had 
abandoned her in her poverty for evermore , with 
ave repented them in poverty and obscurity often 
ations as their bare poverty yielded , from their
 in their children , poverty , nakedness , hunger
ss of twenty , whose poverty and obscurity could 
n the south country. poverty parted us , and she 




Naturally you can adjust how much text surrounds your search word, asking for just a handful of words to pick up a few stylistic details, or thousands at a time to get the whole page of excerpt featuring the search word in question.
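A bare-bones keyword-in-context (concordance) function makes those knobs concrete. This is a stdlib sketch with a hypothetical `concordance` helper; `width` counts characters per output line and `lines` caps the number of matches, mirroring the defaults of NLTK's `Text.concordance(word, width=79, lines=25)`, though the spacing of the output will not match NLTK's exactly. It assumes `width` comfortably exceeds the length of the search word.

```python
def concordance(tokens, word, width=79, lines=25):
    """Return up to `lines` keyword-in-context strings for `word` (case-insensitive)."""
    half = (width - len(word) - 2) // 2  # characters of context on each side
    target = word.lower()
    results = []
    for i, tok in enumerate(tokens):
        if tok.lower() != target:
            continue
        # Every token is at least one character long, so `half` tokens
        # always supply at least `half` characters of context.
        left = " ".join(tokens[max(0, i - half):i])[-half:]
        right = " ".join(tokens[i + 1:i + 1 + half])[:half]
        results.append(f"{left:>{half}} {tok} {right}")
        if len(results) == lines:
            break
    return results

passage = "the air even if poverty and deprivation had come".split()
for line in concordance(passage, "poverty", width=30):
    print(line)
```

Raising `width` widens the excerpt around each hit, and raising `lines` shows more hits, which is the trade-off the paragraph above describes.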

What about all those sources you need to quote? Building a critical background for your essay is as easy as applying the same search method to the journal articles you've thrown in your bibliography. Do a quick run-through on "poverty"/"justice" in a handful of articles and you'll have the framework of a critical literary essay worthy of Harper's without ever having read the book or the criticism you're quoting!

In the starter kit program I have provided these few key operations you can perform on a text. I know hundreds more. Their output comes not only as sentences and word lists but also as graphs and charts, or as useful functions like search. You can plot the fifty most used words in a text, make pie charts of word ratios, or comb through thousands of pages with an algorithm much faster than the Command-F you've been living by.
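The "fifty most used words" operation is essentially a frequency count. Here is a minimal stdlib sketch with a hypothetical `most_used_words` helper that also returns each word's share of all words, the raw material for the word-ratio pie charts mentioned above. Drawing the charts themselves would need a plotting library such as matplotlib, which is left out here.

```python
import re
from collections import Counter

def most_used_words(text, n=50, ignore=frozenset()):
    """Return (word, count, share_of_all_words) for the n most frequent words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in ignore)
    total = len(words) or 1  # avoid dividing by zero on empty input
    return [(w, c, c / total) for w, c in counts.most_common(n)]

opening = "It was the best of times, it was the worst of times"
for word, count, share in most_used_words(opening, n=5, ignore={"the", "of"}):
    print(f"{word:<8} {count:>3} {share:6.1%}")
```

The `ignore` set is where you would drop filler words like "the" and "of" so that the top fifty says something about the book rather than about English.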

This program, with the output above (legitimately enough to write an essay), is on sale for only eight dollars. I can also give you one-hour training/support sessions for $50. We can do as many such sessions as you desire, or as many as my knowledge of Natural Language Processing allows. Or you can just pick some of the other methods mentioned above and I will add them to the code for you for just $4 each (price negotiable once the number of methods passes 10). Your choice.

We let computers free us from the childish drudgery of basic arithmetic, algebra, geometry and statistics years ago, and we proceeded to an incredible degree of understanding of math and science. Who would really believe, then, that we can't eliminate some of the more shallow reading functions and come away with more profound knowledge of texts?

Tuesday, September 21, 2010

Death to Poets

Having loved writing indecipherable, surrealistic poems as a youth and remembering how little effort was really needed in their composition, I decided recently to write a poetry virus of sorts that would automatically generate such texts.

The algorithm is more or less this:

1) Generate Text (This is the easiest. Using the .generate() method of the Natural Language Toolkit for Python, you can generate texts of any desired length. They are n-gram generated, so each word is followed by other words in proportion to how often they follow it in the source text. Hence, if "whale" follows "the" 30% of the time in Moby Dick, "whale" ought to be generated after roughly 30% of the "the"s in your generation. NLTK comes with a series of readily manipulable parsed and tagged texts, but you can go off and parse/tag one of your own if you think this exercise would be more exciting with a Kerouac or Philip K. Dick novel instead of one of the 8 or 9 they supply you with.)

2) Cut into Lines
    A) Create Monte Carlo number generator n <= 14
    B) Cut to n words

3) White Spaces
    A) Create Monte Carlo number generator n < 6
    B) Indent the line n tabs across the screen

4) Save to Word

5) E-mail to Poetry Publishing Sites
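Steps 1 through 3 fit in a few dozen lines. The sketch below swaps NLTK's `.generate()` for a hand-rolled bigram (first-order Markov) chain so that it runs with the standard library alone; NLTK's generator may condition on a longer context, so its output can read somewhat more smoothly. All function names are hypothetical, "Monte Carlo" here just means drawing random integers from a seeded `random.Random`, and step 4 is reduced to printing (saving to an actual Word file would need a third-party package such as python-docx).

```python
import random
from collections import defaultdict

def build_bigram_model(tokens):
    """Map each word to every word that follows it in the source text.

    Duplicates are kept on purpose: random.choice over the list then picks
    a follower in proportion to how often it follows the word in the source.
    """
    model = defaultdict(list)
    for current, following in zip(tokens, tokens[1:]):
        model[current].append(following)
    return model

def generate(model, start, length, rng):
    """Step 1: a random walk of `length` words through the bigram model."""
    words = [start]
    while len(words) < length:
        followers = model.get(words[-1])
        if not followers:            # dead end (e.g. the text's last word)
            followers = list(model)  # restart from any word that has followers
        words.append(rng.choice(followers))
    return words

def cut_into_lines(words, rng, max_words=14, max_tabs=5):
    """Steps 2-3: lines of 1..max_words words, each indented 0..max_tabs tabs."""
    lines, i = [], 0
    while i < len(words):
        n = rng.randint(1, max_words)             # step 2A: random line length
        indent = "\t" * rng.randint(0, max_tabs)  # step 3A: random white space
        lines.append(indent + " ".join(words[i:i + n]))
        i += n
    return lines

source = ("Call me Ishmael . Some years ago -- never mind how long precisely -- "
          "having little or no money in my purse , and nothing particular to "
          "interest me on shore , I thought I would sail about a little and see "
          "the watery part of the world .").split()
rng = random.Random(1851)  # seeded so the same "poem" comes out every run
poem = cut_into_lines(generate(build_bigram_model(source), "Call", 40, rng), rng)
print("\n".join(poem))
```

Feeding it a whole novel instead of one sentence is what gives the output its uncanny half-sense; with a tiny source like this one it mostly loops.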


I was sincerely ready to build the whole system and become the most-published poet in history in a couple of days, but I realized that doing so would be a huge disservice to people like Rimbaud, Whitman et al.

At any rate, here's one of my Cyber-Poems, "Ahab"

 Moby Dick was 
                           fairly sighted 
from the hills .-- But the truth , the
very buttons of his bodily woes , 
           but by some experienced whaleman .
                                                                            After many similar hair - breadth escapes , -we ' ll be ready for it in two!- 
                                             -- that brawny doer of rejoicing good deeds  was wholly ignorant
of the horizon , like a wild set of mariners. 

But how fair ?
Fair for death ; 
how he lords it over the world!     
                       , so that when several ships are but subtile deceits , 

not only to be no hearts above the
                                                  common sperm whale ' s turn .

 the Leviathan that crooked serpent ; and nailed to her
                                                            highness a prodigious sensation in all his persecutions ; bethinking
it -- now over the head ,           
                                     and therefore
to ye , ye mates , seeking repose within six inches of his Captain
                                                                                                to mind the regular features 
of his Dutch whale fleet to be
                                           susceptible to atmospheric distension and contraction . If ye see ,
that thinking after all was caused by an awful question . 

................
It's certainly not a good poem, I realize that. What makes me proud, though, is that I'd put it in about the 85th percentile of modern surrealist poems in terms of quality. So I beg you to ask yourself: if surrealist poems are generally terrible anyway, instead of trying to improve them, why not just automate the process of their composition? They will continue to be horrible, only now humans can spend more time writing prose, studying engineering and uploading YouTube videos...

Sunday, September 19, 2010

Augmented Reality @ Google Zeitgeist

Augmented Reality Plus a Couple of Other Great 7-10 Minute Speeches from the Geniuses over at Google Zeitgeist.


Watch minutes 3:40 to 19:59 to hear Maarten Lens-Fitzgerald of Layar. Also, immediately download Layar if you have an Android phone or an iPhone 3GS or newer.


Andreas Dengel on Text 2.0, from 20:00 to 26:18, is a delight as well.