Sentiment analysis has come a long way in the past few years. Gone are the days when systems would be fooled by a simple negation such as “I don’t love this movie.” With deep learning approaches, systems can pick up on much more complex and subtle forms of positive or negative sentiment. These new, more accurate approaches open up a variety of possibilities for understanding sentiment in large amounts of text. I decided to apply some of these new models to classic literary texts to see whether I could visualize their affective valences and offer some insight into decades-old works.
The models I used were a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN), both pre-trained using word2vec. The code for both is available here. Both models were trained using a database of positive and negative movie reviews typically used in sentiment analysis research. This immediately posed a theoretical problem: how much does sentiment in movie reviews have to do with sentiment in literature? Once I had the trained models, I needed a sanity check, so I took the most positive and most negative books I could think of and compared them.
For the most positive, I chose Leaves of Grass, Walt Whitman’s classic ode to life, democracy, and human potential. For the negative work, I chose The Metamorphosis, by Franz Kafka, a book about a man who turns into a bug, is hated by his entire family, and then dies. Just from these descriptions, the sentiment of each work is apparent to us humans. What do the neural networks think? Below are the visualized sentiments of the two books. Each block represents a single sentence, with blue indicating positive sentiment and red indicating negative sentiment.
Lo and behold, it worked pretty well! From there I gathered over a dozen other publicly available works of literature and put together a website where these visualizations can be viewed. The website lets you explore the visual overview of an entire work and also highlight any given block to discover the corresponding sentence. While many of the classifications are pretty accurate, it is a fun exercise to discover the ways in which the models break down. The site also provides both models’ predicted sentiments. The CNN is more accurate, but the RNN makes surprising classifications that can be fun to look through as well.
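For readers curious how one block per sentence could be colored, here is a minimal sketch. It is not the site's actual code. It assumes each sentence already has a score between 0 and 1 giving the model's probability of positive sentiment, and the function names `score_to_rgb` and `sentence_blocks` are made up for illustration.

```python
def score_to_rgb(p_positive):
    """Map a positive-sentiment probability in [0, 1] to an (R, G, B) tuple.

    Assumption: 0.0 means fully negative (pure red) and 1.0 means fully
    positive (pure blue). Values in between blend the two linearly.
    """
    p = min(max(p_positive, 0.0), 1.0)  # clamp out-of-range scores
    red = round(255 * (1.0 - p))
    blue = round(255 * p)
    return (red, 0, blue)


def sentence_blocks(scores):
    """Return one hex color string per sentence score, in reading order."""
    return ["#{:02x}{:02x}{:02x}".format(*score_to_rgb(s)) for s in scores]


if __name__ == "__main__":
    # Hypothetical scores for three sentences, not real model output.
    print(sentence_blocks([0.0, 0.25, 1.0]))
```

Each returned color can then be drawn as a small square, so a whole book becomes a grid of red, purple, and blue blocks in reading order.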
If you like the site and want to see one of your favorite public-domain books on it, don’t hesitate to send me a message, and I would be happy to add it! It turns out that many works seem to have, on average, a balanced amount of positive and negative sentiment, but I would love to use this tool to discover and visualize the extreme cases: the most positive and most negative works of literature, whatever they may be.
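One simple way to look for those extreme cases would be to reduce each book to a couple of summary numbers and sort on them. The sketch below is only an illustration under assumptions: it takes per-sentence scores between 0 and 1 (probability of positive sentiment), treats 0.5 as the positive/negative cutoff, and uses the hypothetical helper names `summarize_book` and `rank_by_positivity`.

```python
from statistics import mean


def summarize_book(scores, threshold=0.5):
    """Summarize a book's per-sentence scores.

    Returns the mean score and the fraction of sentences classified as
    positive (score >= threshold). The 0.5 threshold is an assumption.
    """
    if not scores:
        raise ValueError("need at least one sentence score")
    positive = sum(1 for s in scores if s >= threshold)
    return {
        "mean_score": mean(scores),
        "fraction_positive": positive / len(scores),
    }


def rank_by_positivity(books):
    """Sort book titles from most to least positive by mean score.

    `books` maps a title to its list of per-sentence scores.
    """
    return sorted(
        books,
        key=lambda title: summarize_book(books[title])["mean_score"],
        reverse=True,
    )


if __name__ == "__main__":
    # Placeholder titles and scores, not real results.
    demo = {"Book A": [0.9, 0.8, 0.7], "Book B": [0.2, 0.6, 0.1]}
    for title in rank_by_positivity(demo):
        print(title, summarize_book(demo[title]))
```

A book with a mean score near 0.5 would count as balanced in this scheme, while titles at either end of the ranking would be candidates for the most positive and most negative works.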
I am a PhD student in Cognitive Neuroscience, currently looking for internship and work opportunities in the San Francisco Bay Area this coming fall. If you are part of a company working on Deep Learning, Machine Learning, or Data Science, I would love to chat about joining your team!