Title: Big Data: A Revolution That Will Transform How We Live, Work, and Think
Author: Viktor Mayer-Schönberger and Kenneth Cukier
Publisher: Eamon Dolan/Houghton Mifflin Harcourt
Acquired: From the publisher for review consideration
Review: Everyone who has ever tried to comment on a website (or on several websites in a row) has, at some point, come across Captcha (Completely Automated Public Turing Test to Tell Computers and Humans Apart): those seemingly random strings of letters and numbers that site administrators use to make commenters prove they’re human.
Although inventing Captcha brought 22-year-old Luis von Ahn plenty of fame and fortune (including a MacArthur “genius” grant worth $500,000), he eventually started to feel bad about all the time people were wasting typing in random letters each day. Von Ahn worked to modify the program and eventually came up with its successor: ReCaptcha. Now, instead of random letters, the program presented people with two words from text-scanning projects (think Google Books) that a computer’s character recognition program couldn’t understand:
One word is meant to confirm what other users have typed and thus is a signal that the person is a human; the other is a new word in need of disambiguation. To ensure accuracy, the system presents the same fuzzy word to an average of five different people to type in correctly before it trusts it’s right. The data had a primary use — to prove the user was human — but it also had a secondary purpose: to decipher unclear words in digitized texts.
If book digitizers had to hire people to figure out these words, it could cost more than $1 billion per year. Instead, ReCaptcha harnesses what users are already doing on more than 200,000 websites to solve big data problems.
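For readers who like to see the mechanics, here is a minimal sketch of the idea as the book describes it: one known word checks that the user is human, and guesses for the unknown word are only trusted once enough people agree. This is my own illustration, not ReCaptcha's actual code. The names `UnknownWord` and `submit` are hypothetical, and the fixed threshold of five matching answers is an assumption based on the book's "average of five" figure.

```python
from collections import Counter

# Assumed threshold; the book only says "an average of five" people.
AGREEMENT_THRESHOLD = 5


class UnknownWord:
    """Collects human guesses for one word the scanner could not read."""

    def __init__(self):
        self.votes = Counter()

    def accepted(self):
        """Return the trusted transcription, or None if there is no consensus yet."""
        if not self.votes:
            return None
        guess, count = self.votes.most_common(1)[0]
        return guess if count >= AGREEMENT_THRESHOLD else None


def submit(control_answer, typed_control, unknown, typed_unknown):
    """Check the known word; count the unknown-word guess only if it matches."""
    is_human = typed_control.strip().lower() == control_answer.lower()
    if is_human:
        unknown.votes[typed_unknown.strip().lower()] += 1
    return is_human
```

Each answer does double duty here: the return value of `submit` is the primary use (human or not), while the tally inside `UnknownWord` is the secondary use (deciphering the text).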
While this story may not make you feel more excited about typing in random words to leave a comment on a blog, it is one of the many fascinating anecdotes about the changing world of information in Big Data by Viktor Mayer-Schönberger and Kenneth Cukier. In the book, Mayer-Schönberger and Cukier explore the implications of living in a world where we have the power to collect, analyze, and act on more information than we’ve ever had available before.
I decided to read this book because I have a couple of friends who work and do research related to big data and I wanted to understand the phenomenon better. What I learned as I read the book is that big data isn’t as far removed from my life as I imagined it was. I provide information about my life and my habits voluntarily and involuntarily every day, and that data is part of a revolution of analyzing and correlating seemingly random factors to make decisions about what to do next.
For example: analysts at Walmart have discovered that during inclement weather, sales of Pop-Tarts increase. The Huffington Post and other news websites make decisions about what to cover based on what data tells them people want to read about. Target can predict when a woman is pregnant and uses that information to send her coupons for relevant products.
There’s a lot more to big data than that, and the book explores it in great detail that I don’t want to try to repeat here because I won’t do it justice. This book is full of fascinating information about the challenges and opportunities of big data. The Captcha story above came from a chapter about how a world of big data forces us to think more about the value of information, including the fact that information gathered for one purpose can have a secondary value when combined with other information.
Stylistically, the book was a little repetitive. As I read, it felt like each chapter circled back on itself, making the same argument several times within a single chapter. I think the authors were trying to make sure their points were clear (this is one of the first books to try to explain big data to a popular, not scientific, audience), but as a reader I felt the repetition was unnecessary because, overall, the prose was very clear and easy to understand in the first place.
Mayer-Schönberger and Cukier are also a little light on the potential negative implications of a world built on big data. The fact that there is more data collected and stored on us as individuals than ever before is, on several levels, pretty disturbing. Big data threatens our privacy, makes it possible to penalize people based on predictions, and tempts us to rely on data-driven results more than is warranted. The book devotes a single chapter to these concerns, a ratio that felt skewed to me.
On the whole, however, Big Data is a deeply interesting book that gives a clear overview of the risks and rewards of a world built on information. Although it’s not at all clear what the implications of this transformation will be, Big Data provides the perfect level of information for readers unfamiliar with the concept but hoping to understand more.
If you have reviewed this book, please leave a link to the review in the comments and I will add your review to the main post. All I ask is that you do the same for mine. Thanks!