Big data is like toilet paper. One-ply, two-ply, single roll, double roll, ultra roll – lots of choices and no humanly comprehensible way to make an honest comparison. Every package says it’s the biggest and the softest. So much confusion for such a mundane task. It’s a microcosm of our problem with big data, isn’t it? Every day, for even the smallest decisions, we’re overwhelmed with data. It doesn’t matter whether we’re talking about toilet paper or auto insurance; we are constantly bombarded with an overabundance of information and options.
How can we deal with this information overload? Perhaps we can learn from the history books; we’ve seen this phenomenon before. In fifteenth-century Germany, Gutenberg invented his printing press, and fifty years later there were over twenty million volumes in circulation. Five hundred years before Moore’s law, we had a veritable explosion of printed information that fueled the Protestant Reformation, powered the rise of the middle class, and drove up literacy rates. Gutenberg gave us big data, Renaissance-style.
Fast forward to the present day: there are billions of books in circulation. Yet we don’t obsess about them. We don’t worry about all the books we’re not going to be able to read. Why not? Because they are a part of our daily reality, and we know how to interact with them. Pop-up books are for small children, comic books are for adolescents, textbooks are for education, and novels are for recreation. Books are sold in stores and borrowed from libraries. Literacy is widespread. (You are reading this blog, after all.) We don’t obsess about books because they are the water we swim in.
The same will happen in our relationship with big data. We’ll figure out how to interact with it. We’ll build “libraries” to store it and we’ll develop conventions to categorize it. We’ll learn how to benefit from it without worrying about trying to absorb it all. We’ll develop discernment to distinguish between good data and bad data. We’re only about ten years into our modern-day Gutenberg revolution. Nobody knows how long this transition will take, but one day we won’t talk about big data as a thing. It will just be the water we swim in.
In 7th grade, my history teacher gave us an assignment: read a newspaper article and write an essay on the author’s bias. I didn’t realize it at the time, but it was a lesson in literacy. True literacy means more than knowing the alphabet, sounding out words on a page, and comprehending a paragraph. True literacy requires us to read through the words and evaluate the strength of the author’s arguments. It requires us to seek to understand his context and his bias.
Likewise, data literacy is more than just 1s and 0s. Being good at math is not enough. Data literacy requires us to seek to understand the sources of data, to evaluate their meaning and intent, and to judge whether an author has accurately incorporated the data to support his point of view. To be illiterate in modern industrial society would be an insurmountable handicap. One day, we will feel the same way about data literacy. How will we know when we’ve achieved it? As in personal hygiene, it seems there are few documented standards. But it’s a safe bet that when we get there, we’ll need to be able to read between the lines.