Below is a data set with 4 groupings of data and 2 columns for each grouping. The summary statistics–mean, variance, correlation, sum of squares, r², and linear regression line are the same for all 4 groupings of X and Y values. If we stopped our analysis here we could move forward confidently knowing that the 4 groups of data are the same. And we’d be dead wrong.
In my 15 years in analytics I’ve seen good analysts, time and again, stop their analytical efforts when their data summaries don’t tell a compelling story. I’ve sat through hours of meetings, going through page after page of data related to critical financial forecasts, looking at historical trends going back years, without seeing a single graph to show a trend. For whatever reason, data exploration for many analysts starts and ends with a table of summary statistics describing the data. What a shame. In relying on summary statistics we give short thrift to one of our most powerful assets–our eyes.
To see what I mean, click here.
For years Edward Tufte and Stephen Few have been telling the BI community to, “above all else, show the data”. Make your intelligence visible. Go beyond the summary look of your data and show it, warts and all. In fact, the Business Intelligence Guru recommends looking at graphic representations of your data before you even look at summary statistics. There are tools available today that make looking at graphic distributions of data easier than ever. I have years of experience using JMP (link will take you to a fully functional 30 day free trial), from SAS, which has a distribution engine that makes it a snap to look at distributions. Even SAS graph, with its new statistical graph (sg) procedures in version 9.2 make it a snap to view your data up close and personal.
Lastly, I didn’t invent the data that I’m using to make my point. I came across two references last week that made me think that I should write about it. I watched an info viz legend, Jeff Heer, tell a story making the case for info viz. I didn’t realize it then, but that story he told actually dated back to 1973 and also appeared on the first page of Chapter 1 in Edward Tufte’s book, “The Visual Display of Quantitative Information” published in 2001. The story goes to the heart of why we need to show the data.
The credit for this eye-opening example goes to F.J. Anscombe, a statistician who created this data set in 1973 to make the case for graphing data before analyzing data. He was a man ahead of his time.