Category Archives: Blog

Info Vis blog confuses readers with an awful graph

In his June 12, 2012 post titled “The Rise of Chrome and the Fall of Internet Explorer,” author Matthew Yglesias took some data and made the following chart.

Confusing chart showing browser market share over time

This chart is supposed to show browser market share over time. Instead, it confused his readers so badly that, of the 55 comments the post had received as of June 14, 2012, more than half were complaints about the bad chart.

Mr. Yglesias’ mistake isn’t uncommon. Excel has made it easy for anyone to create a graph, but it hasn’t made it easy to create a good one. The Business Intelligence Guru wants all of you graph makers out there to KEEP IT SIMPLE. The only exceptions to the KEEP IT SIMPLE charting rule are Charles Minard and Amanda Cox; they’ve got the chops to mix it up a little.

With the understanding that simpler is usually better, here’s a simple line chart reinterpreting Mr. Yglesias’ confusing graph. I made one change, or rather one subtraction: there’s no need to go out to two decimal places on the Y axis. Those extra digits add no value to the graph; in fact, they eat up valuable space and lower the data-ink ratio.

Tableau to the rescue! How to improve Sunlight Foundation’s scatterplot showing that Congress speaks like Juveniles

On June 4th, Stephen Colbert started off his show by discussing a report by the Sunlight Foundation.

The report showed that Congress is getting dumber. OK, that’s not exactly what the report showed; it showed that the grade level of Congressional speech has been declining since 2005. The Sunlight Foundation’s analysis of Congressional speech included this interesting scatterplot:

Ideology and grade level for Congress

This scatterplot does a few things well. First, it shows us the data: every point is a current representative. Second, it uses color appropriately, red for Republicans and blue for Democrats. Third, the lines fitted over grade level of speech add value. They show no correlation for the Democrats and a negative correlation for the Republicans; that is, the grade level of Republicans’ speech declines as their voting records become more conservative. The scatterplot was made in R. A write-up on how it was made is here.
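At their simplest, those per-party trend lines are ordinary least-squares fits computed separately for each party. The Sunlight Foundation’s own write-up used R; the Python sketch below only illustrates the idea, and its six points are invented, not taken from the real Congressional data. A negative slope is the Republican pattern described above, and a slope of about zero is the “no correlation” Democratic pattern.

```python
from collections import defaultdict


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def slopes_by_party(rows):
    """Fit a separate line per party; rows are (party, ideology, grade_level)."""
    groups = defaultdict(lambda: ([], []))
    for party, ideology, grade in rows:
        xs, ys = groups[party]
        xs.append(ideology)
        ys.append(grade)
    return {party: ols_slope(xs, ys) for party, (xs, ys) in groups.items()}


# Hypothetical points for illustration, NOT the Sunlight Foundation's data.
sample = [
    ("R", 0.2, 12.0), ("R", 0.5, 11.0), ("R", 0.8, 10.0),
    ("D", -0.8, 11.5), ("D", -0.5, 11.8), ("D", -0.2, 11.5),
]
print(slopes_by_party(sample))
```

Fitting each group separately is what keeps one party’s points from pulling on the other party’s line.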

But the scatterplot also leaves some things to be desired. First off, none of the points are labeled. At the very least, the outliers should have labels. We want to know, for example, who is that red dot speaking 5 grade levels above the average (it’s Dan Lungren)? And who are those dots on the far left and far right of each party? Labeling specific points in R probably isn’t easy. Also, it might be interesting to see whether there’s a relationship among grade level of speech, ideology, and tenure, so the points should be sized by the number of years in Congress.

After seeing the scatterplot, I wondered what it would look like in Tableau. So I put together the interactive viz below.

While I’m a capable Tableau user, I needed help from Tableau experts to keep the trendlines separate for the two parties. So I reached out to the Tableau Community. Jonathan Drummey came up with the idea of computing separate trendlines on each viz and then combining the vizzes on a dashboard. Shawn Wallwork liked that idea and suggested adding confidence bands to the trendlines. Shawn also added quadrants to each graph, which I think was a brilliant move, and I included those quadrants in my viz below. The horizontal sections of the quadrants show the difference in grade level of speech, with the Democrats speaking at an 11.7 grade level and the Republicans at an 11.2 grade level. Tableau Legend Joe Mako also chimed in with an elegant solution that let me plot both parties, with their trendlines, on a single chart. I think Joe’s solution is great. Having all the data on one chart allows the user to select data across both groups. Had I used two separate charts and pieced them together on a Tableau dashboard, the user wouldn’t be able to select points on both charts. Thank you Joe, Jonathan, and Shawn (DataViz Dude).

Also, with the Tableau viz the user can hover over a point and see which representative it belongs to. The reader can also select a group of points and view the data in tabular format, which is a really useful feature. Oh, and Tableau Public (that’s what I’m using to show you the viz) is as inexpensive as R, as in, free.

Tableau is the better tool for this viz. It’s interactive, which gives the reader the ability to explore the data on their own. For example, go ahead and use the slider on top of the viz to exclude all representatives with fewer than 5 years of tenure.

Finally, I’m not making a point here about which is the smarter/dumber party. My personal belief is that, when people are discussing important topics, it’s best to speak clearly and simply.

Bar chart with a log axis? “NEVER!” says the Biz Intel Guru

I’m a big fan of the team at SAS that works on the SG (statistical graphics) procedures. Their work enables others to tell richly detailed stories with data. The team is led by Sanjay Matange. Just 3 days ago I attended a session at the SAS Global Forum (SASGF12) where Sanjay spoke about the work he and his team have done for SAS version 9.3. It was obvious from the session that Sanjay and his team are incredibly user-focused and really good at what they do.

So I was surprised today when I read Sanjay’s most recent blog update and saw this chart.

Bar chart with a log axis

There are a handful of ways you can ruin a bar chart. One way is to make it 3D; read this for details on why 3D is bad. Another way to wreck a bar chart is to start the numeric axis at a value other than zero. Bar charts are only effective when we can use the length of each bar to make rapid comparisons. If one bar is twice as long as another, we expect its value to be twice as large. By starting a bar chart at something other than zero, you are telling a visual lie, because we can no longer use the length of the bars to compare the magnitude of the differences. When Sanjay created a bar chart with a log axis, he violated the expectations of anyone who reads the chart, because we can’t use the length of the bars to directly compare values. A simple table would’ve worked much better, and sorting the table by horsepower would be an even better option, as you can see below.
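A quick calculation shows why a log axis breaks length comparisons. On a base-10 log axis, a bar’s length is proportional to log10(value) minus log10(axis minimum), not to the value itself. The Python sketch below uses two hypothetical horsepower values (not the ones in Sanjay’s chart) and assumes the log axis starts at 10.

```python
import math


def bar_length_log_axis(value, axis_min):
    """Length of a bar drawn from axis_min on a base-10 log axis, in decades."""
    return math.log10(value) - math.log10(axis_min)


def bar_length_linear_axis(value):
    """Length of a bar drawn from zero on a linear axis."""
    return value


small, large = 100, 400  # hypothetical horsepower values; large is 4x small
axis_min = 10            # assumed starting point of the log axis

linear_ratio = bar_length_linear_axis(large) / bar_length_linear_axis(small)
log_ratio = bar_length_log_axis(large, axis_min) / bar_length_log_axis(small, axis_min)
print(f"linear axis: the larger bar is {linear_ratio:.2f}x as long")
print(f"log axis:    the larger bar is {log_ratio:.2f}x as long")
```

With these numbers, a car with four times the horsepower gets a bar only about 1.6 times as long, and the exact ratio shifts again if the axis minimum changes.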

Table showing horsepower comparison

What Sanjay did came from a good place. He says in his blog post that a few people had told him they wanted to create a bar chart with a log axis. But just because people want something doesn’t mean you should give it to them. Sanjay is an expert in his field. Rather than satisfying the customer’s request, he might have offered up a better alternative, like a dot plot.


Better: a dot plot alternative to the log scale bar chart

The dot plot doesn’t have the same problem as the bar chart: we’re not comparing lengths of bars, we’re looking at the position of each dot along the X axis. Stephen Few has a great guest post by Info Vis superstar Dr. Naomi Robbins about dot plots and how, in the right circumstances, they can be a great alternative to bar charts. That paper can be found here.

In this instance SAS would’ve better served its customers by offering up the dot plot as an alternative to a log-scaled bar chart. As information visualizers, it’s our job to help people see things clearly. It’s not an easy thing to do, but there are consequences when we get it wrong. Those consequences range from wasting people’s time in meetings, to missing important opportunities, to the destruction of the Space Shuttle Challenger and the deaths of the 7 astronauts aboard (thanks, Edward Tufte).

When it comes to creating clear and insightful graphs, the Customer isn’t always right.

So, what do you think? Are there exceptions to the bar chart rules laid out above? Was SAS right in giving the customer what they wanted?

SAS and Twitter–how to harness SAS to grab data from Twitter in 2 easy steps

I recently published a post titled, “4 Key Tweeting Attributes of Guy Kawasaki in one Infographic.” I made extensive use of SAS to gather and manipulate the data from Twitter. Turns out, SAS is pretty awesome for this type of work. In this post I’m going to document how to use SAS to gather data from Twitter’s API. My next post on SAS and Twitter will build off of this one and teach you how to gather data about your subject’s followers, find ReTweets, and listen in on conversations. Click here to get that post delivered to your inbox as soon as it’s published.

First off, you might wonder, why do this? Well, successful analyzers of the future will be adept at analyzing all sorts of data, including data from social networks like Twitter. Also, if you’re looking to market your analytical skills, what hiring manager wouldn’t be impressed with someone who gathered data from Twitter’s API with SAS, then mined, analyzed, and presented the data in a compelling way? Oh, I almost forgot: because you’re analyzing a current event (it’s on Twitter, right?) and mentioning Twitter in your post, your analysis will be more search-engine friendly, so you’ll likely reach a wider and more targeted audience than if you analyzed something outside of the Twitterverse. Some smart analyzers have even been known to analyze Tweets about their target employer and use the analysis to help get themselves hired. On a larger scale, this is almost exactly what Seth Godin has done with Brands in Public.

Before we get started, I have to tell you a little about Twitter’s rate limiting policy. Unfortunately, the search part of Twitter’s API doesn’t have a published rate limit. Rather, Twitter says it allows a rate quite a bit higher than its standard 150 hits per hour, but declines to say how much higher. Full documentation can be found here, about halfway down the page. I have run afoul of the limit before and guess that it’s around 600 hits per hour, or more than 30 per minute. When you exceed the unpublished rate, you have to wait 1 to 3 hours before your IP address is allowed to hit Twitter again. If you’re just searching for someone’s posts, like we’re doing with Guy Kawasaki, you needn’t worry about getting anywhere near Twitter’s rate limit.

Ok, so now let’s get started.

Step 1:
After you figure out what you want to search for (this site is a good place to start finding trends, and it graphs them out for you), you’ll need to plug your search term into the URL string that your SAS program will use. If you’re searching for a person, like I did, your string will look like this:

The ‘q=from’ tells Twitter that you’re searching for Tweets from a specific user. The ‘%3A’ is URL encoding for a ‘:’. And the ‘&rpp’ (results per page) parameter tells Twitter to return the maximum of 100 items per page. You can copy and paste that string into your browser right now and get back some nicely formatted XML representing Guy’s last 100 Tweets.
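To see the encoding in action, here is a small Python sketch using the standard library’s `urllib.parse.urlencode`. It builds only the query-string part; the base URL is deliberately left out, the screen name is a placeholder, and the `q`/`rpp` parameter names come from the description above. Twitter’s old search API has since been retired, so treat this as an illustration of URL encoding, not a working endpoint.

```python
from urllib.parse import urlencode


def build_search_query(screen_name, results_per_page=100):
    """Build a query string that searches for Tweets from one user."""
    params = {"q": f"from:{screen_name}", "rpp": results_per_page}
    # urlencode percent-encodes the ':' in 'from:' as '%3A'
    return urlencode(params)


print(build_search_query("example_user"))  # hypothetical screen name
```

This prints `q=from%3Aexample_user&rpp=100`, which shows the same `%3A` substitution described above.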

Step 2: OK, you know what you’re searching for and how to format the URL string to get your results. But Twitter returns a paltry 100 results at a time. You’re a SAS user; you don’t work with 100-record data sets! You want more, so you wrap your code in a macro, key off of Twitter’s page= parameter to get older results, and append the new results to your master data set. Twitter will generally allow you to pull down 1 week’s worth of search results. The code to do this is located here.
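The SAS macro itself isn’t reproduced in this post, but the paging logic it describes can be sketched in Python. Here `fetch_page` is a hypothetical stand-in for whatever actually calls the API with `page=1`, `page=2`, and so on. The loop appends each batch to one master list and stops when a page comes back empty; the cap of 15 pages is an assumption for illustration, not a documented limit.

```python
from typing import Callable, Dict, List

Tweet = Dict[str, str]


def collect_all_pages(fetch_page: Callable[[int], List[Tweet]],
                      max_pages: int = 15) -> List[Tweet]:
    """Request successive pages and append every result to one master list."""
    master: List[Tweet] = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)  # hypothetical call, e.g. one request with page=<page>
        if not batch:
            break  # no older results left
        master.extend(batch)
    return master


# Usage with a fake fetcher standing in for the real API: 3 full pages, then nothing.
def fake_fetch(page: int) -> List[Tweet]:
    return [{"id": f"{page}-{i}"} for i in range(100)] if page <= 3 else []


tweets = collect_all_pages(fake_fetch)
print(len(tweets))
```

Passing the fetcher in as a function keeps the paging loop testable without touching the network. A real fetcher would also want a pause between requests to stay clear of the rate limits discussed above.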

That’s enough to get you started. You now have a SAS data set with lots of Twitter data, including text to mine, dates and times to trend, and, hopefully, an interesting topic to help showcase your analytical prowess to your audience.

You can access the full code here.

Don’t forget to come back in about 2 weeks to read my post on how to wrangle and append other data from Twitter to your search dataset. Or, better yet, click here and get all of my posts in your inbox as soon as they’re published.

Old Spice Guy’s popularity on Twitter charted

Old Spice recently released about 14 ads with the Old Spice Guy (OSG) personally responding to Tweets from 14 celebrities. Some of the celebs are Hollywood types; others are web celebs like Guy Kawasaki, Biz Stone, and Kevin Rose. You can see OSG’s video replies here. They are great.

I put together a chart showing the number of Tweets that mention the words ‘old’ and ‘spice’. The chart shows just how quickly the Twitterverse filled up with Tweets about the OSG. Before 9am on July 13th, there was hardly any mention of the OSG, but within 6 hours there was a spike of about 2,300 Tweets per hour about Old Spice. Alas, nothing lasts forever: after peaking at 4,500 Tweets per hour, the Twitterverse quieted down and settled at around 400 Tweets per hour about the OSG.
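Producing a chart like this comes down to keeping the Tweets that contain both words, truncating each timestamp to the hour, and tallying. A minimal Python sketch, assuming you already have each Tweet’s timestamp and text. The sample Tweets and the word-matching rule are hypothetical, not the actual OSG data or the exact search used for the chart.

```python
import re
from collections import Counter
from datetime import datetime


def mentions_old_spice(text):
    """True if the Tweet contains both the word 'old' and the word 'spice'."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {"old", "spice"} <= words


def tweets_per_hour(tweets):
    """Count matching Tweets per hour; tweets are (timestamp, text) pairs."""
    counts = Counter(
        ts.replace(minute=0, second=0, microsecond=0)
        for ts, text in tweets
        if mentions_old_spice(text)
    )
    return sorted(counts.items())


# Hypothetical Tweets for illustration only.
sample = [
    (datetime(2010, 7, 13, 8, 55), "Old Spice ad is great"),
    (datetime(2010, 7, 13, 9, 5), "The Old Spice Guy replied to Kevin Rose"),
    (datetime(2010, 7, 13, 9, 40), "old spice FTW"),
    (datetime(2010, 7, 13, 10, 15), "nothing to see here"),
]
for hour, count in tweets_per_hour(sample):
    print(hour.strftime("%Y-%m-%d %H:00"), count)
```

Hours with no matching Tweets simply don’t appear in the output. A chart would need them filled in as zeros so the quiet stretches show up.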

BTW, the OSG says he’s hung up his towel.

Chart of the Old Spice Guy's popularity on Twitter

Thoughts on “7 Rules for Dashboard Design” post on Dashboard Insight

This post responds to a post on Dashboard Insight’s site titled “7 Small Business Dashboard Design Dos and Don’ts.”

Hi Stacey,

I read your post with great interest; after all, who wouldn’t like to know the 7 rules of dashboard design? As soon as I got to the 19th word of your post, “colorful,” I knew that we’d have some interesting differences in our viewpoints.

So let’s start with how you define a dashboard. I agree with most of your definition, especially the part about face-smacking insights. Certainly a useful dashboard should provide the reader with insights. But does a dashboard need to have “gorgeous colorful graphs”? I think not. In fact, if the dashboard designer uses too many colors in the graphs, they can kiss those face-smacking insights goodbye. I think it would be better to say that an insightful dashboard needs restrained use of color. The designer needs to be sparing with color so that, when color is used, the reader’s eyes are immediately drawn to the thing the designer is emphasizing. Use color everywhere and it becomes meaningless.

I agree with your rule #1 (start with a few key business metrics; don’t waste time collecting everything), and wholeheartedly agree with your rule #2 (use basic tools that you already have). In fact, I’d recommend that most people, beginners and experts alike, use Excel in combination with XLCubed. XLCubed gives the designer the ability to create very small and crisp sparklines, micro bar charts, bullet charts, and other useful graphics.

Rule #3 could be summed up as two rules. First, use simple line charts. Second, don’t try to compare things (this month to last month) on your dashboard, because it’s never a good idea.

I’m all for simple charts; after all, most complicated charts are the result of lazy design (the exception being Minard’s chart of Napoleon’s march). A good dashboard designer takes the time to ensure that his/her charts are almost instantly insightful. Complex, hard-to-read charts are almost never instantly insightful.

The second part of your rule #3, that is, don’t make two-point comparisons, really surprised me. The most insightful dashboards I’ve seen are ones where the reader can instantly see whether a critical measure is above or below last month, or last year, or whether a measure is over or under forecast. For that, you absolutely have to compare two points. Further, check out Stephen Few’s bullet charts. They’re all about comparing two things: actual to forecast, test to control, this to that, and that to this. In fact, one rule all dashboard designers might want to follow is to ask themselves this question when including a measure on their dashboard: “Compared to what?” I got that one from Stephen Few.
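“Compared to what?” boils down to putting a reference value next to each measure. A minimal Python sketch of a two-point comparison, using hypothetical monthly sales figures, checks one measure against both last month and the forecast:

```python
def compare(actual, reference):
    """Return (difference, percent_change) of actual versus a reference value.

    Raises ZeroDivisionError if reference is 0.
    """
    diff = actual - reference
    pct = 100.0 * diff / reference
    return diff, pct


# Hypothetical monthly sales figures.
this_month, last_month, forecast = 52_000, 48_000, 55_000
for label, ref in [("vs last month", last_month), ("vs forecast", forecast)]:
    diff, pct = compare(this_month, ref)
    status = "above" if diff > 0 else "below" if diff < 0 else "at"
    print(f"{label}: {diff:+,} ({pct:+.1f}%), {status}")
```

The same measure reads as good news against last month and bad news against forecast, which is exactly why the comparison belongs on the dashboard.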

As for rule #4, about Pareto charts, I’m just not a big fan. Yes, the 80/20 rule is an important one to know about, but the actual Pareto chart violates the first part of your rule #3: it ain’t that simple. Dual axes, with bars and a line on the same graph, just aren’t all that intuitive. I agree with the second part of your rule #4, no pie charts, although some info vis bloggers, like Jorge Camoes here, are pushing back against that one.
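For readers who haven’t built one: a Pareto chart sorts categories from largest to smallest (the bars) and overlays the running cumulative percentage (the line, on the second axis). The Python sketch below computes both series from hypothetical complaint counts. Printing them as a sorted table is one simple way to show the same 80/20 information without dual axes.

```python
def pareto_table(counts):
    """Return (category, count, cumulative_percent) rows, largest count first."""
    total = sum(counts.values())
    rows = []
    running = 0
    for category, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += count
        rows.append((category, count, 100.0 * running / total))
    return rows


# Hypothetical complaint counts for a small business.
complaints = {"late delivery": 50, "wrong item": 30, "damaged": 10, "billing": 6, "other": 4}
for category, count, cum_pct in pareto_table(complaints):
    print(f"{category:<14} {count:>3} {cum_pct:6.1f}%")
```

In this made-up example, the top two categories account for 80% of complaints.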

Rule #5, don’t go all Picasso on it. Agree. I also agree when you say ‘limit your creativity to use of colours,’ so long as you mean that colours should be used sparingly and to call out the most important things to monitor on the dashboard.

Rule #6, Monitor your dashboard weekly. Agree 100%. Monitor that dashboard. After all, if you don’t check it out to see how things are doing, what good is it?

Rule #7, Get help from someone who’s ok with Excel and charting, not from someone who’s selling you the software. For most BI vendors out there, I agree with your assessment.

I applaud your efforts to put down rules for small business dashboard designers to follow. And your point about well-designed dashboards helping business owners monitor their businesses is right on.

Lastly, there’s a small and passionate community of dashboard designers on the web. You can find some great advice on Stephen Few’s website. He’s also published 4 excellent books about information visualization, one focusing exclusively on dashboard design.


John C. Munoz

TinkerPlots, data exploration software for kids that’s all grown up.

I was blown away this morning when I watched two short movies about data exploration software called TinkerPlots. The software is marketed to schools for students in grades 4 to 8. I love the idea that kids in school can get their hands dirty visually exploring data, and I’m even more excited that they have this tool available to them. Why has TinkerPlots flown under our radar for so long? It’s been around for at least 4 years.

The designers of this software deserve praise for creating software that gets out of the way (a Stephen Few-ism, I think) and lets the user explore the data using simple commands. I will happily shell out the $89 to play with TinkerPlots.

Unfortunately, the TinkerPlots website makes it a bit difficult to see examples of the software in action. You can see some QuickTime movies showing TinkerPlots at work here and here. Here’s a listing of all TinkerPlots movies.

This software isn’t nearly as sophisticated as some of the software mentioned on Stephen’s site. But, as da Vinci said, “simplicity is the ultimate sophistication.”

Would love to hear your thoughts on this. Is anyone out there using TinkerPlots?

Pie Charts and faulty analytics in the NYTimes? Watch as the Biz Intel Guru fixes a seriously flawed blog post.

“Is Amazon Working Backward?” That’s the title of NYTimes blogger Nick Bilton’s post on Dec 24, 2009. Mr. Bilton is writing about Amazon’s product, the Kindle. Regarding the Kindle, he writes, “customers aren’t getting any happier about the end product.”

The day Mr. Bilton posted his story, best-selling author Seth Godin poked holes in it. Mr. Godin’s post is titled, “Learning from bad graphs and weak analysis.” Below is a brief listing of the serious flaws in Mr. Bilton’s approach. The listing is a mashup of Mr. Godin’s thoughts and mine.

1. Bilton should know better than to use pie charts because it’s really hard to determine the percentages when we’re looking at parts of a circle. Bar charts would’ve been much better. Stephen Few has stressed this for years. If you’re posting a chart in the NYTimes, you’d better have read your Stephen Few and Edward Tufte.
2. When your charts are the main support for your story, you’d better get them right. Mr. Bilton did get the table of numbers to the left of the pie charts correct. Perhaps he’d be better served by relying on them over the pie charts to make his point.
3. When you’re analyzing something, you shouldn’t mix different populations together while ignoring their differences.

Mr. Godin cited 4 specific problems with the piece, ranging from the graphs being wrong (later corrected) to Bilton misunderstanding the nature of early adopters. In addition, Mr. Godin writes, “Many of the reviews are from people who don’t own the device.” Obviously, it’s hard to take a review of a Kindle seriously if the reviewer doesn’t own a Kindle. These are the different populations I’m talking about in item #3 above. I’ll address some of Mr. Godin’s concerns with Bilton’s post now and fill in some of the gaps that Godin left.

Mr. Bilton tried to make the case that each new version of the Kindle is worse than the one before it. His argument is based almost exclusively on the pie charts below, specifically the gold slices of each pie. The gold slices are the percentage of one star reviews (the lowest possible rating) each Kindle received.

Here are the original 3 pies that Mr. Bilton showed in his post.

Despite difficulties in estimating the size of each slice in a pie chart, it is apparent that the 7% slice in the first pie chart is much larger than 7%. His corrected version is here.

Another problem Godin has with Bilton’s piece concerns the nature of early adopters. “The people who buy the first generation of a product are more likely to be enthusiasts,” writes Godin. The first ones in are more forgiving than the last ones in. I can’t really argue with that insight. My brother, an avid tech geek, is an early adopter of lots of tech gadgets. He was the first person I knew to buy an Apple Newton. I don’t recall a single complaint from him about the Newton, despite its inability to recognize handwriting, which was its main selling point.

Mr. Godin’s claim that many of the reviewers don’t own a Kindle intrigued me the most. If I could quantify the number of one star reviewers who don’t own a Kindle then I could show the difference in one star ratings between the two groups, owners and non-owners.

I recreated the data set that Mr. Bilton used for his analysis, 18,587 reviews in all. I also read up on how Amazon determines whether a reviewer is an “Amazon Verified Purchaser.” Basically, Amazon says that if the reviewer purchased the product from Amazon, the review is flagged with the Amazon Verified Purchase stamp. So, do the one star ratings differ between the Amazon Verified Purchaser reviews and the non-verified reviews? Why yes, they do!

Amazon Kindle one Star reviews

It’s clear from these charts that reviewers whom Amazon did not verify as purchasers are much more likely to give a one star rating than the reviewers Amazon verified as having purchased the Kindle. With each Kindle release, the non-verified reviewers were consistently four times more likely to give a one star review than the Amazon Verified Purchasers, the ones who actually bought a Kindle. What’s up with that?
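The comparison boils down to computing the one star share within each group and dividing one share by the other. A Python sketch with made-up ratings (not the 18,587 reviews in the data set described above):

```python
def one_star_share(ratings):
    """Fraction of star ratings (1 to 5) that are one star."""
    return sum(1 for r in ratings if r == 1) / len(ratings)


# Hypothetical ratings for illustration, NOT the author's review data.
verified = [5] * 90 + [4] * 6 + [1] * 4        # 4 of 100 are one star
non_verified = [5] * 60 + [4] * 24 + [1] * 16  # 16 of 100 are one star

v = one_star_share(verified)
nv = one_star_share(non_verified)
print(f"verified: {v:.0%}  non-verified: {nv:.0%}  ratio: {nv / v:.1f}x")
```

Running the same calculation once per Kindle version, for each group, gives the per-release comparison described above.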

Let’s look at the reviews from the verified purchasers. The percentage of one star ratings doubles from 2% with the Kindle 1 to 4% with the Kindle 2, then moves up to 5% with the Kindle DX. However, with one star ratings this rare among actual owners, this evidence provides very weak support for Bilton’s claim that Kindle owners are getting progressively less happy.

What about the reviewers who are happy to very happy with the Kindle, the four and five star reviewers? Once again, the non-verified Kindle reviewers provide consistently lower ratings than the reviewers who actually own a Kindle. And once again we see the trend of the non-verified reviewers liking each new version of the Kindle less than the previous one. The four and five star ratings for actual owners of the Kindle jibe with Mr. Godin’s claim that the early adopters are more likely to be enthusiasts than those late to the game.

4 & 5 star Amazon Kindle Reviews

So there you have it, Mr. Godin’s hunches are correct!

What’s most interesting to me, though, is the fact that 75% of the Kindle reviews aren’t made by people who own a Kindle. In my next post on this subject we’ll hear from a good friend of mine and text mining expert, Marc Harfeld. We’ll mine the text of the 15,000 customer reviews, looking for differences in the words used by verified and non-verified Kindle owners. Perhaps that will shed light on this mystery. We’re also going to weight the reviews by the number of people who told Amazon that they found the review helpful. You’d think that a review found helpful by 1 out of 3 people is different from a review found helpful by 18,203 out of 19,111 people, like this one.
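One simple way to weight a review is by the fraction of voters who found it helpful. The Python sketch below applies that idea to the two vote counts quoted above. The weighting scheme is an assumption for illustration, not the method the follow-up post will use.

```python
def helpfulness(helpful_votes, total_votes):
    """Share of voters who found a review helpful (0.0 when nobody voted)."""
    return helpful_votes / total_votes if total_votes else 0.0


def weighted_average_rating(reviews):
    """Average star rating weighted by helpfulness.

    reviews: (stars, helpful_votes, total_votes) tuples.
    Returns None if no review has any helpful votes.
    """
    weights = [helpfulness(h, t) for _, h, t in reviews]
    total_weight = sum(weights)
    if total_weight == 0:
        return None
    return sum(stars * w for (stars, _, _), w in zip(reviews, weights)) / total_weight


print(round(helpfulness(1, 3), 3))          # 0.333
print(round(helpfulness(18203, 19111), 3))  # 0.952
```

A raw ratio treats 1 helpful vote out of 1 the same as 900 out of 900. A score that accounts for vote volume, such as the lower bound of the Wilson score interval, is one common way to handle that.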

Lastly, we’d love to hear suggestions from you on other next steps we might take with this analysis.

Thanks for reading.

Education Pays, get the facts in this informative graph


Education Pays!


Fact: High school dropouts are three times more likely to be jobless than college graduates.

Fact: People with some college make 50% more than high school dropouts.

Fact: A college graduate earns almost twice as much as a high school graduate.

It pays to get schooled

Click here to download this post as a PDF.

Promising New Visualization Developer Tool Kit

Of all the open source visualization tool kits for developers I’ve seen so far, the one I stumbled upon today (thanks, Moritz Stefaner), named Protovis, seems the most practical and easiest to use. Protovis comes from Stanford’s visualization group, with the help of Jeff Heer and Michael Bostock.

Below is a screen grab of some of the visualizations created using Protovis.



There are some graphs in the examples that we might want to stay away from (those radial fan, or sunburst?, charts are just plain confusing), but I think the ability to construct high-quality, insightful charts with this software makes it a winner. You can make horizon charts using Protovis; see Stephen Few’s favorable review of horizon charts here. You can make sparklines and sparkbars as well. And if you want to put together interactive or animated visualizations, Protovis lets you do that too.

Other developer tool kits I’ve dabbled with, namely Flare and Processing, are both very powerful and flexible but seem much more difficult to learn than Protovis. And I’ve yet to find a way to generate simple bar and line charts in either Flare or Processing, but that might just be my lack of experience with those tools talking.

What do you think?

More examples and download link for the software at

