Tag Archives: data visualization

Bar chart with a log axis, “NEVER”! says the Biz Intel Guru

I’m a big fan of the team at SAS that works on the SG (statistical graphics) procedures. Their work enables others to tell richly detailed stories with data. The team is led by Sanjay Matange. Just 3 days ago I attended a session at the SAS Global Forum (SASGF12) where Sanjay spoke about the work he and his team have done for SAS version 9.3. It was obvious from the meeting that Sanjay and his team are incredibly user-focused and are really good at what they do.

So I was surprised today when I read Sanjay’s most recent blog update and saw this chart.

Bar chart with a log axis

There are a handful of ways you can ruin a bar chart. One way is to make it 3D. Why is 3D bad? Read this for details. Another way to wreck a bar chart is to start the numeric axis at a value that isn’t zero. Bar charts are only effective when we can use the length of each bar to make rapid comparisons. If one bar is twice as long as another, we expect its value to be twice as large. By starting a bar chart at something other than zero, you are telling a visual lie, because we can’t use the lengths of the bars to compare the magnitude of the differences. When Sanjay created a bar chart with a log axis, he violated the expectation of anyone who reads the chart, because we can’t use the lengths of the bars to directly compare values. A simple table would’ve worked much better. And sorting the table by horsepower would be an even better option, as you can see below.

Table showing horsepower comparison

Horsepower comparison
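To make the bar-length argument concrete, here is a minimal sketch in Python. The horsepower values (100 and 200) and the axis starting points (50 on the linear axis, 10 on the log axis) are made-up numbers chosen for illustration, not figures from Sanjay’s chart. The sketch compares how much longer the larger bar is drawn under each kind of axis.

```python
import math

def bar_length(value, baseline=0.0, log_scale=False):
    """Drawn length of a bar, in axis units, from the axis start to the value."""
    if log_scale:
        # On a log axis the bar spans log10(value) - log10(baseline).
        # The baseline must be positive because log10(0) is undefined.
        return math.log10(value) - math.log10(baseline)
    return value - baseline

# Hypothetical horsepower figures: the second car has twice the power.
small, large = 100.0, 200.0

linear_zero = bar_length(large) / bar_length(small)
linear_cut = bar_length(large, baseline=50.0) / bar_length(small, baseline=50.0)
log_axis = (bar_length(large, baseline=10.0, log_scale=True)
            / bar_length(small, baseline=10.0, log_scale=True))

print(f"linear axis starting at 0:  larger bar is {linear_zero:.2f}x as long")
print(f"linear axis starting at 50: larger bar is {linear_cut:.2f}x as long")
print(f"log axis starting at 10:    larger bar is {log_axis:.2f}x as long")
```

Only the zero-based linear axis draws the larger bar twice as long as the smaller one. The truncated axis exaggerates the difference and the log axis shrinks it, which is why bar length stops being a trustworthy cue in both cases.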

What Sanjay did came from a good place. He says in his blog post that a few people mentioned to him that they wanted to create a bar chart with a log axis. But just because people want something doesn’t mean you should give it to them. Sanjay is an expert in his field. Rather than satisfying the customer’s request, he might have offered up a better alternative, like a dot plot.


Better, a dotplot alternative to log scale bar chart

The dot plot doesn’t have the same problem as the bar chart: we’re not comparing the lengths of bars, we’re looking at the position of each dot along the X axis. Stephen Few has a great guest post by Info Viz superstar Dr. Naomi Robbins about dot plots and how, in the right circumstances, they can be a great alternative to bar charts. That paper can be found here.

In this instance SAS would’ve better served its customers by offering up the dot plot as an alternative to a log-scaled bar chart. As information visualizers, it’s our job to help people see things clearly. It’s not an easy thing to do, but there are consequences when we get it wrong. Those consequences range from wasting people’s time in meetings, to missing important opportunities, to the destruction of the Space Shuttle Challenger and the death of the seven astronauts aboard (thanks Edward Tufte).

When it comes to creating clear and insightful graphs, the Customer isn’t always right.

So, what do you think? Are there exceptions to the bar chart rules laid out above? Was SAS right in giving the customer what they wanted?

200+ things you need to know about unemployment in the US, all presented on one insightful dashboard

There are 208 charts on the dashboard below. Each one is loaded with information from the Bureau of Labor Statistics. Check it out; you’re bound to learn something you didn’t know before you came here.

The unemployment insight dashboard is now updated with May’s unemployment figures from the BLS. The unemployment rate dropped from 9.9% to 9.7%, in part due to the fact that approximately 200,000 people stopped looking for work and stopped being counted by the BLS as unemployed.

The long-term unemployed population, those out of work for 6 months or more, grew by an additional 47,000 people and now accounts for 46% of all unemployed. That’s equivalent to all the people (men, women, and children) in the entire state of Washington.

Note: click the picture below to bring up a large version. Then click again to get a crystal clear look at the dashboard.

Dashboard of Joblessness in the U.S.-May 2010

What everybody ought to know about unemployment in the U.S.

The unemployment insight dashboard is now updated with April’s unemployment figures from the BLS. While the unemployment rate is essentially unchanged, the nasty trend in the long-term unemployment continues.

The numbers for April show the long-term unemployed group grew by another 200,000. Now more than 6.7MM Americans, the equivalent of the entire state of Washington (men, women, and children), have been jobless for more than 6 months. This population now accounts for 46% of all unemployed.

Also, if you’re wondering why the unemployment rate increased despite the fact that the number of people who found new jobs increased, a good explainer can be found here, at the WSJ blog.

In my update last month I said I’d try to get more insights about the long-term unemployed. It turns out there’s a fair amount of information for this group, but the data are updated annually, not monthly. Nonetheless, in the coming weeks I will generate some supplemental posts analyzing the long-term unemployed from the new found data. Until then, here’s a link to a story about the long-term unemployed in the Huffington Post.

I welcome your comments, both positive and negative. I especially want to hear your thoughts on improving this dashboard. In particular, I’m considering getting rid of and/or dramatically altering the bar chart on the left side of the dash showing the number of un/underemployed Americans. I think the scaling of the chart makes differences in the blue bars hard to pick up. I also don’t like the lack of context in the chart. Perhaps indexing it to 1 year ago might be better.

If you’d like to print out or save a copy of a beautiful, high-res, 11 x 17 pdf version of this dashboard, just click here.

Dashboard of Joblessness in the U.S.-April 2010

Unemployment Insight Dashboard for March 2010 shows troubling trend in long-term unemployed

This month’s update shows continued growth in the long-term unemployed population who now number 6.5MM. That’s an all-time high and equivalent to the entire population of Arizona (men, women, and children) being out of work. What will it take to start seeing reductions in this group? Have their jobs disappeared for good? In the coming weeks I will try to answer these important questions by working with the Bureau of Labor Statistics to see if I can get more granular data about this population. Until then, you can find a good story about the long-term jobless here.

In addition, the industry section on the bottom right side of the dashboard shows many industries reversing the trend of increasing weekly work hours. For the last couple of months this section has been filled with blue bars showing growth, but this month, most of the bars are gray, showing contraction in average weekly hours of production. What does this mean?

I welcome your comments, both positive and negative. I especially want to hear your thoughts on improving this dashboard. In particular, I’m considering getting rid of and/or dramatically altering the bar chart on the left side of the dash showing the number of un/underemployed Americans. I think the scaling of the chart makes differences in the blue bars hard to pick up. I also don’t like the lack of context in the chart. Perhaps indexing it to 1 year ago might be better.

If you’d like to print out or save a copy of a beautiful, high-res, 11 x 17 pdf version of this dashboard, just click here.

Dashboard of Joblessness in the U.S.-March 2010

My Unemployment Dashboard ranks #1 in Google search. See why.

It’s taken me about 5 months to get there, but thanks to your help, my award-winning Unemployment Dashboard ranks #1 according to Google. I also rank #1 for the words ‘Unemployment Insights’. I suspect the postings featuring my dashboard over at chartporn and vizworld have helped quite a bit. Thanks Chartporn & VizWorld! BTW, if you haven’t checked out either site, you should. They are filled with many excellent visualizations.

There’s a lot going on this month in the industry index section on the bottom right hand side of the dashboard. Many industries are seeing strong and continued growth in the amount of hours their employees are working. Expansion in these figures is a leading indicator of hiring down the road. In addition, to provide you with a bit more context, I’ve expanded the time horizon on the industry index section from the last 6 months to the last 12 months.

Something else I noticed this month is the divergence between the unemployment rate and underemployment rate (upper left-hand chart). The unemployment rate held steady, but the underemployment rate rose from 16.5% to 16.8%, an increase of 0.3 percentage points (1.8% in relative terms). That’s an additional half-million underemployed Americans. If you’d like to bone up on the difference between underemployment and unemployment, check out this link.
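Because a change in a rate can be quoted either in percentage points or as a relative percentage, here is a small sketch of the arithmetic behind the underemployment figures above. The function name is my own, introduced only for illustration.

```python
def rate_change(old_rate, new_rate):
    """Return (percentage-point change, relative percent change) for two rates given in percent."""
    points = new_rate - old_rate
    relative = (new_rate - old_rate) / old_rate * 100.0
    return points, relative

# Underemployment rate quoted in the post: 16.5% last month, 16.8% this month.
points, relative = rate_change(16.5, 16.8)
print(f"{points:.1f} percentage points, {relative:.1f}% relative increase")
```

The same move reads as 0.3 or as 1.8 depending on which convention is used, so it helps to say which one is meant.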

One other point worth noting is this month’s decrease in the percentage of long-term unemployed. For only the second time this year, the percentage of those unemployed for more than 6 months fell. Granted, the decrease was very small: 0.3%, or 50,000 workers.

I welcome your comments, both positive and negative. I especially want to hear your thoughts on improving this dashboard. In particular, I’m considering getting rid of and/or dramatically altering the bar chart on the left side of the dash showing the number of un/underemployed Americans. I think the scaling of the chart makes differences in the blue bars hard to pick up. I also don’t like the lack of context in the chart. Perhaps indexing it to 1 year ago might be better.

If you’d like to print out or save a copy of a beautiful, high-res, 11 x 17 pdf version of this dashboard, just click here.


Dashboard of Joblessness in the U.S.-Feb 2010

Unemployment insight dashboard for Jan 2010 shows 41% of all unemployed, 6.3MM people, out of work more than 6 months.

My dashboard of unemployment in the U.S. is updated with data from January 2010.

I’ve added sparklines to the Demographic section in the middle of the dashboard. Now, rather than just seeing where unemployment stands this month for a particular demographic segment, you can see where it’s been over the last 12 months. The sparklines on the left side of the demographic section all represent the unemployment rate over the last 12 months. The sparklines on the right show the percentage of total unemployed each segment represents over the last 12 months. For example, a value of 25% in the “% of total unemployed” column for the “White Women” segment means that White Women make up 25% of all unemployed.

It’s interesting to see the huge drop, 9.5%, in underemployment this month. Check out the small chart in the upper right-hand side of the dashboard. We went from 9.1MM underemployed in Dec to 8.2MM in Jan 2010. That’s the first big drop in underemployment in at least 12 months.

The long-term unemployed population, however, continues to grow. Another 200,000 Americans were added to the pile of 6.1MM Americans who’ve been jobless for more than 6 months. Those long-term unemployed are equal to the entire population of Tennessee. The NYTimes had an in-depth story on the long-term unemployed, whom they call “The New Poor.” You can find that story here.

New data from the Bureau of Labor Statistics will be released this Friday, the 5th, so subscribe to my blog and you’ll get an email notifying you when the revised dashboard is complete.

As always, your feedback is welcome.

Click on the image to enlarge.

If you’d like a beautiful 11 X 17, crystal clear pdf of my dashboard, click here.

BTW, this dashboard was done using Excel 2007.

Dashboard of Joblessness in the U.S.-Jan 2010

TinkerPlots, data exploration software for kids that’s all grown up.

I was blown away this morning when I watched two short movies about data exploration software called TinkerPlots. The software is marketed to schools for kids in grades 4-8. I love the idea that kids in school can get their hands dirty visually exploring data. And I’m even more excited that they have this tool available to them. Why has TinkerPlots flown under our radar for so long? It’s been around for at least 4 years.

The designers of this software deserve praise for creating software that gets out of the way (a Stephen Few-ism, I think) and lets the user explore the data using simple commands. I will happily shell out the $89 to play with TinkerPlots.

Unfortunately, the TinkerPlots website makes it a bit difficult to see examples of the software in action. You can see some QuickTime movies showing TinkerPlots at work here and here. Here’s a listing of all TinkerPlots movies.

This software isn’t nearly as sophisticated as some of the software mentioned on Stephen’s site. But, as da Vinci said, “simplicity is the ultimate sophistication.”

Would love to hear your thoughts on this. Is anyone out there using TinkerPlots?

Have bad graphs and faulty analysis led to evidence that Amazon has fake reviewers? Read on…

In my first post about Nick Bilton’s flawed analysis of Amazon’s Kindle I left a few questions unanswered. One of those questions had to do with the ratings of the reviews themselves. Since Amazon allows each review to be rated by anyone, it might be interesting to see if the number of people who found a review useful varied by the number of stars the reviewer gave to the Kindle. So I ran an analysis examining Kindle 2 reviews.

So here are 4 plots*. The first, on the upper left, shows all reviews. Along the horizontal axis is the number of people reported to have found the review useful. Along the vertical axis is the star rating of the review. The plot on the upper right shows the same distribution, but for non-verified purchasers of the Kindle2 only. The plot on the lower left shows the same distribution, but this time for reviewers who Amazon said actually purchased a Kindle2. The plot on the lower right brings the Amazon verified and Amazon non-verified purchasers together. Each red + sign is an Amazon Verified purchaser and each blue circle is a non-verified purchaser.

Four scatterplots

Evidence of fake reviews?

These four charts tell us an interesting story. Each point on the chart represents a review. So in each chart (except on the bottom right**) you’re seeing 9,212 points. The two charts on top are roughly the same. That’s because the first chart shows all reviews and the second one shows just the reviews submitted by non-verified Kindle2 purchasers. You may recall that 75% of the reviews on the Kindle2 were submitted by people who Amazon said didn’t buy a Kindle2. So those dots dominate the charts. But take a look at the chart on the bottom left. You’ll notice that the cluster of reviews at the bottom of the top two charts, the ones between 1 and 2 stars and stretching out all the way to the end of the X axis, is gone. We knew that the non-verified purchasers were four times more likely to give a one star review compared to a verified purchaser, but we didn’t know that the 1 star non-verified reviewers were getting lots of people finding their reviews useful.

This dynamic really pops in the bottom right hand chart, the one with the red and blue lines in it. The blue line is made up of non-verified purchasers. As the number of people who said they found the review useful increases (starting around 8), the line dives down towards the 1-2 star ratings. The downward slope of the curve for the verified purchasers is much, much gentler.

This is a bit of a head-scratcher. I’ve heard people say that Amazon is full of fake reviews. These people aren’t saying that Amazon is the one doing the faking, but people who have some product that competes against the product being reviewed, or just people with an axe to grind. Is this an example of that? Do the fakers get their friends to say that their reviews are helpful? Maybe the Kindle2 verified purchasers post reviews that people just don’t find helpful. Right now, I don’t know what the correct answer is. But I have a feeling that some intelligent text-mining of the data will help flesh out an answer. Be on the lookout for a post about just that topic, by Marc Harfeld, coming soon, right here.

*To make the graphs easier to decipher I’ve excluded any review with more than 50 people finding the review useful. Taking the horizontal axis beyond 50 makes the plot very difficult to read. In all, this amounts to excluding 92 reviews out of the 9,304 I have gathered on the Kindle2. Because the star ratings are integers between 1 and 5, I needed to introduce a random jitter to the points (1 star becomes 1.221, another 1 star becomes 1.1321) so that they wouldn’t completely overlap each other on the scatterplot. I did the same to the values of how many people found each review helpful.
**Please note, to make an apples-to-apples comparison for the chart on the bottom right, I had to reduce the number of non-verified reviewers down to the same number of Amazon-verified reviewers. The sampling was a simple random sample, so it did not distort the distribution.
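The preparation steps in the two footnotes above (dropping reviews with more than 50 helpful votes, jittering the points, and randomly sampling the non-verified group down to the size of the verified group) can be sketched in Python as follows. This is not the code used for the charts. The toy reviews, the jitter width of 0.5, the one-sided jitter, and the helper names are all assumptions made for illustration.

```python
import random

def jitter(value, width=0.5, rng=random):
    """Add a uniform random offset between 0 and `width` (the width is an assumed value)."""
    return value + rng.uniform(0, width)

def balanced_sample(larger, smaller, rng=random):
    """Simple random sample, without replacement, of `larger`, cut down to len(smaller)."""
    return rng.sample(larger, len(smaller))

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Toy reviews as (stars, helpful_votes). Made-up values, not the Kindle2 data.
non_verified = [(1, 30), (1, 12), (5, 3), (2, 48), (4, 0), (1, 75)]
verified = [(5, 2), (4, 9)]

# Drop reviews with more than 50 helpful votes, as the first footnote describes.
MAX_HELPFUL = 50
non_verified = [r for r in non_verified if r[1] <= MAX_HELPFUL]
verified = [r for r in verified if r[1] <= MAX_HELPFUL]

# Match the group sizes, then jitter both coordinates so points don't overlap.
non_verified_sample = balanced_sample(non_verified, verified, rng)
points = [(jitter(votes, rng=rng), jitter(stars, rng=rng))
          for stars, votes in non_verified_sample]
print(points)
```

Each plotted point is (helpful votes, stars) with a small random nudge, so identical reviews no longer sit on top of each other.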

The best insights into December’s unemployment figures, now updated in this award-winning information dashboard.

Last month the BLS revised their November 2009 unemployment number from a loss of 11,000 jobs to a gain of 4,000. That’s the first monthly gain in two years.

But December’s data came in showing job losses of 85,000, with the official unemployment rate holding steady at 10%. The underemployment rate moved up slightly, from 17.2% in Nov. to 17.3% in Dec. The most disturbing trend I’m seeing in the numbers is the long-term unemployed, those people out of work for more than 27 weeks. This group of out-of-work Americans now accounts for 40% of all unemployed people, or 6.1 million people. This group has grown by 135% in the last 12 months. Getting these long-term unemployed back to work is going to take a very long time.

Download a beautiful, high-resolution 11 X 17 pdf version here.

Dashboard of Unemployment in the U.S.

Pie Charts and faulty analytics in the NYTimes? Watch as the Biz Intel Guru fixes a seriously flawed blog post.

“Is Amazon Working Backward?” That’s the title of NYTimes blogger Nick Bilton’s post on Dec 24, 2009. Mr. Bilton is writing about Amazon’s product, the Kindle. Regarding the Kindle, he writes, “customers aren’t getting any happier about the end product.”

The day Mr. Bilton posted his story, best-selling author Seth Godin poked holes in it. Mr. Godin’s post is titled, “Learning from bad graphs and weak analysis.” Below is a brief listing of the serious flaws in Mr. Bilton’s approach. The listing is a mashup of Mr. Godin’s thoughts and mine.

1. Bilton should know better than to use pie charts because it’s really hard to determine the percentages when we’re looking at parts of a circle. Bar charts would’ve been much better. Stephen Few has stressed this for years. If you’re posting a chart in the NYTimes, you’d better have read your Stephen Few and Edward Tufte.
2. When your charts are the main support for your story, you’d better get them right. Mr. Bilton did get the table of numbers to the left of the pie charts correct. Perhaps he’d be better served by relying on them over the pie charts to make his point.
3. When you’re analyzing something, you shouldn’t compare different populations while ignoring their differences.

Mr. Godin cited 4 specific problems with the piece, ranging from the graphs being wrong (later corrected) to Bilton misunderstanding the nature of early adopters. In addition, Mr. Godin writes, “Many of the reviews are from people who don’t own the device.” Obviously, it’s hard to take a review of a Kindle seriously if the reviewer doesn’t own a Kindle. These are the different populations I’m talking about in item #3 above. I’ll address some of Mr. Godin’s concerns with Bilton’s post now and fill in some of the gaps that Godin left to be filled.

Mr. Bilton tried to make the case that each new version of the Kindle is worse than the one before it. His argument is based almost exclusively on the pie charts below, specifically, the gold slices of each pie. The gold slices are the percentage of one star reviews (lowest possible) each Kindle receives.

Here are the original 3 pies that Mr. Bilton showed in his post.

Despite the difficulty of estimating the size of each slice in a pie chart, it is apparent that the slice labeled 7% in the first pie chart is drawn much larger than 7%. His corrected version is here.

Another problem Godin has with Bilton’s piece goes to the nature of early adopters. “The people who buy the first generation of a product are more likely to be enthusiasts,” writes Godin. The first ins are more forgiving than the last ins. I can’t really argue with that insight. My brother, an avid tech geek, is an early adopter of lots of tech gadgets. He was the first person I knew to buy an Apple Newton. I don’t recall a single complaint from him about the Newton, despite it not being able to recognize handwriting, which was its main selling point.

Mr. Godin’s claim that many of the reviewers don’t own a Kindle intrigued me the most. If I could quantify the number of one star reviewers who don’t own a Kindle then I could show the difference in one star ratings between the two groups, owners and non-owners.

I recreated the dataset that Mr. Bilton used for his analysis, 18,587 reviews in all. I also read up on how Amazon determines if a reviewer is an “Amazon Verified Purchaser.” Basically, Amazon says that if the reviewer purchased the product from Amazon, they’ll be flagged with the Amazon Verified Purchase stamp. So let’s see, do the one star ratings vary between the Amazon Verified Purchaser reviews compared to the non-Amazon Verified Purchaser reviews? Why yes, they do!

Amazon Kindle one Star reviews

Amazon Kindle 1 Star reviews

It’s clear from these charts that the reviewers who didn’t purchase a Kindle are much more likely to give a one star rating compared to the reviewers who Amazon verified as purchasing the Kindle. With each Kindle release, the non-verified reviewers were consistently four times more likely to give a one star review than the Amazon Verified Reviewers, the ones who actually purchased a Kindle. What’s up with that?
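For readers who want to reproduce this kind of comparison, here is a minimal sketch of the calculation: the share of one star reviews within each product and purchaser group, and the ratio between the two groups. The rows below are invented to show the mechanics and are not the review data behind the charts. The tuple layout and function name are assumptions too.

```python
from collections import Counter

def one_star_share(reviews):
    """Map (product, verified) -> fraction of that group's reviews that gave one star."""
    totals, one_stars = Counter(), Counter()
    for product, verified, stars in reviews:
        key = (product, verified)
        totals[key] += 1
        if stars == 1:
            one_stars[key] += 1
    return {key: one_stars[key] / totals[key] for key in totals}

# Made-up rows of (product, verified_purchase, stars). Not the real review data.
reviews = [
    ("Kindle 2", True, 5), ("Kindle 2", True, 4),
    ("Kindle 2", True, 1), ("Kindle 2", True, 5),
    ("Kindle 2", False, 1), ("Kindle 2", False, 1),
    ("Kindle 2", False, 3), ("Kindle 2", False, 1),
]

shares = one_star_share(reviews)
ratio = shares[("Kindle 2", False)] / shares[("Kindle 2", True)]
print(shares)
print(f"non-verified reviewers are {ratio:.1f}x as likely to give one star")
```

With real data, the same function would be run over every review for each Kindle version, and the ratio compared across releases.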

Let’s look at the reviews from the verified purchasers. The percentage of one star ratings each new Kindle edition receives doubles from 2% with Kindle 1, to 4% with Kindle 2, and then moves up to 5% with KindleDX. However, this evidence provides very weak support for Bilton’s claim that Kindle owners are getting progressively less happy.

What about the reviewers who are happy to very happy with the Kindle, the four and five star reviewers? Once again, the non-verified Kindle reviewers provide consistently lower ratings than the reviewers who actually own a Kindle. And once again we see the trend of the non-verified reviewers liking each new version of the Kindle less than the previous one. The four and five star ratings for actual owners of the Kindle jibe with Mr. Godin’s claim that the early adopters are more likely to be enthusiasts than those late to the game.

4 & 5 star Amazon Kindle Reviews

Four & five star Amazon Kindle Reviews

So there you have it, Mr. Godin’s hunches are correct!

What’s most interesting to me, though, is the fact that 75% of reviews of the Kindle aren’t made by people who own a Kindle. In my next post on this subject we’ll hear from a good friend of mine, and text mining expert, Marc Harfeld. We’ll mine the text of the 15,000 customer reviews looking for differences in the words used between the verified and non-verified Kindle owners. Perhaps that will shed light on this mystery. We’re also going to weight the reviews by the number of people who told Amazon that they found the review helpful. You’d think that a review that was helpful to 1 out of 3 people is different from a review that was found helpful by 18,203 out of 19,111 people, like this one.
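One possible way to do that weighting, sketched here under my own assumptions, is to let each review count once per helpful vote when averaging star ratings. The two vote counts come from the paragraph above. The star ratings attached to them are made up, and the function name is hypothetical.

```python
def weighted_mean_stars(reviews):
    """Average star rating where each review counts once per helpful vote."""
    total_weight = sum(helpful for _, helpful, _ in reviews)
    if total_weight == 0:
        return None  # no helpful votes at all, so the weighted mean is undefined
    return sum(stars * helpful for stars, helpful, _ in reviews) / total_weight

# (stars, helpful_votes, total_votes). Vote counts are from the post,
# star ratings are invented for illustration.
reviews = [(2, 1, 3), (5, 18203, 19111)]

unweighted = sum(stars for stars, _, _ in reviews) / len(reviews)
weighted = weighted_mean_stars(reviews)
print(f"unweighted mean: {unweighted:.2f}, weighted by helpful votes: {weighted:.2f}")
```

The widely endorsed review dominates the weighted average, which is the effect the weighting is meant to capture. Other schemes, such as weighting by the share of voters who found the review helpful, are equally plausible.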

Lastly, we’d love to hear suggestions from you on other next steps we might take with this analysis.

Thanks for reading.