“Baseball is 90% mental, and the other half is physical.” – Yogi Berra
You just have to love Yogi Berra quotes like this. We all pretty much know what he’s talking about, even if his math is not spot on. It’s a restatement of the Pareto Principle, the 80/20 rule! It applies to just about anything in life or business. If I had to write a definition of it for technical documentation, it would look something like this:
“A situation where eighty percent of events attributed to a group are caused by twenty percent of the members of the group.”
Re-stated as examples:
(And I’m certainly not in that last twenty percent. If I was, I wouldn’t have to write articles like this one!)
Now that we’ve got an understanding of the principle, let’s look at how it can be visualized. Excel has a very simple wizard for creating a Pareto Chart that can be found on the Insert menu:
But we want one of these in Power BI. And Power BI doesn’t have one (yet, maybe later). We’ll need to ‘roll our own’. Let’s discuss the various parts of the chart itself, so we know what we’re shooting for.
The arrow points to the spot on the line where it crosses 80%, in our case, after about the first four members, as can be seen by following the green dashed line from right to left, then down. The first four members would be the ‘twenty percent’ of the Pareto Principle, and their cumulative measure would be the eighty.
Note: Math wizards may point out that four members divided by a total member count of fifteen is closer to thirty percent than twenty, but remember that this is a rule of thumb, and we all know that some thumbs are bigger or smaller than others. To plot some data in a Pareto Chart, we’ll need a couple of pieces of information from it:
Now that we understand what we’re shooting for, let’s get started. If your data includes a running sum of the measurement for each member, sorted by the respective member’s measurement, then you’re golden and can skip to the section titled Add the Grand Total and Running Percent. Your data may include a Ranking column so you may be able to skip the respective steps in each of the following two sections. For the rest of you, keep reading. We’ll look at two approaches to getting the intermediate bits of data: Power Query (M), and DAX.
Let’s start with some simple data in Excel, in fact the same data used to generate the Excel Pareto chart we used to explain the concepts:
We’ll load this data (it’s in an Excel table called “Table1”) and edit it in the Power BI Query Editor. First, we need to sort the data by the [Measure] column, sorted descending. Click the down-arrow next to the Measure column title and select Sort Descending.
Next, on the Add Column menu, select Index Column. Keep the defaults of Starting Index of 1 and Increment of 1.
I renamed my column to [Power Query Rank] to differentiate it from ranking step we’ll introduce in the model later via DAX.
Next, we’ll add the running total as a Custom Column with a formula as shown below:
Hint: If you can’t read the formula from the screen shot, it is:
= Table.Range ( #”Renamed Columns”, 0, [Power Query Rank] )
Attribution should go to Sam Vanga and SQL Server Central for this bit of M code: http://www.sqlservercentral.com/blogs/samvangassql/
The Power Query function Table.Range can be explained like this: Given a table of data, in our case the last of our query steps, a.k.a. #”Renamed Columns”, start at the 0 row (top), and go down the number of rows represented by the value in column [Power Query Rank]. The result is a table associated with each row in the query. The first row of the query has a table with one row of data in it. The second row has a table with two rows, and so forth. This table is represented by the word “Table” on each row of the column we just added.
From here, click on the ‘expand’ arrow in the column header and select the Aggregate radio button, check off the “Sum of Measure” column, and un-check “Use original column name as prefix”:
I renamed the resulting column [Power Query Running Total] (not shown).
Click Close and Apply on the Home menu.
As with all things Microsoft, there is more than one way to accomplish a goal. In our case, the goal is to get the running total, and just like before, we’ll need the ranking first. For this exercise, we’ll be using DAX instead of Power Query, but should get the same results.
Create a column with the formula as follows:
DAX Rank = RANKX (All ( Table1 ), [Measure] )
DAX Running Total =
SUM ( Table1[Measure] ),
ALLSELECTED ( Table1 ),
Table1[DAX Rank] <= MAX ( Table1[DAX Rank] )
This DAX formula does pretty much the same thing as the Power Query Range.Table function above, the only difference is that it includes the aggregate within, eliminating the need for an extra column.
Note: Know the difference between Columns and Measures in DAX. Mistaking the two will cause error, frustration, and hair loss.
Plotting all these columns and measures on a simple table visual shows that Power Query and DAX come up with the same answers for Rank and Running Total, a good sanity check. Also, the ranks are easy to verify as to accuracy, and with a little mental math, running totals are as well. I had to re-format some of the numbers to make them show without decimals.
There’s two more pieces we need: [Grand Total] which is self-explanatory, and [Running Percent], which is the ‘percentage of the [Running Total] compared to the [Grand Total]’. These can only be done in DAX. Add a measure as follows:
Grand Total = CALCULATE ( SUM ( Table1[Measure] ) , ALL ( Table1 ) )
This calculates the Grand Total and makes it available at every slice (row of each Member).
Now add the last item, a column with the expression:
Running Percent = [Power Query Running Total] / [Grand Total]
Running Percent = DIVIDE ( [Power Query Running Total] , [Grand Total] )
Note: The column [DAX Running Total] would work just as well as its Power Query equivalent since we know it has the same number.
Format this last one as a percent.
Now the fun part. For this we’ll need either a “Line and Stacked Column Chart” or a “Line and Clustered Column Chart.” This is the easiest part of the whole exercise:
Like I said, simple if you have all of the data pieces in front of you.
Need help getting the right data pieces? Not sure what charts you can generate from the data pieces you have? There’s probably a way to get to where you want to be. Reach out to our team of data scientists at BlumShapiro Consulting to learn more about how data can help guide your organization into the future.