How we did it: “Predict the playoffs” interactive

Thunderdome Deputy Sports Editor Bobby Bonett, Data Developer Peggy Bustamante and Graphics Editor Nelson Hsu collaborated on our NFL playoffs interactive, a project that tried to give readers a new way to explore the stats behind the postseason.

Bobby Bonett explains the vision he brought to the data team:
The idea behind the playoff predictor was to create a data-driven interactive that let fans play with the stats to suggest which team could win each playoff match-up, based on historical performances in the playoffs. In order to appeal to fans beyond the statistically obsessed, we chose 10 standard statistics for the computations. These 10 stats for each NFL team that made the playoffs this year were then compared to the statistics of every playoff team since 1971. (We gathered these statistics from pro-football-reference.com.)


As teams were defeated each week, the app updated to whittle down to the final matchups and, ultimately, the Super Bowl.

The 2013 data was compared to the historical data in two different ways. For pass offense, pass defense, rush offense, rush defense, points for and points against, team statistics were compared to the league average and converted to a percentage. For example, if a team passed for 4,000 yards and the league average was 5,000 yards, that team would have a value of 80 percent. This normalized the data to account for changes in football over the years, specifically the shift from rush-heavy offenses of the past to today’s pass-heavy offenses.
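The league-average normalization described above can be sketched in a few lines of JavaScript. This is a minimal illustration rather than the code used in the app; the function name percentOfLeagueAverage and its parameters are hypothetical.

```javascript
// Hypothetical helper (not from the app): express a team's season total
// as a percentage of the league average for that season.
function percentOfLeagueAverage(teamValue, leagueAverage) {
    if (!(leagueAverage > 0)) {
        throw new RangeError("leagueAverage must be a positive number");
    }
    // Multiply first so round numbers stay exact: 4000 * 100 / 5000 === 80.
    return (teamValue * 100) / leagueAverage;
}

// The example from the text: 4,000 passing yards against a 5,000-yard league average.
console.log(percentOfLeagueAverage(4000, 5000)); // 80
```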

Then, the 2013 teams were matched up in each category to similar playoff teams since 1971. “Similar” was defined as within 4 percent in a certain category for the divisional round, and within 8 percent in the championship round. For outliers, the ranges were expanded to ensure there was a significant sample size. We then looked at the record of those similar teams, and assigned a percentage chance that the 2013 team would win based on historical results. For example, if nine of the 20 similar teams in “pass offense” won their playoff games, the 2013 team would have a 45 percent chance of winning based strictly on their pass offense.
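The matching step might look something like the sketch below. It rests on several assumptions that are not in the original: the data shape ({ value, won }), the function name chanceFromSimilarTeams, and the reading of "within 4 percent" as 4 percentage points of the normalized value. The widening of ranges for outliers is not modeled here.

```javascript
// Hypothetical data shape: each historical playoff team is
// { value: <stat as a percent of league average>, won: <true if it won its playoff game> }.
function chanceFromSimilarTeams(teamValue, pastTeams, tolerance) {
    // Assumption: "within 4 percent" means within 4 percentage points of the normalized value.
    var similar = pastTeams.filter(function (team) {
        return Math.abs(team.value - teamValue) <= tolerance;
    });
    if (similar.length === 0) {
        return null; // no comparable teams; the article widened the range in such cases
    }
    var wins = similar.filter(function (team) { return team.won; }).length;
    return {
        sampleSize: similar.length,
        winPercent: (wins * 100) / similar.length
    };
}

// Made-up history: 20 similar teams (9 winners) plus one team far outside the range.
var pastTeams = [];
for (var n = 0; n < 20; n++) {
    pastTeams.push({ value: 98 + (n % 5), won: n < 9 });
}
pastTeams.push({ value: 130, won: true });

var result = chanceFromSimilarTeams(100, pastTeams, 4);
console.log(result.sampleSize, result.winPercent); // 20 45
```

The made-up sample reproduces the example in the text: nine winners among 20 similar teams gives a 45 percent chance.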

For turnover margin, record, seed and home/away values, teams were matched up with other teams that had the same values. The data was again normalized for turnover margin and record to account for seasons that didn’t have a 16-game schedule. A percentage chance of winning was then assigned to each team for each of these values in the same way.

Using this data, we devised a formula that would produce a percentage chance of each team winning each match-up based on user-selected and user-ranked statistics. For instance, a user could select pass offense, rush offense and points in a particular order, and the formula would weigh the three statistics accordingly for each team, computing a percentage chance of each team winning the game.

With that framework in mind, Peggy Bustamante describes the app’s development:
Bobby, Nelson and I hashed out the project, discussing not only layout but also what we wanted the functionality to be, and so we mocked up a look for the project on the dry erase wall at Thunderdome.

While Nelson constructed the HTML and CSS, Bobby and I figured out the formula to predict the percentage chance of each team winning its match-up. It needed to allow users to select the statistics they wanted to consider and how heavily those statistics would be weighted, using a sortable stack on the left. (Nelson used jQuery UI’s sortable interaction to create that functionality.)

Nelson left the HTML team data blank so the slots could be dynamically populated from a Google spreadsheet, and updated easily each week by the sports desk without reworking the base code each time. This will also allow for re-use for other competitions, like the MLB playoffs.

We used two Google spreadsheets: one for the front end to populate the team stats and one for the back end to house the data that would be used to calculate the win-percentage prediction.

For the front end, I first imported the data from the Google Spreadsheet in JSON format and then re-loaded the data into a JSON object with keys of my own choosing. Having that control over key names became vital for the back-end portion. I then created loops to dynamically update the text in the HTML:

for (var i=0; i<matchupsLen; i++) {
    $("#mod" + i + " p.expertText").html(allData[i * 2][0].expert);
    $("#mod" + i + " .datetime").html(allData[i * 2][0].date + " <br />" + allData[i * 2][0].time);
    $("#mod" + i + " .line").html("<span class='sideHeader'>Line</span><br />" + allData[i * 2][0].line);
    …
    …
}

The code for the back end data calculations was a bit more complicated.

We needed a formula that would allow users to filter and weight the statistics by the categories they choose.

This is the final formula that Bobby created:

[ a(b-(10-c)) + a1(b1-(10-c)) + … ] / [ (b-(10-c)) + (b1-(10-c)) + … ]

a = stat (expressed as a percentage chance of winning, i.e., Team X has a 40% chance of winning based on pass offense)
b = weight (the weight of the chosen statistics from the left rail, i.e., one stat chosen is worth 10; two stats are worth 10 and 9; three stats are worth 10, 9 and 8; etc.)
c = number of statistics ranked

The rationale for the (10-c) portion of the formula was to properly calculate the weighting of each statistic. For example, if three items were chosen, we wanted them to be weighted 3, 2 and 1, not 10, 9 and 8.
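As a worked check of the (10-c) adjustment, the sketch below computes b-(10-c) for each ranked statistic. The function name effectiveWeights is hypothetical, and the sketch assumes, as the definitions above describe, that b starts at 10 and drops by one for each lower-ranked statistic.

```javascript
// Hypothetical helper: effective weight b - (10 - c) for each ranked statistic,
// where c is the number of statistics selected and b = 10, 9, 8, ... by rank.
function effectiveWeights(c) {
    var weights = [];
    for (var rank = 0; rank < c; rank++) {
        var b = 10 - rank;
        weights.push(b - (10 - c));
    }
    return weights;
}

console.log(effectiveWeights(3)); // [ 3, 2, 1 ]
console.log(effectiveWeights(5)); // [ 5, 4, 3, 2, 1 ]
```

The result is the same as counting down from c to 1. For example, if a team's three selected statistics gave it chances of 40, 50 and 60 percent in ranked order, the formula yields (40×3 + 50×2 + 60×1) / (3 + 2 + 1) = 280 / 6, or about 46.7 percent.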

The result of the formula would be a percentage chance of each team winning each match-up. However, this data would be calculated for each team independent of its opponent in its match-up. So the final step was to relate the two opponents’ percentage chances of winning out of 100 percent.

The formula for that:

x/(x+y)  =   z/100

x = Team 1’s independent chance of winning
y = Team 2’s independent chance of winning
z = Team 1’s chance of winning compared to Team 2’s chance of winning, as a percentage

For the back end, we agreed that the win probabilities would be recalculated any time the user changes the order of the statistics, or checks or unchecks a statistic. A listener captures those changes and sets off the calculation process, which has two steps.

When a change occurs, the doneSort() function is triggered:

function doneSort() {

    var sortedIDs = [];

    $("#sortableStats input:checked").each(function(i) {
         sortedIDs[i] = $(this).attr('id');
    });
    calculateWin(sortedIDs);
}

This captures the ids of all the items that are selected, in order, and drops them into an array. That array is then passed to the calculateWin() function, which steps through Bobby’s formula for each team matchup.

calculateWin() first checks if the array is empty (no stat items checked) and zeroes out the percentages if that’s the case.

As with the front end, I loaded the data needed to make the calculations from a Google spreadsheet and then put the data into a JSON object. For the back-end data, it was necessary to match the keys to the ids of the sortable items on the left, so that I could look up the right values after doneSort() captured all the ids:

HTML:

<div><input type="checkbox" id="a0" onchange="cbChanged(this, 0);"><p>Pass Offense</p></div>

JSON:

for (var i=0; i<dataLen; i++) {
    var counter = gsData.feed.entry[i].gsx$id.$t;

    calcData[counter] = [ {
        myid: gsData.feed.entry[i].gsx$id.$t,
        matchup: gsData.feed.entry[i].gsx$matchup.$t,
        team: gsData.feed.entry[i].gsx$team.$t,
        a0: gsData.feed.entry[i].gsx$passo.$t,
        a1: gsData.feed.entry[i].gsx$passd.$t,
        a2: gsData.feed.entry[i].gsx$rusho.$t, 
        …
        …
    }];     
}

The actual calculation is fairly straightforward, reconstructing the formula using variables and arrays.

Once the selected items are captured, the loop looks up each selected stat’s value in the Google Spreadsheet JSON data (calcData), multiplies it by its weight (in this case 5 for the a9 stat, 4 for the a7 stat, 3 for the a8 stat, etc.), and adds the result to a running total for each team.

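for (var j=0; j<statIDsLen; j++) {
    // Look up each stat by key with bracket notation instead of eval(), and parse as base 10
    statsTeam1 += parseInt(calcData[k][0][statIDs[j]], 10) * weight;
    statsTeam2 += parseInt(calcData[k+1][0][statIDs[j]], 10) * weight;
    statsDivisor += weight;   // running sum of the weights
    weight--;                 // the next-ranked stat counts one less
}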

I also totaled just the weights (5 + 4 + 3 + 2 + 1) into the variable “statsDivisor” to get the percentage for each team.

temp1perc = statsTeam1 / statsDivisor;
temp2perc = statsTeam2 / statsDivisor;
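To see the loop and the division work end to end, here is a self-contained sketch for a single team. The function name independentChance, the sample rows and their values are all hypothetical; only the key names (a0, a1, a2) and the string-valued spreadsheet cells follow the pattern shown above.

```javascript
// Hypothetical stand-in for two calcData rows; spreadsheet cells arrive as strings.
var sampleData = [
    [{ team: "Team X", a0: "40", a1: "55", a2: "70" }],
    [{ team: "Team Y", a0: "60", a1: "45", a2: "30" }]
];

// Weighted average of the selected stats, with weights counting down from
// the number of selected stats to 1.
function independentChance(row, statIDs) {
    if (statIDs.length === 0) {
        return 0; // nothing checked: zero out, as calculateWin() is described as doing
    }
    var weight = statIDs.length;
    var total = 0;
    var divisor = 0;
    for (var j = 0; j < statIDs.length; j++) {
        total += parseInt(row[statIDs[j]], 10) * weight;
        divisor += weight;
        weight--;
    }
    return total / divisor;
}

// The user ranks a2 first and a0 second.
console.log(independentChance(sampleData[0][0], ["a2", "a0"])); // 60
console.log(independentChance(sampleData[1][0], ["a2", "a0"])); // 40
```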

Then to get the final percentages for the teams to add up to 100:

team1winperc = Math.round((100 * temp1perc) / (temp1perc + temp2perc));
team2winperc = Math.round((100 * temp2perc) / (temp1perc + temp2perc));
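One detail worth noting: because each side is rounded separately, the two figures can occasionally add up to 101 (for instance, 37.5 and 62.5 round to 38 and 63). A possible variation, sketched below with the hypothetical function name headToHead, rounds one side and gives the other the remainder, and also guards against both chances being zero. This is an alternative for illustration, not what the app does.

```javascript
// Hypothetical variation on the two lines above: round Team 1's share and
// give Team 2 the remainder so the pair always totals 100.
function headToHead(x, y) {
    var total = x + y;
    if (total === 0) {
        return [0, 0]; // assumption: no data means no prediction
    }
    var team1 = Math.round((100 * x) / total);
    return [team1, 100 - team1];
}

console.log(headToHead(60, 40)); // [ 60, 40 ]
console.log(headToHead(3, 5));   // [ 38, 62 ]
```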

And the final step was to drop the percentages into the HTML elements:

$("#mod" + i + " .team1 .numBox").text(team1winperc + "%");
$("#mod" + i + " .team2 .numBox").text(team2winperc + "%");

As a result, each time a user made any action on the left, the interactive updated each team’s chances of winning.

And, in several cases, we heard that this encouraged users to play with the statistics until their favorite team was favored to win its match-up.


By Tom Meagher

Tom Meagher is the data editor at Thunderdome, leading a team of journalist-developers who build interactive web applications, support computer-assisted reporting projects in local newsrooms and offer training in data analysis and visualization.
