Last week, our Thunderdome data team launched an interactive database to accompany a group of stories by Viktoria Sundqvist, the investigations editor at Digital First Media’s Connecticut papers. Her project examined the disparity in public school superintendent salaries at every district in the state and looked at the perks that often get bundled into contracts for the school executives. We asked Viktoria and data team developers Peggy Bustamante and Vaughn Hagerty to walk our readers through the reporting and data collection, as well as the building of the map and database framework that sprang from the stories.
Originally, the project started out small – I was just gathering some of the local contracts, because we had several new superintendents. Then the idea grew into “Why don’t we do the entire state?” As my editor said, “Connecticut is a pretty small state, so you can actually do that.”
More than half of the districts responded to our Freedom of Information requests quickly. Only two school districts made us pick up the contracts in person and charged us for it, which we noted in a story we did about all the responses.
I started a Google spreadsheet to keep track of the basic info we were looking for – salary for current year, next year, third year, vacation time, mileage, other benefits, etc. A big struggle was making sure all the numbers collected actually added up, because some of the contracts only reported salary (including tax-sheltered annuity), while others split this into several sections of a contract.
These two passages excerpted from the contracts we examined are representative of why parsing them was, at times, a confusing process:
“The Board shall contribute Fifteen Thousand Dollars ($15,000.00) to the Superintendent’s base annual salary; from which total base salary the Superintendent shall pay, through a reduction in his base salary (elective deferral) the amount of Fifteen Thousand Dollars ($15,000.00), to an annuity designated by the Superintendent.”
“The superintendent’s salary consists of a) base component of $162,000, b) an additional amount of 4% of base salary, c) An additional sum which is equal to 7.25% of the total of the amount specified in sections (a) and (b) above.”
Once the spreadsheet was almost complete, I teamed up with the Thunderdome data crew, who said they could build off the spreadsheet I had already done and make it look really cool. We just needed to clean up some numbers… One thing I learned is that you cannot have numbers and letters in the same field (I had made notes next to salaries, or put a range for some, or put vacation days carried over in addition to other vacation time, and mixed annual mileage stipends with just “IRS rate” which did not fly for the searchable database and all had to be adjusted).
Because I had randomly decided to include the school district size from the beginning of my project (thank God!), I could later compare the superintendents by district size. I was also able to add a column for cost per student, where I let the spreadsheet divide the superintendent’s salary by the number of students in the district.
After Viktoria and I discussed the data she had gathered and what she envisioned for presenting it, we settled on a searchable/sortable table with a linked map that would use graded colors to show salary ranges around the state.
To create the map, I needed to get the boundary/shapefiles for the Connecticut school districts prepped and ready to merge with the spreadsheet of data. I downloaded the shapefiles from the Census Bureau website and converted them to KML using Shape to Fusion. I then merged Viktoria’s spreadsheet with the table of KML files using Fusion Tables.
The initial pass resulted in an incomplete map, full of holes where districts presumably should be. There were three folders in the Census zip file I downloaded and I had assumed the “unified school districts” shapefiles meant all of the school districts. Never assume. I had to send all three (elementary, secondary and unified districts) through shpescape.com, combine them all into one file, and then delete some overlapping districts. Other bug-inducing factors were variations in district naming between the Census files and Viktoria’s file, such as “Regional district 06” in one was “Region 6” in another. Another glitch in building a complete map was that one district had multiple polygons, which did not import smoothly and had to be mucked with to add it to a row in the Fusion Table.
After all these mapping bugs were addressed, we debated whether there wasn’t one more little district missing at the top of Connecticut. We turned to Google Maps to discover that no district was missing, it was just Massachusetts jutting into Connecticut in an odd little keyhole formation along Connecticut’s northern border. Who knew?
Once the mapping issues were resolved, we pulled the data into a table/map framework that Vaughn Hagerty had built for a previous project. Some of the data was not showing up properly because there were irregularities in the data types, some text in cells that were designated for numbers (e.g. salaries). The solution was for Viktoria to remove the text from the cells and incorporate comments into the introduction and footnotes.
The code is definitely still in development, as we hone it and add features and, eventually, include a configuration wizard. Although we’re not really ready to release the code, which is still a bit rough, I can talk a bit about what it does. It depends on open-source libraries such as backbone.js, underscore.js, handlebars.js, jQuery, the Google Chart Tools API (for tables and graphs) and either Google Maps or the Leaflet OpenStreetMap API. We hope to add D3.js support for more sophisticated graphs and already have cribbed some functions from that great library to handle comma-separated (CSV) values.
Here’s how it works in a nutshell. There are two pieces that need to be provided to thunderdome.js:
- Data. So far, we’ve used comma-separated data, straight JSON or, in the case of this salary project, a feed from Google Fusion Table as initial data sources. More are planned soon. Ultimately, that data is reformatted and fed to backbone.js, which allows us to do things such as provide default values. (Note to the geeks out there: We’re just barely scratching the surface in terms of backbone integration.) Essentially, this data is a key-value store, similar to a database table or an Excel spreadsheet, which allows us to reference specific pieces, just like in a database.
- Configuration. Want a map? Tell thunderdome.js what kind (Google or OpenStreetMap), what styling you want, whether you want to display pins or polygon shapes on the map, how you want to differentiate among pin or polygon colors, what fields you want in a window that pops up when users click on a pin or polygon … and some other stuff, including formatting certain fields correctly as numbers (with commas) or currency and legends on the map. Want a table? Specify the fields you want to show, whether you want to search on a field or filter by one, whether you want a map popup to fire when a user clicks on a certain record, how many records you want to show on each page … and some more stuff, including the number formatting mentioned above.
- Finally, you need some simple HTML and CSS to hold and style the pieces on a web page and to specify how you want certain visual elements such as your map popups or legends to look.
Note that almost all of these items are a la carte: You can just display a map or a just a table and you can pick and choose which features you want.