Sort Performance Improved and now Stable
Posted by Sean Forman on November 3, 2010
2010 Major League Baseball Standard Fielding - Baseball-Reference.com.
This is really down in the weeds, so if you aren't worried about the nuts and bolts of how this works, feel free to move on.
Chris Jaffe noticed that after the switchover last week the sorting of tables was no longer stable. For large tables like the one linked above (think several hundred rows), the sorting performance had been struggling, so I implemented a quicksort algorithm in JavaScript, since you can't be certain what sorting algorithm the browser uses.
By default, quicksort is not stable, meaning that if you sort on one column and then sort on another, the previous ordering is (in effect) tossed out when sorting on the new column. With a stable sort, if you sorted on errors and then on games played and looked at all of the players with 159 games, they would still be sorted by errors. A nice feature to have.
So I found a nice implementation of stable quicksort and now the sorts are stable.
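For the curious, here is a minimal sketch of one way to make quicksort stable. It is not the implementation used on the site; it assumes the common trick of remembering each row's original position and using it to break ties. The names stableQuicksort and players are made up for illustration.

```javascript
// Hypothetical helper: a stable quicksort over an array of rows.
// Ties are broken by each row's original position, which makes the result stable.
function stableQuicksort(rows, compare) {
  // Remember where each row started.
  const items = rows.map((row, index) => ({ row, index }));

  // Caller's comparison first, original position second.
  const cmp = (a, b) => compare(a.row, b.row) || a.index - b.index;

  const swap = (i, j) => {
    const tmp = items[i];
    items[i] = items[j];
    items[j] = tmp;
  };

  // In-place quicksort, partitioning around the middle element.
  const sort = (lo, hi) => {
    if (lo >= hi) return;
    const pivot = items[lo + ((hi - lo) >> 1)];
    let i = lo;
    let j = hi;
    while (i <= j) {
      while (cmp(items[i], pivot) < 0) i++;
      while (cmp(items[j], pivot) > 0) j--;
      if (i <= j) {
        swap(i, j);
        i++;
        j--;
      }
    }
    sort(lo, j);
    sort(i, hi);
  };

  sort(0, items.length - 1);
  return items.map((item) => item.row);
}

// Made-up rows: sort on errors, then on games played (descending).
const players = [
  { name: "A", games: 159, errors: 12 },
  { name: "B", games: 150, errors: 3 },
  { name: "C", games: 159, errors: 5 },
  { name: "D", games: 150, errors: 9 },
];
const byErrors = stableQuicksort(players, (a, b) => a.errors - b.errors);
const byGames = stableQuicksort(byErrors, (a, b) => b.games - a.games);
console.log(byGames.map((p) => p.name).join(" ")); // prints "C A B D"
```

Within each games group the players are still in order of errors (C before A, B before D), which is exactly the behavior described above.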
One other improvement: we were using parseFloat and some other checks to determine the value of a numerical table cell. For large tables we have to do thousands of such checks on each pass through the table, and that was really inefficient. So now I just assume the value is a float if I have good reason to believe it is, while items like salaries still go through parseFloat so they sort correctly.
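Roughly, the idea looks like the sketch below. This is an illustration rather than the site's actual code: cellValue and the isPlainNumber flag are hypothetical, and it assumes the caller already knows which columns hold plain numbers such as "159" or ".987".

```javascript
// Hypothetical helper: turn a table cell's text into a sortable number.
// Assumption: isPlainNumber is true only for columns known to hold bare numbers.
function cellValue(text, isPlainNumber) {
  if (isPlainNumber) {
    // Fast path: trust the column and convert directly, with no extra checks.
    return +text;
  }
  // Slow path for formatted values such as "$1,250,000":
  // strip currency symbols, commas, percent signs and spaces, then parseFloat.
  const parsed = parseFloat(text.replace(/[$,%\s]/g, ""));
  // Treat anything unparseable as 0 (an arbitrary choice for this sketch).
  return Number.isNaN(parsed) ? 0 : parsed;
}

console.log(cellValue("159", true));         // 159
console.log(cellValue("$1,250,000", false)); // 1250000
```

The trade-off is that the fast path gives a wrong answer (NaN) if a cell in a "plain number" column turns out to contain something else, so the column has to be trusted.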
So now, the sorting is faster, but you still will see a pause as manipulating the DOM can be slow depending on your browser.
BTW, in a quick check of my browsers, Google Chrome is twice as fast at sorting our tables as Firefox and four times as fast as IE. If you want a browser that is really, really fast at rendering and modifying pages (sorting, dragging, etc.), you really can't beat Chrome. It is easily my browser of choice now.
November 3rd, 2010 at 2:14 pm
This might be the right place to mention a glitch that has affected these fielding stats. For multi-team / multi-position players, the Totals are frequently wrong.
To illustrate this, look at the link to the 2010 MLB Standard Fielding Stats that you provided here. If you scroll down to the first multi-team player (Rick Ankiel), you'll find that his Totals match the sum of his KCR and ATL numbers. This is apparently because he played exclusively CF for both teams.
Now look at the second multi-team player (Joaquin Arias). His Totals amount to less than the sum of his TEX and NYM numbers. This is apparently because he played multiple positions for each team and, for some reason, only his games at 2B and SS were included in the Totals, while his games at 1B and LF were not included.
I don't know what causes this, but is there a fix?
November 4th, 2010 at 12:05 pm
It appears that the 1B and LF totals were not added in because Arias played those positions for only one of his two teams. His personal page was therefore not programmed to compute a Total for those positions, so the composite MLB page had no Totals for those positions to draw from. This is definitely still a matter that should be addressed in order for the MLB stats to be accurate.