It is perhaps axiomatic that a good performance handicapping system must be based on reliable and useful data. Among the myriad criticisms of PHRF, data quality is seldom mentioned as a factor to consider. It is, however, the foundation on which the system is built. Build a handicapping system on a faulty foundation and the result is a faulty handicap. Let’s look at the data generated in a typical regatta or club beer can race.
When race results are posted, competitors flock to the scoreboard to see how they finished. The resulting numbers (first, second, third, etc.) are ordinal numbers. Ordinal numbers carry little information about the race or its competitors; they simply declare the order in which the boats finished in a particular race. They say nothing about how well a boat sailed compared to other boats in the race. Yes, ordinal numbers do establish who beat whom; however, the distance (in time or space) between boats is not captured. The delta between first and second may not be the same as the delta between second and third, or between third and fourth, and so on.
For many sailors, and especially one-design sailors, this information is sufficient. Add up the points associated with each finishing place and the regatta or series winner is decided. A major shortcoming of ordinal numbers is that they offer no insight into how a boat may perform in the future, how it performed in relation to other boats, or how it performed in relation to its past races. A well-founded handicapping system needs data that allows for these comparisons. Ordinal data is simply inadequate for handicapping purposes.
PHRF races always provide more useful information for handicapping purposes: elapsed time. Elapsed time is interval data; the step from one second (or minute, or hour) to the next is precisely the same everywhere on the scale. By comparing elapsed times, the relative speed and performance of one boat can be compared to another. If in a race the first-place boat finished sixty seconds ahead of the second-place boat and the second-place boat finished thirty seconds ahead of the third-place boat, we can use this data to begin to make inferences about the relative speeds of the different boat/crew combinations. The simplest inference is that the second- and third-place boats are more similar to each other than the first and second or the first and third, because the elapsed time delta between second and third is smaller than between the other pairs. These comparisons can be the foundation of a very crude handicapping system that adjusts the elapsed times, adding time to the faster boats or subtracting time from the slower boats.
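As a rough illustration of that very crude approach, the sketch below computes the gap between consecutive finishers and the time that would have to be added to each faster boat to level it with the slowest one. The boat names and the 3000-second winning time are invented for the example; only the sixty- and thirty-second gaps come from the text above. Levelling against the slowest boat is just one possible choice.

```python
def gaps_between_finishers(elapsed_seconds):
    """Return (boat_ahead, boat_behind, gap_in_seconds) for each consecutive pair."""
    ordered = sorted(elapsed_seconds.items(), key=lambda item: item[1])
    return [
        (ahead[0], behind[0], behind[1] - ahead[1])
        for ahead, behind in zip(ordered, ordered[1:])
    ]


def crude_time_corrections(elapsed_seconds):
    """Seconds to add to each boat so every corrected time equals the slowest boat's."""
    slowest = max(elapsed_seconds.values())
    return {boat: slowest - t for boat, t in elapsed_seconds.items()}


# Hypothetical elapsed times in seconds (assumed for illustration only).
elapsed = {"Boat A": 3000, "Boat B": 3060, "Boat C": 3090}

gaps = gaps_between_finishers(elapsed)
corrections = crude_time_corrections(elapsed)

for ahead, behind, gap in gaps:
    print(f"{ahead} finished {gap} s ahead of {behind}")
for boat, seconds in corrections.items():
    print(f"{boat}: add {seconds} s")
```

Corrections derived this way only describe one race; they carry over to another race only if the distance and conditions are the same.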
This approach has significant limitations because elapsed time depends on the boat, the crew, the weather, and the distance raced, at least two of which are not consistent between races. Clearly a twenty-minute delta between first and second on a Volvo Ocean Race leg is very different from a twenty-minute delta on a Wednesday evening beer can race.
To summarize, ordinal data is inadequate for developing a handicapping system because it does not contain enough information to make the valid comparisons between boats that handicapping requires. Interval data, derived from elapsed times, offers promise if the variables of distance and weather can be factored into the rating.
Another way to report race data is in ratio form. A ratio carries more information than either ordinal or interval data and can be used to make comparisons between different races and between different boats. The simplest ratio would compare each boat’s elapsed time to that of the fastest boat. If the winning boat completed the course in 50 minutes, the second-place boat in 55 minutes and the third-place boat in 60 minutes, the winner’s ratio is 1 by definition and the other two ratios (in decimal form) would be 1.1 and 1.2, or 110% and 120% of the time required for the winning boat to complete the race. From these numbers we can deduce the finish order and the relative speeds of the competing boats.
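The arithmetic in that example can be sketched in a few lines. The function name and boat labels are invented for illustration; the elapsed times are the 50, 55 and 60 minutes used above.

```python
def ratios_to_fastest(elapsed_minutes):
    """Divide each boat's elapsed time by the fastest elapsed time in the race."""
    fastest = min(elapsed_minutes.values())
    return {boat: t / fastest for boat, t in elapsed_minutes.items()}


# Elapsed times in minutes from the example in the text; boat labels are assumed.
race = {"first": 50, "second": 55, "third": 60}
fastest_ratios = ratios_to_fastest(race)

for boat, ratio in fastest_ratios.items():
    print(f"{boat}: {ratio:.2f} ({ratio:.0%} of the winner's time)")
```

Because each ratio is a proportion of the winner's time, the same calculation applies to a short evening race and a long offshore leg without any adjustment for distance.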
Developing a ratio based on the fastest boat has a couple of limitations. First, the first-place boat is not constant; a different boat could win each race, changing the standard against which relative boat speed is measured. Second, the distribution of the resulting ratios is one-tailed; the ratios start at 1 and have no upper limit. For reasons that will become clear later, it would be better to measure relative speed from a midpoint, such that some ratios would be less than 1 and some greater than 1.
Ideally, the ratios should be evenly distributed around 1, with half the ratios greater than 1 and half less than 1. Calculating the ratios against the median elapsed time allows this, as the median is defined as the point in a distribution where half the values are above and half below. Using the earlier example (and assuming a three-boat race), the median elapsed time is 55 minutes, so the first-place boat would have a ratio of 0.91, the second a ratio of 1, and the third 1.09. From these ratios the finishing places of the boats can be deduced, as well as the relative speeds of the boats, regardless of any other factors such as race course, boat type, weather, crew and so on.
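A minimal sketch of the median-based calculation, using Python's standard `statistics.median`. The function name and boat labels are assumptions made for the example; the elapsed times are the 50, 55 and 60 minutes from the text.

```python
from statistics import median


def ratios_to_median(elapsed_minutes):
    """Divide each boat's elapsed time by the fleet's median elapsed time."""
    mid = median(elapsed_minutes.values())
    return {boat: t / mid for boat, t in elapsed_minutes.items()}


# Elapsed times in minutes from the example in the text; boat labels are assumed.
race = {"first": 50, "second": 55, "third": 60}
median_ratios = ratios_to_median(race)

for boat, ratio in median_ratios.items():
    print(f"{boat}: {ratio:.2f}")
```

With an even number of finishers, `statistics.median` returns the average of the two middle times, so no boat necessarily sits at exactly 1.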
Collecting ratio data over the course of several races can provide valuable information to the fleet and to the individual skippers. Because the data is based on the performance of any one boat relative to the fleet, and because the data does not depend on extraneous factors, the boat’s overall performance can be discerned. Is there a trend in the ratio data? Are a boat’s ratios getting smaller? If so, the boat’s performance is improving. Getting larger? Something’s happening, because the boat is performing worse over time.
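One simple way to put a number on such a trend is the slope of a least-squares line fitted through a boat's ratios in race order. That is an assumption of this sketch and only one of several reasonable methods. The five ratios below are invented for illustration.

```python
def trend_slope(ratios):
    """Least-squares slope of ratio against race index (0, 1, 2, ...).

    A negative slope means the ratios are shrinking, i.e. the boat is
    improving relative to the fleet; a positive slope means the opposite.
    """
    n = len(ratios)
    if n < 2:
        raise ValueError("need at least two races to estimate a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(ratios) / n
    sxy = sum((i - mean_x) * (r - mean_y) for i, r in enumerate(ratios))
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    return sxy / sxx


# Hypothetical median-based ratios for one boat over five races.
season = [1.12, 1.10, 1.09, 1.07, 1.05]
slope = trend_slope(season)
improving = slope < 0

print(f"change in ratio per race: {slope:+.3f}")
print("improving" if improving else "not improving")
```

A handful of races gives only a noisy estimate, so a small slope in either direction is better read as "no clear trend" than as proof of change.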
This kind of data can help skilled sailors fine-tune their program and assess crew changes, equipment changes or other skipper-controlled factors. For the novice sailor who is finishing at the back of the fleet, monitoring ratio data can demonstrate improvement as the weekly ratios get smaller. Being able to quantify and demonstrate improvement can be a powerful motivator and can help to offset the discouragement that sets in while watching transoms on the racecourse. These analyses are impossible using only ordinal or interval data.
More importantly, ratio data derived from actual race performance lays the foundation for a fair, data-based handicapping system. More on that next week.