Home / Methodology

Methodology

A transparent look at how the Rowing Power Index is calculated, adjusted, and simulated.

Overview

The RPI is a modified Elo system designed specifically for multi-team rowing. Every crew starts at a base rating of 1500. After each race, ratings are updated based on performance relative to competitors, adjusted by a series of contextual factors. The system processes all races chronologically to produce a current snapshot of every team's strength.

Rating Updates

For each race, every pair of crews is compared. The rating change for a team is the sum of all pairwise adjustments:

delta = K x MoV x Impact x Conditions x Distance x (S - E)

Where S is the actual outcome (1 for win, 0 for loss in each pairing), E is the expected outcome from the logistic function, and each multiplicative factor adjusts the update magnitude.

K-Factor by Race Type

The K-factor controls rating volatility. Different race types warrant different sensitivity:

Race Type	K	Rationale
Head-to-Head	24	Direct matchups are highest signal
Semi-Final	24	Strong signal from knockout rounds
Final / Championship / Head Race	20	Definitive competitive events
Heat / Regatta / Scrimmage	16	Standard competitive default
Time Trial	10	No direct racing, lower reliability

Adjustment Factors

Margin of Victory (MoV)

MoV = ln(|margin| + 1) x (2.2 / (eloDiff x 0.001 + 2.2))

Logarithmic compression prevents blowout results from having outsized influence. The autocorrelation term reduces the effect when a large favorite wins by a large margin (this was expected), while amplifying upsets.

Regatta Importance Tiers

Tier 1: 1.5x (Henley, Stotesbury, Nationals) | Tier 2: 1.2x (NEIRA, Regionals) | Tier 3: 1.0x (Regular) | Tier 4: 0.8x (Scrimmage)

Events are classified into four tiers based on prestige and competition level. Championship-level events like Henley and Stotesbury get 1.5x weight. Major regional events get 1.2x. Regular season regattas get standard weight. Scrimmages and time trials get reduced weight.

Conditions

Good: 1.0x | Poor: 0.5x

Races in poor conditions (heavy wind, rough water) produce less reliable results. We halve the rating impact to prevent noisy data from distorting rankings.

Distance

Factor = 1 - (distance - 2000) / 10000, clamped to [0.5, 1.0]

The standard racing distance is 2000m. Non-standard distances (sprint courses, head races) receive a proportional reduction. A 1500m sprint gets ~0.95x; a 5000m head race gets ~0.7x.

League

Youth: 1.1x | Scholastic: 1.0x

A modest boost for Youth division races, reflecting the generally stronger competition level from club programs that recruit nationally.

Pairwise Comparison Model

Unlike traditional Elo which compares a team against the field average, the RPI uses a pairwise model. For N teams in a race, we generate N×(N-1)/2 head-to-head comparisons. Each pair produces a win/loss outcome, and the expected score is calculated using the standard Elo logistic function.

proximity_weight(i, j) = 1 / (1 + |pos_i - pos_j| - 1)

Pairs are weighted by position proximity: adjacent finishers (1st vs 2nd) receive full weight, while distant pairs (1st vs 6th) receive proportionally less. This ensures that close battles have more influence than distant comparisons. The total pairwise delta for each team is normalized by (N-1).

Field Strength Multiplier

The quality of competition matters. Results against strong fields count more than results against weak fields:

fieldStrength = 0.8 + 0.4 x (avgFieldRPI - 1400) / 300, clamped [0.8, 1.4]

A race where the average competitor has an RPI of 1400 gets 0.8x weight. A field averaging 1700 gets 1.2x. This prevents teams from inflating their rating by racing only weak competition.

Recency Decay

Recent results are more indicative of current ability than older ones. The RPI applies an exponential decay to historical snapshots:

weight = 0.95 ^ weeksSinceRace

A result from 10 weeks ago has approximately 60% of the weight of a result from today. This prevents stale results from dominating a team's current rating while still preserving historical context.

Season Carry-Forward

At the start of each new season (September 1), ratings are regressed toward the baseline:

newRPI = 0.30 x previousRPI + 0.70 x baseline

By default, 30% of the previous season's rating carries forward. This reflects the reality that scholastic and youth teams turn over significantly year to year, while preserving some program-level continuity. The carry-forward percentage is configurable by administrators.

Performance Scores

When finish times are available, we calculate a normalized performance score for each crew. Times are normalized to a 2000m equivalent, then scored relative to the median time in the race. This provides the margin-of-victory data used in the MoV multiplier. When times are not available (position-only results), the MoV multiplier defaults to 1.0 and only win/loss outcomes affect ratings.

Championship Simulations

We run Monte Carlo simulations(5,000 iterations per championship) to estimate each crew's probability of winning major events. The process:

Field selection — Filter to crews that have historically participated in each championship (or are eligible by country/division).
Bracket or heat generation — Different championships use different formats (Henley uses knockout brackets, Stotesbury uses heats/semis/finals).
Race simulation — Each matchup is resolved using win probabilities derived from current Elo ratings with random noise.
Aggregation — Championship win counts are tallied across all iterations to produce win probabilities.

Known Limitations

Crew changes within a season are not modeled. If a team replaces rowers mid-season, the rating still reflects the composite performance.
New teams start at 1500 regardless of program history. It takes 5-10 races for ratings to converge on true ability.
Season carry-forward is set to 30% by default. This may be too high or too low depending on the level of roster turnover at a given program. Administrators can tune this parameter.
Lane bias and course-specific effects are not currently modeled.
The importance tier system uses event name matching. Events with non-standard names may not receive the correct tier classification.

← What is Elo?Data sources →