Data Sources

Where our data comes from, how we collect it, and what we track.

  • 200 teams tracked
  • 125 races recorded
  • 672 results logged

What We Collect

For each race result, we capture the following data points:

Race Metadata

  • Event name & date
  • Race type (heat, semi, final, etc.)
  • Distance (meters)
  • Conditions (good / poor)

Team Results

  • Finish position
  • Finish time (when available)
  • Team name & country
  • Division (scholastic / youth)
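
A record with the fields listed above could be modeled roughly as follows. This is an illustrative sketch, not the production schema: every class and field name here is an assumption, and the sample values are placeholders rather than real results.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class RaceMetadata:
    """Hypothetical container for the race-level fields."""
    event_name: str
    event_date: date
    race_type: str      # e.g. "heat", "semi", "final"
    distance_m: int     # distance in meters
    conditions: str     # "good" or "poor"


@dataclass(frozen=True)
class TeamResult:
    """Hypothetical container for one team's result in one race."""
    race: RaceMetadata
    position: int
    team_name: str
    country: str
    division: str       # "scholastic" or "youth"
    finish_time_s: Optional[float] = None  # None when no time is available


# Placeholder data for illustration only.
race = RaceMetadata("Example Regatta", date(2024, 5, 18), "final", 1500, "good")
result = TeamResult(race, position=1, team_name="Example High School",
                    country="USA", division="scholastic")
```

Keeping the finish time optional mirrors the "when available" note above: a result with only a finish position is still a valid record.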

Regatta Coverage

We aim to cover every major scholastic and youth rowing event globally. Current coverage includes:

United States

  • Stotesbury Cup Regatta
  • SRAA National Championship
  • USRowing Youth National Championship
  • Regional Youth Championships (Mid-Atlantic, Central, Southeast, Southwest, Midwest)
  • NEIRA Championship
  • PSRA league races
  • San Diego Crew Classic
  • Local and invitational regattas

United Kingdom & International

  • Henley Royal Regatta (Princess Elizabeth Challenge Cup)
  • Schools' Head of the River Race
  • National Schools Regatta
  • International invitationals

How Data Enters the System

Race results are ingested through multiple channels:

  • Manual Entry

    Administrators enter results directly through the admin dashboard, with structured forms for each race.

  • PDF Extraction

    Official result PDFs from regattas are uploaded and parsed using AI-powered document processing to extract structured data.

  • Web Scraping

    Links to online results pages are fetched and parsed automatically to produce structured race data.

  • Community Contributions

    Approved contributors can submit results for review. All submissions go through admin verification before entering the database.
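
The four channels and the review gate for community submissions can be sketched as below. The names `Channel`, `Submission`, and `may_enter_database` are hypothetical. The sketch also assumes that the other three channels need no separate approval step, which the description above does not state.

```python
from dataclasses import dataclass, field
from enum import Enum


class Channel(Enum):
    MANUAL_ENTRY = "manual_entry"
    PDF_EXTRACTION = "pdf_extraction"
    WEB_SCRAPING = "web_scraping"
    COMMUNITY = "community"


@dataclass
class Submission:
    channel: Channel
    payload: dict = field(default_factory=dict)  # parsed race data, shape unspecified
    admin_approved: bool = False


def may_enter_database(sub: Submission) -> bool:
    """Community submissions wait for admin verification.

    Assumption: other channels are not gated by a separate approval flag.
    """
    if sub.channel is Channel.COMMUNITY:
        return sub.admin_approved
    return True


pending = Submission(Channel.COMMUNITY)
approved = Submission(Channel.COMMUNITY, admin_approved=True)
manual = Submission(Channel.MANUAL_ENTRY)
```

Here `may_enter_database(pending)` is false until an administrator approves the submission, while `approved` and `manual` pass the gate.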

Data Quality

Every result is validated before it affects rankings. We check for duplicate entries, verify team names against our canonical database, and flag statistically improbable results for manual review. When finish times are available, we cross-check against known course records and typical race durations to catch data entry errors.
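
The checks described above might look something like the following. This is a minimal sketch under stated assumptions: the `validate` function, the dictionary keys, the duplicate key of (event, race type, team), and the time thresholds are all hypothetical, and the sample data is made up.

```python
def validate(results, canonical_teams, course_record_s, slowest_typical_s):
    """Return (index, reason) flags for results that need manual review."""
    flags = []
    seen = set()
    for i, r in enumerate(results):
        # Duplicate check: same event, race type, and team seen before.
        # Assumption: one entry per team per race.
        key = (r["event"], r["race_type"], r["team"])
        if key in seen:
            flags.append((i, "duplicate"))
        seen.add(key)

        # Team name must match the canonical list exactly.
        if r["team"] not in canonical_teams:
            flags.append((i, "unknown_team"))

        # Time checks apply only when a finish time is available.
        t = r.get("time_s")
        if t is not None:
            if t < course_record_s:
                flags.append((i, "faster_than_course_record"))
            elif t > slowest_typical_s:
                flags.append((i, "slower_than_typical"))
    return flags


# Placeholder data: the thresholds below are invented for the example.
teams = {"Example High School", "Sample Rowing Club"}
results = [
    {"event": "Example Regatta", "race_type": "final",
     "team": "Example High School", "time_s": 300.0},
    {"event": "Example Regatta", "race_type": "final",
     "team": "Example High School", "time_s": 300.0},   # repeated entry
    {"event": "Example Regatta", "race_type": "final",
     "team": "Exmple High Schol", "time_s": 30.0},      # typo and improbable time
    {"event": "Example Regatta", "race_type": "heat",
     "team": "Sample Rowing Club", "time_s": None},     # no time published
]
flags = validate(results, teams, course_record_s=250.0, slowest_typical_s=420.0)
```

Flags mark a result for manual review rather than rejecting it, since a time faster than the known course record may simply be a new record.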
