Data Sources
Where our data comes from, how we collect it, and what we track.
What We Collect
For each race result, we capture the following data points:
Race Metadata
- Event name & date
- Race type (heat, semi, final, etc.)
- Distance (meters)
- Conditions (good / poor)
Team Results
- Finish position
- Finish time (when available)
- Team name & country
- Division (scholastic / youth)
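As a rough illustration, the fields listed above could be modeled as two small record types. This is a minimal sketch, not necessarily the schema used in production: the class names, field names, units (seconds for finish time), and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class RaceMetadata:
    """Hypothetical container for the race-level fields."""
    event_name: str
    event_date: date
    race_type: str      # e.g. "heat", "semi", "final"
    distance_m: int     # distance in meters
    conditions: str     # "good" or "poor"


@dataclass(frozen=True)
class TeamResult:
    """Hypothetical container for one team's result in a race."""
    position: int
    team_name: str
    country: str
    division: str                          # "scholastic" or "youth"
    finish_time_s: Optional[float] = None  # only set when a time is available


# Made-up sample data, for illustration only.
race = RaceMetadata("Example Regatta", date(2024, 5, 18), "final", 1500, "good")
results = [
    TeamResult(1, "Example School A", "USA", "scholastic", 272.4),
    TeamResult(2, "Example School B", "USA", "scholastic"),  # no time recorded
]
```

Making the finish time optional reflects that times are captured only when available, while finish position is always present.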
Regatta Coverage
We aim to cover every major scholastic and youth rowing event globally. Current coverage includes:
United States
- Stotesbury Cup Regatta
- SRAA National Championship
- USRowing Youth National Championship
- Regional Youth Championships (Mid-Atlantic, Central, Southeast, Southwest, Midwest)
- NEIRA Championship
- PSRA league races
- San Diego Crew Classic
- Local and invitational regattas
United Kingdom & International
- Henley Royal Regatta (Princess Elizabeth Challenge Cup)
- Schools' Head of the River Race
- National Schools Regatta
- International invitationals
How Data Enters the System
Race results are ingested through multiple channels:
- Manual Entry: Administrators enter results directly through the admin dashboard, with structured forms for each race.
- PDF Extraction: Official result PDFs from regattas are uploaded and parsed using AI-powered document processing to extract structured data.
- Web Scraping: Links to online result pages are processed to extract and structure race data automatically.
- Community Contributions: Approved contributors can submit results for review. All submissions go through admin verification before entering the database.
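The review step for community contributions can be sketched as a simple status transition. This is an illustrative sketch under stated assumptions: the `Channel`, `Status`, `Submission`, `receive`, and `approve` names are hypothetical, and treating the other three channels as accepted on arrival is an assumption of the example, since only the community review step is described above.

```python
from dataclasses import dataclass
from enum import Enum


class Channel(Enum):
    MANUAL = "manual_entry"
    PDF = "pdf_extraction"
    WEB = "web_scraping"
    COMMUNITY = "community_contribution"


class Status(Enum):
    PENDING_REVIEW = "pending_review"
    ACCEPTED = "accepted"


@dataclass
class Submission:
    channel: Channel
    payload: dict
    status: Status


def receive(channel: Channel, payload: dict) -> Submission:
    """Wrap incoming results; community submissions wait for admin review."""
    # Assumption: only community contributions are held back here.
    if channel is Channel.COMMUNITY:
        return Submission(channel, payload, Status.PENDING_REVIEW)
    return Submission(channel, payload, Status.ACCEPTED)


def approve(submission: Submission, reviewer_is_admin: bool) -> Submission:
    """Mark a pending submission as accepted; only admins may do this."""
    if not reviewer_is_admin:
        raise PermissionError("only administrators can verify submissions")
    submission.status = Status.ACCEPTED
    return submission
```

For example, `receive(Channel.COMMUNITY, {...})` yields a submission in `PENDING_REVIEW`, and it only becomes `ACCEPTED` after `approve(..., reviewer_is_admin=True)`.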
Data Quality
Every result is validated before it affects rankings. We check for duplicate entries, verify team names against our canonical database, and flag statistically improbable results for manual review. When finish times are available, we cross-check against known course records and typical race durations to catch data entry errors.
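The checks described above might look roughly like the following. This is a minimal sketch, not the production validator: the function names, the dictionary keys, the sample canonical-team table, and the `max_factor` threshold for a "typical race duration" are all assumptions made for illustration.

```python
from typing import Optional

# Hypothetical canonical team table: normalized name -> canonical name.
CANONICAL_TEAMS = {"example school a": "Example School A"}


def canonical_team(name: str) -> Optional[str]:
    """Look up a team by lowercased, whitespace-collapsed name."""
    return CANONICAL_TEAMS.get(" ".join(name.lower().split()))


def time_is_plausible(finish_time_s: Optional[float],
                      course_record_s: float,
                      max_factor: float = 2.0) -> bool:
    """A missing time passes; otherwise it must fall between the course
    record and max_factor times the record (an assumed threshold)."""
    if finish_time_s is None:
        return True
    return course_record_s <= finish_time_s <= course_record_s * max_factor


def validate(result: dict, seen_keys: set, course_record_s: float) -> list:
    """Return a list of flags; an empty list means no issue was found."""
    flags = []
    team = canonical_team(result["team"])
    # Use the canonical name in the key so spelling variants still collide.
    key = (result["event"], result["race_type"], team or result["team"])
    if key in seen_keys:
        flags.append("duplicate")
    seen_keys.add(key)
    if team is None:
        flags.append("unknown_team")
    if not time_is_plausible(result.get("finish_time_s"), course_record_s):
        flags.append("improbable_time")
    return flags
```

In this sketch a time faster than the course record is flagged rather than rejected, which matches the idea of routing improbable results to manual review instead of discarding them.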