Coding corner

I ran in the Beat the Bridge 8k today, and I was getting frustrated waiting for the online results to get posted. So I fired up a little script.

#!/usr/local/bin/perl

use strict;
use warnings;

while (1) {
    eval {
        system ("wget 'http://onlineraceresults.com/race/view_race.php?race_id=14261' -O race");
        my $results = system ("grep 'There are currently no results posted for this race' race");
        if ( $results ) {
            system("/usr/bin/mail -s 'Race results' xxxxxxxxxx\@tmomail.net < /home/zachmu/results") and die "can't send mail";
            exit 0;
        }
    };
    sleep 60;
}

By the way, if you know my mobile phone number, you can use the email address template above to send me text messages via email. Please don’t script this.

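Anyway, the above didn’t work because of a subtle bug. See it? I didn’t, and so I didn’t get a text when the results were posted. As far as I can tell, wget -O file wasn’t overwriting the existing file, so after the first fetch it was a no-op. Grr. Always read your man pages! (The wget manual describes -O as truncating the output file like a shell redirect, so the eval, which silently swallows every die, is a suspect too.) Either way, a little call to rm inside the loop guarantees a fresh download on each pass.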
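Here is a minimal sketch of the loop with that fix applied, assuming the same wget approach. The sub names (results_posted, fetch_page, poll) are hypothetical and not from the original script. It goes a little beyond the rm: it checks wget's exit status and does the text match in Perl instead of shelling out to grep, so a failed or empty download can't be mistaken for posted results.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $NO_RESULTS = 'There are currently no results posted for this race';

# True once the saved page exists, is non-empty, and no longer carries the
# "no results" banner. A missing or empty file counts as "not posted".
sub results_posted {
    my ($file) = @_;
    return 0 unless -s $file;
    open my $fh, '<', $file or return 0;
    local $/;                      # slurp the whole page at once
    my $html = <$fh>;
    close $fh;
    return index($html, $NO_RESULTS) < 0 ? 1 : 0;
}

# Fetch the page into $file, deleting any stale copy first.
# Returns true only if wget exited successfully.
sub fetch_page {
    my ($url, $file) = @_;
    unlink $file;                  # the "little call to rm"
    return system('wget', '-q', '-O', $file, $url) == 0;
}

# Poll every $interval seconds until results appear, then call $notify once.
sub poll {
    my ($url, $file, $notify, $interval) = @_;
    while (1) {
        if (fetch_page($url, $file) && results_posted($file)) {
            $notify->();
            return;
        }
        sleep $interval;
    }
}

# Example call (commented out so that loading this file has no side effects):
# poll('http://onlineraceresults.com/race/view_race.php?race_id=14261',
#      'race', sub { print "Results are up!\n" }, 60);
```

The notify callback is where the mail command from the original script would go. Keeping it outside an eval means a failed send is reported instead of being swallowed.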

Next I wanted to see how I did compared to the rest of the pack, and wanted to view a histogram of times. The online results site doesn’t support such a thing, of course, so I ginned something up:

#!/usr/local/bin/perl

use warnings;
use strict;

use Data::Dumper;

my $seenDiv = 0;
my $runner;
my $cell = 0;
my $runners = [];
my $cellProcessors = [
                     simpleFieldExtractor('link'),
                     simpleFieldExtractor('first'),
                     simpleFieldExtractor('last'),
                     simpleFieldExtractor('division'),
                     simpleFieldExtractor('place'),
                     simpleFieldExtractor('divplace'),
                     simpleFieldExtractor('genderplace'),
                     timeExtractor('guntime'),
                     timeExtractor('time'),
                     timeExtractor('pace'),
                     ];

while (<>) {
    my $line = $_;
    chomp($line);

#    print "DEBUG $line\n";

    if (!$seenDiv && $line =~ m/DIVISION:/) {
        $seenDiv = 1;
    } elsif (!$seenDiv) {
        next;
    }

    if ($line =~ m/tr class/) {
        $runner = {};
        push @$runners, $runner;
    } elsif ($line =~ m/td class.*>(.*)<\/td>/) {
        $cellProcessors->[$cell]->($runner, $1);
        $cell = ($cell + 1) % scalar @$cellProcessors;
    } elsif ($line =~ m/block-footer/) {
        last;
    } elsif ($line =~ m/<\/tr>/) {
        $cell = 0;
    }
}

my @filteredRunners = grep { defined $_->{'time:sec'} } @$runners;
$runners = \@filteredRunners;

analyze();

sub analyze {
    my @sortedByTime = sort {
        $a->{'time:hrs'} <=> $b->{'time:hrs'}
        || $a->{'time:min'} <=> $b->{'time:min'}
        || $a->{'time:sec'} <=> $b->{'time:sec'}
        || 0;
        } @$runners;

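    my $bucket;            # start of the current bucket, in seconds
    my $bucketCnt = 0;
    my $STEP = 60;         # bucket width, in seconds
    foreach $runner ( @sortedByTime ) {
        next if (not defined $runner->{'time:sec'});

        my $totalSec = $runner->{'time:hrs'} * 3600
            + $runner->{'time:min'} * 60
            + $runner->{'time:sec'};
        if (defined $bucket && $totalSec < $bucket + $STEP) {
            $bucketCnt++;
        } else {
            # This runner opens a new bucket, so print the finished one.
            printBucket($bucket, $bucketCnt) if defined $bucket;
            $bucket = $totalSec - ($totalSec % $STEP);
            $bucketCnt = 1;
        }
    }
    # A bucket is only printed when the next one starts, so flush the last.
    printBucket($bucket, $bucketCnt) if defined $bucket;
}

# Print one histogram row: "h:mm:" plus one asterisk per 10 finishers.
sub printBucket {
    my ($bucket, $bucketCnt) = @_;
    use integer;
    my $hrs = $bucket / 3600;
    my $min = ($bucket % 3600) / 60;
    $min = "0$min" if ($min < 10);
    print "$hrs:$min:", "*" x ($bucketCnt / 10), " $bucketCnt\n";
}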

#print Dumper $runners;

sub simpleFieldExtractor {
    my $fieldName = shift;
    return sub {
        my ($runner, $field) = @_;
        $runner->{$fieldName} = $field;
    };
}

sub timeExtractor {
    my $fieldName = shift;
    return sub {
        my ($runner, $field) = @_;
        my ($hrs, $min, $sec) = split(/:/, $field);
        if (not defined $sec) {
            $sec = $min;
            $min = $hrs;
            $hrs = 0;
        }

        $runner->{"$fieldName:hrs"} = $hrs;
        $runner->{"$fieldName:min"} = $min;
        $runner->{"$fieldName:sec"} = $sec;
    };
}

When you feed this the race results page, it spits out the following histogram. Each asterisk represents 10 finishers with the time indicated, discarding seconds.

0:24: 2
0:25: 7
0:26:* 14
0:27:* 12
0:28:* 12
0:29:** 20
0:30:** 28
0:31:*** 38
0:32:**** 41
0:33:***** 51
0:34:******** 83
0:35:********* 90
0:36:************ 128
0:37:************* 137
0:38:****************** 180
0:39:********************* 216
0:40:********************* 210
0:41:********************** 228
0:42:************************ 243
0:43:***************************** 296
0:44:************************* 257
0:45:*********************** 238
0:46:****************** 187
0:47:********************** 221
0:48:************************ 241
0:49:******************** 209
0:50:******************* 192
0:51:*********************** 231
0:52:******************* 190
0:53:****************** 187
0:54:************ 129
0:55:*************** 151
0:56:************ 126
0:57:*********** 111
0:58:********* 98
0:59:******* 77
1:00:********* 97
1:01:****** 66
1:02:***** 55
1:03:***** 54
1:04:****** 66
1:05:*** 36
1:06:** 27
1:07:*** 32
1:08:** 27
1:09:* 17
1:10:* 18
1:11:** 26
1:12: 8
1:13: 7
1:14: 8
1:15: 8
1:16: 6
1:17: 9
1:18: 8
1:19: 7
1:20: 1
1:21: 3

So, my 37-minute time puts me well ahead of the modal hump there. That’s what I wanted to know!
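The bucketing behind that histogram boils down to "convert to seconds, round down to the minute, count". Here is a compact sketch of the same idea using a hash keyed by bucket start. The sub names and the sample times are made up for illustration, not taken from the race data.

```perl
use strict;
use warnings;

# Convert "mm:ss" or "h:mm:ss" into whole seconds.
sub to_seconds {
    my ($time) = @_;
    my @parts = split /:/, $time;
    unshift @parts, 0 while @parts < 3;     # pad missing hours (and minutes)
    my ($hrs, $min, $sec) = @parts;
    return $hrs * 3600 + $min * 60 + int($sec);
}

# Count finishers per bucket; each key is the bucket's start in seconds.
sub histogram {
    my ($step, @times) = @_;
    my %count;
    for my $t (@times) {
        my $secs = to_seconds($t);
        $count{ $secs - $secs % $step }++;
    }
    return \%count;
}

# Render one line per non-empty bucket, one asterisk per 10 finishers.
sub render {
    my ($count) = @_;
    my @lines;
    for my $bucket (sort { $a <=> $b } keys %$count) {
        my $n = $count->{$bucket};
        push @lines, sprintf '%d:%02d:%s %d',
            int($bucket / 3600), int($bucket % 3600 / 60),
            '*' x int($n / 10), $n;
    }
    return @lines;
}

# Made-up finish times, purely for illustration.
print "$_\n" for render(histogram(60, '36:59', '37:00', '37:12', '1:02:05'));
```

With those four sample times this prints 0:36: 1, 0:37: 2 and 1:02: 1, one per line. Because the counts live in a hash, the input doesn't need to be sorted first, and empty minutes simply don't appear.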

I should be able to use these same tools on other race results posted on that site, provided they don’t change their format significantly and break my screen scraping. Provide an API, you yokels! I hereby release the above software into the public domain, so if you’re of the running and coding persuasion feel free to use it!
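Since a format change would most likely show up as zero runners or blank times rather than an error, a small sanity check before the analysis can make the breakage loud. This is only a sketch: scrape_problems is a hypothetical helper, the field names follow the extractors in the script above, and the minimum runner count is an arbitrary threshold you would pick yourself.

```perl
use strict;
use warnings;

# Fields every scraped runner is expected to carry as plain numbers.
my @REQUIRED = ('time:hrs', 'time:min', 'time:sec');

# Return a list of human-readable problems; an empty list means the
# scrape looks sane. $minRunners is a caller-chosen floor.
sub scrape_problems {
    my ($runners, $minRunners) = @_;
    my @problems;
    push @problems, sprintf('only %d runners parsed, expected at least %d',
        scalar @$runners, $minRunners) if @$runners < $minRunners;
    for my $i (0 .. $#$runners) {
        for my $field (@REQUIRED) {
            my $value = $runners->[$i]{$field};
            push @problems, "runner $i: missing or non-numeric $field"
                unless defined $value && $value =~ /^\d+(?:\.\d+)?$/;
        }
    }
    return @problems;
}
```

It would run on the raw list, before the grep that drops runners without a time, for example: my @p = scrape_problems($runners, 100); die join("\n", @p) . "\n" if @p;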

Posted in Coding.


4 Responses


  1. Bryan says

    Look at that Gaussian distribution! So beautiful. My students rarely do what I say, but they never fail to Gaussian-distribute themselves. It always weirds me out, too — why do so many random phenomena cluster around the mean?

    Really cool script, by the way. I see you still haven’t switched to Python. :)

    • Zach Musgrave says

      What’s interesting to me is the bimodality of the distribution. I did some more analysis, and it seems like it cannot be explained by gender. The male and female distributions each have their own bimodality, with two peaks around 6 minutes apart. My current hypothesis is that the two superimposed distributions must be “serious” vs. “charity” runners.

      I’m working on a new version of this using pretty graphical charts, so I’ll know more soon!

  2. Matt Blancarte says

    I’m not competent in Perl, but that’s an impressive script you busted out there. Nice result for an 8k run, too.

    Stumbled across your blog last night, and it kept me reading until 3am. Love the blog, Zach!

  3. state abbreviations says

    good idea im gonna try it


