:: 06-Nov-2000 07:18 (Monday) ::
I just realized that I hadn’t posted an explanation for what exactly
happened to stats. Here’s the rundown:
The OGR statsrun aborted Thursday when the audit script detected some data
inconsistencies. This script makes sure that the total amount of work
reported for a given project matches in a bunch of different tables. (For
anyone interested in the gory details, see
http://cvs.distributed.net/cvsweb.cgi/stats-proc/daily/audit.sql).
The script was indicating a disparity between how much team work was in
the master table, the team members table, and the team ranking table. It
turns out that a number of records (3000 is the number I recall) had
vanished from the table that contains team information (the team name,
contact person, motto, password, etc.). This missing information meant that
a lot of work didn’t get credited to teams for that day.
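
For anyone who wants a feel for this kind of check without reading the SQL, here is a minimal sketch in Python. It is not the real audit.sql: the table names (master, team_members, team_rank), the work_units column, and the sample numbers are all made up for illustration, and it uses an in-memory SQLite database rather than Sybase. The idea is only that the team work total should come out the same no matter which table you add it up from.

```python
import sqlite3

# Hypothetical table and column names, for illustration only.
TOTAL_QUERIES = {
    "master": "SELECT COALESCE(SUM(work_units), 0) FROM master WHERE team_id > 0",
    "team_members": "SELECT COALESCE(SUM(work_units), 0) FROM team_members",
    "team_rank": "SELECT COALESCE(SUM(work_units), 0) FROM team_rank",
}


def team_work_totals(conn):
    """Return the total team work as recorded in each table."""
    return {name: conn.execute(sql).fetchone()[0]
            for name, sql in TOTAL_QUERIES.items()}


def audit_passes(totals):
    """The audit passes only if every table reports the same total."""
    return len(set(totals.values())) == 1


def build_demo_db():
    """Build a tiny in-memory database with made-up sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE master (participant_id INTEGER, team_id INTEGER, work_units INTEGER);
        CREATE TABLE team_members (team_id INTEGER, participant_id INTEGER, work_units INTEGER);
        CREATE TABLE team_rank (team_id INTEGER, work_units INTEGER);
        INSERT INTO master VALUES (1, 10, 500), (2, 10, 300), (3, 20, 200);
        INSERT INTO team_members VALUES (10, 1, 500), (10, 2, 300), (20, 3, 200);
        -- Team 20 has no row here, so its work was never credited.
        INSERT INTO team_rank VALUES (10, 800);
    """)
    return conn


if __name__ == "__main__":
    totals = team_work_totals(build_demo_db())
    print(totals)
    print("audit ok" if audit_passes(totals) else "audit FAILED, aborting run")
```

In this toy data the ranking table comes up short by one team's work, which is the same kind of disparity that stopped the statsrun.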
Once I found this out, I shut off http access. I didn’t know why these
records went *poof*, but I wasn’t going to rule out bad hardware, and if
it was bad hardware, there’s no telling what other damage could occur
while we were pushing the box.
I pulled in a copy of this table from a backup only to discover that the
backup was incomplete/incompatible. The backup had been done using a
Microsoft version of the BCP tool, talking to our Sybase database. The
BCP out to a file (the backup) seemed to work fine, but on this table, it
didn’t like working in the other direction. Before everyone assumes this
is Microsoft’s fault, keep in mind it could just as easily be an issue
with Sybase. Also, Nugget originally used a Microsoft BCP tool to get the
data into Sybase, and it worked fine back then.
Although the backup wasn’t what it should be, it did contain 99% of the
missing info. While working to find out which teams were still MIA, I
discovered that some of the team data in the master tables didn’t match
the table that tracks the history of what team everyone has been on. I had
to correct that before I could accurately determine which teams were still
missing.
Once that was done, I determined there were still 154 teams missing records.
Bringing in an older copy of that table brought the MIA number down to
124. We’re looking to see if there are any more copies of that table tucked
away someplace. In the meantime, I re-created records for all the teams
that were missing them.
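
The recovery steps above boil down to set arithmetic, roughly as in the sketch below. This is not the code that was actually run: the function names, the dict-per-table representation, and the placeholder fields are assumptions made for illustration. It takes the team IDs that the stats tables refer to, pulls whatever it can from each backup (newest first), and reports the IDs that still have no record, which then get a re-created stub.

```python
def restore_missing(referenced_ids, current, backups):
    """Fill in missing team records from backups, newest first.

    referenced_ids: team IDs that the stats tables refer to.
    current: dict of team_id -> record for the damaged table.
    backups: list of dicts (team_id -> record), newest backup first.
    Returns (restored, still_missing).
    """
    restored = {}
    missing = set(referenced_ids) - set(current)
    for backup in backups:
        for team_id in sorted(missing):
            if team_id in backup:
                restored[team_id] = backup[team_id]
        # Anything recovered so far no longer needs an older backup.
        missing.difference_update(restored)
    return restored, missing


def placeholder_record(team_id):
    """Re-create a bare record for a team with no surviving copy.

    The field names here are hypothetical.
    """
    return {"name": f"Team #{team_id}", "contact": "", "motto": ""}


if __name__ == "__main__":
    current = {1: {"name": "Alpha"}}
    recent = {2: {"name": "Beta"}}
    older = {2: {"name": "Beta (old)"}, 3: {"name": "Gamma"}}
    restored, missing = restore_missing({1, 2, 3, 4}, current, [recent, older])
    recreated = {team_id: placeholder_record(team_id) for team_id in missing}
    print(restored)
    print(recreated)
```

Trying the newest backup first means an older copy is only used for teams that the newer one could not supply.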
I should point out that no actual stats data is missing; the audit script
would have caught it if that were the case. This will only affect the
appearance of some teams’ summary pages.
I’ve been unable to determine exactly why tally dropped some data on the
floor. I’ll be working with the person who is colo-ing the box for us to
see if we can discover any hardware-related issues.
The database is currently being backed up (using Sybase’s BCP utility),
and as soon as that’s finished and I’ve verified that the backups are
good, I’ll start processing logs again. We’re 5 days behind right now, so
it will probably take about a day to catch up, assuming I leave http access
turned off. I hope to have everything back to normal by Wednesday.
Thanks (as always) for your patience.