staff blogs

distributed.net staff keep (relatively) up-to-date logs of their activities in .plan files. These were traditionally available via finger, but we've put them on the web for easier consumption.

1999-12-07

decibel [07-Dec-1999 @ 05:36]

Filed under: Uncategorized @ 05:36 +00:00

:: 07-Dec-1999 05:49 (Tuesday) ::

“Uhoh.”

– Number one thing you don’t want to hear a stats person say while
testing new statsrun code

For all you stats-junkies that noticed, stats was down for several
hours. I had changed the queries that insert new participants into
the main participants table in the interest of cleaner and faster
code.

Unfortunately, what should have been a relatively simple change didn’t
work. The net result was that some 13,000 emails were added to
STATS_participant even though they were already in the table. This is
a BadThing(tm).
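
As a rough illustration of the kind of guarded insert that avoids re-adding
emails already in the table, here is a minimal sketch. It is not the actual
statsrun query: it uses Python's sqlite3 module as a stand-in for Sybase, and
the id and email columns and both helper names are hypothetical. Only the
table name STATS_participant comes from the post.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory stand-in for the participants table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE STATS_participant ("
        " id INTEGER PRIMARY KEY,"
        " email TEXT NOT NULL)"
    )
    return conn


def insert_new_participants(conn, emails):
    """Insert only emails that are not already present; return the number added."""
    count_sql = "SELECT COUNT(*) FROM STATS_participant"
    before = conn.execute(count_sql).fetchone()[0]
    # Each row is inserted only if no existing row carries the same email.
    # executemany runs the statements one after another, so a duplicate
    # inside the same batch is also skipped.
    conn.executemany(
        "INSERT INTO STATS_participant (email) "
        "SELECT ? WHERE NOT EXISTS ("
        " SELECT 1 FROM STATS_participant WHERE email = ?)",
        [(email, email) for email in emails],
    )
    conn.commit()
    after = conn.execute(count_sql).fetchone()[0]
    return after - before
```

Running the same batch twice through insert_new_participants should add
nothing the second time, which is the property the broken query lacked.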

To make matters worse, while deleting the duplicated records, I ‘went a
bit too far’, and deleted new rc5-64 participants from yesterday (57
people all told). I was able to add these records back in, but any changes
made to these records during the day yesterday (such as joining a team,
setting up your friends, adding a motto, etc.) were lost.

Anyway, everything is pretty much back in one piece again. Sorry for all
the cowfusion!

On a different note, as Nugget mentioned, we found out what was slowing
statsbox to a crawl before. It was a change made right before the slowdown
that really shouldn’t have had the impact that it did. The only thing we
can figure is that the change was just enough to cause sybase to start
turning over its internal cache at a very rapid rate. Another BadThing(tm).

Bruce, Nugget, and I came up with a plan for fixing this on a more long-term
basis, which we should be implementing sometime this week. Once we do
implement that change, we should be able to make some other changes that
should speed up the statsrun.

RC5 stats are now running, and I’ve got apache disabled to speed them up as
much as possible. Stats should be back up in an hour or two.

Thanks as always for your patience.

1999-11-24

decibel [24-Nov-1999 @ 15:28]

Filed under: Uncategorized @ 15:28 +00:00

:: 24-Nov-1999 15:30 (Wednesday) ::

Here’s some more info on what we’re seeing on statsbox.

The problem that we’re running into is that as soon as we enable any
httpds, Sybase’s CPU load goes through the roof. With 5 httpds running,
we’re seeing CPU usage of 90% on a dual PII-300. Access to the database
is slow as hell, either from the web or locally.

The interesting thing is that this suddenly started after importing the
04 or 05 rc5 log yesterday. After doing some cleanup, the CSC run last
night was fine until the 07 CSC log, when things ground to a crawl. We
were seeing BCPs of ~50rows/sec, where they’re normally 1000s of
rows/sec.

So, the only theory I can think of is that the database has grown to a
size where we’re suddenly overflowing some cache or memory structure.
Unfortunately, sp_sysmon is of little value because it won’t run except
under the lightest of loads. With anything close to a normal load, it
just sits there, never returning from the call. When we do run it, we’re
seeing rather odd results, such as:

Task Context Switches Due To:    per sec   per xact     count  % of total
  Voluntary Yields                  10.0     1809.0      1809      47.3 %
  Cache Search Misses                1.0      176.0       176       4.6 %
  System Disk Writes                 0.0        0.0         0       0.0 %
  I/O Pacing                         0.0        0.0         0       0.0 %
  Logical Lock Contention            0.0        0.0         0       0.0 %
  Address Lock Contention            0.0        0.0         0       0.0 %
  Log Semaphore Contention           0.0        0.0         0       0.0 %
  Group Commit Sleeps                0.0        0.0         0       0.0 %
  Last Log Page Writes               0.0        0.0         0       0.0 %
  Modify Conflicts                 118.4    21323.0     21323     557.9 %
  I/O Device Contention              0.0        0.0         0       0.0 %
  Network Packet Received            1.5      265.0       265       6.9 %
  Network Packet Sent                2.4      439.0       439      11.5 %
  SYSINDEXES Lookup                  0.0        6.0         6       0.2 %
  Other Causes                    -112.1   -20196.0    -20196      -528 %

1999-11-23

decibel [23-Nov-1999 @ 06:09]

Filed under: Uncategorized @ 06:09 +00:00

:: 23-Nov-1999 06:13 (Tuesday) ::

Well, bad news…. statsbox is looking rather unhealthy. It started acting
a little flaky during the CSC run, then when it hit the -04 logfile during
the rc5 run, it ground to a halt. Nugget shut down apache, and all was good
again.

Then, we discovered that it’s going to end up importing 2 of today’s (Nov. 23)
logfiles. This is bad.

Neither of us is going to stay up to babysit the run (damn day-jobs!), so we’ll
have to deal with it tomorrow.

The box is also acting rather unhealthy again, so there’s probably something
funny going on… we’ll keep everyone informed as we know more.

Sorry for all the hassles.

decibel [23-Nov-1999 @ 01:12]

Filed under: Uncategorized @ 01:12 +00:00

:: 23-Nov-1999 01:15 (Tuesday) ::

Well, thanks to several sharp-eyed participants, we discovered
that the CSC logs were accidentally included in the RC5 stats-run
last night. This is obviously a BadThing(tm). So, we get to re-
process yesterday’s rc5 data in tonight’s run.

We also discovered another error that would have prevented a very
small number of people from searching for their stats using
psearch.php3. We’ve also fixed that.

What this all boils down to is that stats will be late tonight.

Sorry for the cowfusion.
Moo!
dB!

1999-11-21

decibel [21-Nov-1999 @ 01:41]

Filed under: Uncategorized @ 01:41 +00:00

:: 21-Nov-1999 01:50 (Sunday) ::

Man, talk about being in the hot-seat! :)

For those who noticed, sorry that stats were about an hour late
tonight. I made some extensive changes to the CSC daily routine,
and it took some time to work the final bugs out. The good news
is that the script is now capable of handling multiple projects.
This means that instead of having several different import scripts
for different projects like we do now, we’ll soon be able to run
a single set of scripts to update any project. This will make
maintenance much easier, since we only need to change one script.

Of course, Murphy had to have his fun, so it took me several tries
to get everything working right. All is fine once again, and the
rc5-64 update is churning happily away (though it’s still using
the old scripts).

Before we can completely switch over to the single set of scripts,
we’ll have to rename some tables, which will also affect the PHPs,
so it’ll be a little while before that happens. In the meantime
though, I intend to take a crack at speeding up some of the SQL
routines that happen during the update. I also want to get things
setup so that instead of importing all of the log files during the
stats-run, they’ll get imported during the day as the logs come in.
This should save ~4 minutes for CSC, and ~32 minutes for rc5-64.

1999-11-17

decibel [17-Nov-1999 @ 06:22]

Filed under: Uncategorized @ 06:22 +00:00

:: 17-Nov-1999 06:29 (Wednesday) ::

Ok, for all of you who’ve been asking when we’ll have CSC stats,
I’m working on it. I’d like to put stats for all the contests into
the same set of tables for simplicity’s sake (and sanity’s sake on our
end), but this is gonna take some thinking. It also doesn’t help
that Nugget’s busy putting his new house together. Yeah, we should
have looked into this sooner, but I’ve been busy dealing with new
DSL (I wouldn’t want to even think about making all these updates
over my 56k dialup!), and nugget’s been in ‘house mode’ for quite
some time as well.

So, please have patience… I’m hoping that these changes will take
less than a week *crosses fingers*.

Thanks for your cycles and patience!

1999-10-10

decibel [10-Oct-1999 @ 07:43]

Filed under: Uncategorized @ 07:43 +00:00

:: 10-Oct-1999 07:48 (Sunday) ::

Just a quick update. As you might notice, I just updated the ‘Project’
portion of this plan, so now you all know most of what I’m working on.

Thanks to dbaker’s work, rc5@lists.distributed.net is working once
again, so feel free to start sending messages again. It’s quite dead
right now (not that I’m complaining }:8P ).

Stats-box is definitely looking happy again. Stats-run is down to just
over 3 hours. We’re working on getting a few more drives in the box,
which will allow us to spread the database over several different
drives. This should make a substantial performance improvement, since
the box is very I/O bound. More info as the situation unfolds…

1999-10-02

decibel [02-Oct-1999 @ 05:37]

Filed under: Uncategorized @ 05:37 +00:00

:: 02-Oct-1999 05:45 (Saturday) ::

Two quickies (I sound like CmdrTaco (http://CmdrTaco.net) from Slashdot
(http://slashdot.org) now):

First, the stats are currently set to start the stats-run at 1:45GMT
instead of the normal 0:30GMT or so. I’m guessing that this is just a
mistake, but I don’t want to screw with it while Nugget’s out of town,
so it’ll stay this way until at least Sunday night.

Second, I changed the countries page so that it now uses height and width
attributes on all the little flag images. This means that the page will display
before all the images are loaded. It’s pretty neat to watch all the flags
pop onto your screen in a random order… of course, if you have a fast
link you won’t see the effect very well, but I guess that’s the trade-off
you’ll have to live with. :)
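
For the curious, here is a small sketch of what emitting the flag images with
explicit dimensions might look like. It is written in Python purely for
illustration; the function name, file path, and pixel sizes are made up and
are not taken from the real stats pages. The point is only that each img
element carries its width and height, so the browser can lay out the page
before the image data arrives.

```python
from html import escape


def flag_img(src, width, height, alt=""):
    """Build an <img> element with explicit width and height attributes.

    Knowing the dimensions up front lets the browser reserve the space
    and render the rest of the page without waiting for the image.
    """
    return '<img src="{}" width="{}" height="{}" alt="{}">'.format(
        escape(src, quote=True),
        int(width),
        int(height),
        escape(alt, quote=True),
    )


if __name__ == "__main__":
    # Hypothetical flag file and dimensions, for illustration only.
    print(flag_img("flags/us.gif", 20, 12, "US"))
```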

Unfortunately, I didn’t get the code changed before the stats-run was over,
so the main countries page will be lacking the new attributes until tomorrow
night. If you just can’t wait that long, you can see what I’m talking about
at http://stats-decibel.distributed.net/rc5-64/countries.html .

1999-09-29

decibel [29-Sep-1999 @ 03:26]

Filed under: Uncategorized @ 03:26 +00:00

:: 29-Sep-1999 04:00 (Wednesday) ::

Well, our luck just seems to be against us as far as stats-box goes. As Nugget
mentioned in his .plan, we’ve tried a few more things, and they don’t seem to
be working. And right now, I can’t get to the Sybase online manual, so I can’t
really try anything new right now. Fear not though, we still have a few more
tricks up our sleeves (my current theory is that Sybase is very unhappy that
we’ve nearly filled a couple of the database devices… note that these aren’t
directly related to physical devices).

But, amidst all this stats-anguish, there is good news! Since many of you may
not have seen it on the stats main page today, here’s the relevant bit:

33,195,551 blocks were completed yesterday (0.048% of the keyspace)
at a sustained rate of 103,134,987 KKeys/sec!

Yes, you read that correctly, and it’s not an error. We averaged 103Gk/s on
Monday.

To put this in perspective, this rate is 217 times faster than our first day’s
rate of 475 *M*k/s. Even if you look to one month after the start of rc5-64,
when our rate had stabilized at about 8Gk/s, yesterday’s rate is still 13
times faster.

And, for those who are dying to know, at 103Gk/s it would take us 4.9 years
to polish off the remaining keyspace.
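
The figures above can be checked with a little arithmetic. The sketch below
assumes a block is 2^28 keys, which is not stated in the post but reproduces
the quoted rate and keyspace percentage from the block count. The constant
and function names are illustrative only. The post does not say how much of
the keyspace was left, so the script only reports the time to cover the full
2^64 keyspace at this rate, and the remaining fraction that the 4.9-year
figure would imply (roughly 86%, an inference and not a number from the
post).

```python
KEYS_PER_BLOCK = 2 ** 28        # assumption: block size is not given in the post
KEYSPACE = 2 ** 64              # RC5-64 uses a 64-bit key
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY

BLOCKS_YESTERDAY = 33_195_551                # from the stats main page
RATE_KEYS_PER_SEC = 103_134_987 * 1_000      # 103,134,987 KKeys/sec
FIRST_DAY_RATE = 475e6                       # 475 Mk/s
ONE_MONTH_RATE = 8e9                         # about 8 Gk/s


def rate_from_blocks(blocks):
    """Sustained keys/sec if `blocks` were completed in one day."""
    return blocks * KEYS_PER_BLOCK / SECONDS_PER_DAY


def keyspace_percent(blocks):
    """Share of the full keyspace covered by `blocks`, in percent."""
    return 100 * blocks * KEYS_PER_BLOCK / KEYSPACE


def years_for_fraction(fraction, rate):
    """Years needed to search `fraction` of the keyspace at `rate` keys/sec."""
    return fraction * KEYSPACE / rate / SECONDS_PER_YEAR


if __name__ == "__main__":
    full = years_for_fraction(1.0, RATE_KEYS_PER_SEC)
    print("rate from block count (KKeys/sec):",
          round(rate_from_blocks(BLOCKS_YESTERDAY) / 1000))
    print("percent of keyspace:", round(keyspace_percent(BLOCKS_YESTERDAY), 3))
    print("vs first day:", round(RATE_KEYS_PER_SEC / FIRST_DAY_RATE))
    print("vs one month in:", round(RATE_KEYS_PER_SEC / ONE_MONTH_RATE))
    print("years for the full keyspace:", round(full, 2))
    print("remaining fraction implied by 4.9 years:", round(4.9 / full, 3))
```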

So, even though stats box isn’t too happy right now, our network is doing great!

1999-09-24

decibel [24-Sep-1999 @ 01:30]

Filed under: Uncategorized @ 01:30 +00:00

:: 24-Sep-1999 01:40 (Friday) ::

Well, there’s good news and there’s bad news. The good news is that
I think we found out what was causing the major slowdown on
stats-box (yesterday’s stats-run took about 20 hours to complete,
obviously not a good thing }:8( ).

Working off of a NuggetHunch, I’ve disabled the participant history
functions. It’s too soon to tell for sure, but this seems to have
brought the currently running update back up to normal speed. The bad
news, of course, is that you can’t check your history right now.

The other good news is that if this is the problem, we can probably
cure it by adding memory to the box (256 meg ECC DIMMS, for those
of you in a generous mood };8P ). The current theory is that the
history query is killing our caching, since it has to sift through
all ~20 million records in the main rc5 table.

We should be able to tell in a few hours if this actually solved
the problem or not. Thanks for your patience!
