staff blogs

distributed.net staff keep (relatively) up-to-date logs of their activities in .plan files. These were traditionally available via finger, but we've put them on the web for easier consumption.

2000-05-30

decibel [30-May-2000 @ 04:41]

:: 30-May-2000 04:45 (Tuesday) ::

Some probably noticed that stats were down for a bit. Real observant folks
might even have noticed that statsrun isn’t running right now. Basically,
the backup is interfering with the statsrun. Normally, this backup is run
Sunday night after statsrun, so it doesn’t interfere with anything, but
it’s still running right now (I don’t know if it started late or is just
running really, really slow).

I had web access shut off for a while to try and speed things along, but
I’m not sure it was making a difference so it’s back on again. Statsrun
won’t be happening until the backup is done though, I’m afraid. We’ll
get it going ASAP.

2000-05-27

dbaker [27-May-2000 @ 08:17]

:: 27-May-2000 08:25 (Saturday) ::

Moo.

It’s been a long while since I’ve made a planman update, so I wanted to
make a post and attempt to get back in the habit of communicating
effectively.

I have very little new to report. For the first half of 2000, my consulting
business and non-computer-work aspects of my life have occupied most of
my time. Accordingly, little time has been left for significant
distributed.net projects. Regardless, I’ve been maintaining existing
services and continuing to handle day-to-day issues.

Yesterday evening marked the completion of a long outstanding project and
one of the first steps in replacing the ancient nodezero server. I
completed setting up a second web server and changed “www.distributed.net”
to be a round-robin between two of our servers. There are still a couple
minor kinks that we’re working out, but it will be flawless in no time.

I plan to finish up the new nodezero next week and make significant progress
towards replacing the old one shortly. I also need to catch up on the
keymaster log archiving. My work is cut out nicely for me. _]:8)

I’ll keep busy. Expect more news shortly.

-dbaker

2000-05-26

decibel [26-May-2000 @ 02:34]

:: 26-May-2000 02:38 (Friday) ::

Well, there’s good news and bad news. }:8)

Statsbox once again seems to be stable running on one CPU. The bad thing
is that it’s pretty damn slow with only one CPU… statsrun is taking 6-8
hours.

The other bad news is that the CPU with the screwy heat sink is apparently
fried… the box won’t even POST with it in.

So, stats will probably be a bit slow for a while until we can take care
of the CPU situation. But they should be UP at least. }:8)

Thanks to everyone for their understanding.

Moo!

2000-05-25

decibel [25-May-2000 @ 06:06]

:: 25-May-2000 06:08 (Thursday) ::

Well, I don’t want to get anyone’s hopes up, but we **MAY** have found
the problem with statsbox. Seems one of the CPUs had a loose fan. We’ve
got that CPU out of the box right now, and I’m doing a statsrun. We’ll
see how well the box holds up. Don’t hold your breath though, there’s
still plenty of other things that could be wrong.

davehart [25-May-2000 @ 01:27]

:: 25-May-2000 01:31 (Thursday) ::

Wednesday 24 May between 18:30 and 23:30 UTC mail sent to
help@distributed.net was bounced back to the sender by mx.hartbrothers.com
reporting it could not be delivered to an @hartbrothers.com address used
internally by our help@ software, Mustang Message Center. If you received
a reply with a long tracking number in the subject, your mail was delivered
and will be read by help@ staff. If you received a nastygram from
mx.hartbrothers.com, please resend your mail.

My apologies for the inconvenience.

Dave Hart

2000-05-24

nugget [24-May-2000 @ 20:38]

:: 24-May-2000 20:43 (Wednesday) ::

Another quick update, and a reiteration…

We’re still not entirely sure which piece of hardware is causing the
problems on statsbox. Today we’re experimenting with the CPUs and SIMMs
to try to determine if one of them is faulty. The onboard SCSI adapter on
the ASUS P2B-DS is also a suspect at this point.

Also, just to clarify: the keyserver network is _completely_ unaffected
by statsbox being down. We’re still processing keys and we’re still able
to detect the winning key if a client finds it. When statsbox is brought
back online, we’ll have no trouble “catching up” with the logs that are
being generated by the keymaster right now.

More details as we know more. Again, sorry for the delays. I’m pretty
certain that this misadventure is going to drive us into buying a name-brand
server to house the data with a support contract. It’s just too awkward
trying to diagnose hardware problems when we’ve only got one person (a
very busy person) who is local to the machine.

2000-05-23

decibel [23-May-2000 @ 06:48]

:: 23-May-2000 06:51 (Tuesday) ::

Me and my big mouth. No sooner did I finish that last
.plan update than I noticed that my ssh sessions to the
box weren’t responding. We’re working on it right now, but I’m guessing
that the box is going to need a good kick in the backside, which won’t
happen until tomorrow.

*sigh*

More info as available….

decibel [23-May-2000 @ 06:40]

:: 23-May-2000 06:45 (Tuesday) ::

[01:39:30] .plan update, .plan update!!

Very well, since #distributed asked for it… }:8)

We’ve dropped the FSB speed on statsbox (no, it was *not* overclocked
before) and have also turned off swapping on the questionable drive. After
that, we got a backup and a half (the backup to Birmingham was cut short
due to a router failure, but we think we got most of it) of what’s on the
box right now. So, even if the box blows up, we should be ok for the most
part.

I just started a statsrun… the webserver is off right now but we’ll turn
it on in a bit. If we’re lucky, these two changes have solved the problem,
and we’ll just have to figure out which one did the trick (we’ve got a
little pool going… my money is on the swap partition }:8) ).

Anyway, with luck we’ll have stats back again in a few hours.

Sorry for the inconvenience.

nugget [23-May-2000 @ 03:03]

:: 23-May-2000 03:07 (Tuesday) ::

First off, thanks for all your patience with statsbox’s instability the
past few days. It looks like we’re losing the 9 GB drive that’s in the
machine. We’re getting SCSI bus errors, timeouts, and system panics. (The
panics are, we think, because that drive housed some swap space in addition
to database data.)

We’re in the process of doing a backup of all the data to a few off-site
machines and then we’re going to do some more detailed diagnostics to
determine the source of the problem. The symptoms are still a bit sketchy
at this point and we don’t have a solid theory. Data integrity first,
then we’ll worry about replacing the faulty hardware.

It’s unlikely that we’ll bring stats up tonight; most likely it’ll be
tomorrow at the earliest.

Look for further updates from myself, decibel, or peter as we home in on
the problem.

]:8)

decibel [23-May-2000 @ 01:55]

:: 23-May-2000 02:01 (Tuesday) ::

Statsbox update….

The box has definitely degraded… it’s been down more than it’s been up
today. At this point, our priority is just to get a solid backup before
we’re totally dead in the water.

Based on the few useful error messages from /var/log/messages, this seems
to be a virtual memory issue, and we can also tell that one of the drives
is failing, so a safe bet would be that the swap partition on that drive
is giving us fits. A donated RAID controller should be on its way, which
would solve this *IF* it was in fact the drive that was to blame, but
that’s a pretty big *IF*. There’s also been discussion of flat-out
purchasing a box from a major manufacturer, with on-site tech support,
etc. But it would take some time before such a box could be put into
production, so hopefully we can stabilize statsbox in the meantime.

More info as available…
