staff blogs

distributed.net staff keep (relatively) up-to-date logs of their activities in .plan files. These were traditionally available via finger, but we've put them on the web for easier consumption.

2006-12-04

bovine [04-Dec-2006 @ 23:49]

Filed under: stats @ 23:49 +00:00

:: 04-Dec-2006 23:49 GMT (Monday) ::

Our stats site is currently offline due to ongoing RAID issues on the
stats server, Fritz. The machine itself is back online right now, but
we have the webpages turned off until we finish making some more tweaks.

For the technically interested, the problem appears to be one of the
following:

1) Four of our WDC hard drives (SATA model WD2000JB) may be affected
by a timeout issue related to thermal calibration, or by a lack of
TLER (Time Limited Error Recovery).

Western Digital claims the problem only affects certain older ATA
drives (but ours are SATA). http://lnk.nu/wdc.custhelp.com/c6c.php
3Ware confirms the problem for the ATA version of our model number
(but not necessarily the SATA version). http://lnk.nu/3ware.com/c6d.aspx

There is a drive firmware update, but it is only available for ATA
drives. We opened support tickets with 3Ware and WDC more than a
week ago and are still waiting for responses.

2) Physical drive failure. We’ve already had all of the drives RMA’ed
at least once when we first started having these problems, so we
don’t believe there is a physical failure in the normal sense. The
drives report no errors after a reboot.

3) Motherboard compatibility with our RAID controller. We have a Tyan
S2882 motherboard, but 3Ware’s compatibility page for the
9550SX-8LP says only Tyan S2880 and S2885 are “officially”
supported. http://lnk.nu/3ware.com/c6e.pdf We don’t think this is
a very likely cause, though.

4) FreeBSD updates. We’re currently on FreeBSD 6.0 stable, but 6.1
stable has some additional 3Ware driver updates, so tonight we will
be upgrading to that. http://lnk.nu/freebsd.org/c6f.html

5) 3Ware RAID firmware updates. We updated to the latest firmware a
couple of weeks ago, before this most recent outage, so the firmware
alone is not a fix.

6) 3Ware RAID controller. Several months ago we tried replacing the
RAID controller with a slightly different 3Ware model to see if
that would affect things, but the problem persisted.

We’ve also just recently purchased a KVM-over-IP solution to allow us
to remotely manage the machine if it becomes inaccessible over the
network. Unfortunately, this most recent failure wedged the OS,
preventing even a keyboard-initiated reboot from working.

If we don’t get any further responses from WDC or 3Ware, our next
possible option is to go out and buy 4 new 200GB+ SATA drives from
another manufacturer and see if that improves things.

We might also try moving some of the drives (containing the OS and
swap) to the onboard RAID controller and see if that keeps the OS
from going down when the data volume goes down.

Thanks for your patience!

2006-12-02

bovine [02-Dec-2006 @ 21:04]

Filed under: keyservers @ 21:04 +00:00

:: 02-Dec-2006 21:04 GMT (Saturday) ::

Our fullserver in Australia, proxy1.bris.qld.au.proxy.distributed.net,
has changed IP addresses and the server that was running at the old
address will be shut down in a few days. If you have not hard-coded
IP addresses into your config files, then you should be fine and
unaffected by this address change.
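
For anyone scripting around the proxies, here is a minimal sketch of
looking the address up by hostname each time instead of pasting an IP
into a config. The helper name resolve_keyserver and the default port
2064 are assumptions made for illustration, not part of any
distributed.net client; substitute whatever your own setup uses.

```python
import socket


def resolve_keyserver(hostname, port=2064, resolver=socket.getaddrinfo):
    """Return the unique (ip, port) pairs that hostname currently resolves to.

    Hypothetical helper: the name and the default port are assumptions.
    The resolver argument exists only so the lookup can be stubbed out.
    """
    infos = resolver(hostname, port, type=socket.SOCK_STREAM)
    addresses = []
    for info in infos:
        # Each entry is (family, type, proto, canonname, sockaddr);
        # sockaddr begins with the IP address and the port.
        ip, resolved_port = info[4][0], info[4][1]
        if (ip, resolved_port) not in addresses:
            addresses.append((ip, resolved_port))
    return addresses


# Example call (performs a real DNS lookup, so it is left commented out):
# resolve_keyserver("proxy1.bris.qld.au.proxy.distributed.net")
```

Because the lookup happens at connect time, an address change like this
one is picked up as soon as DNS is updated, with no config edits.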

Also worth noting: earlier this week on Thursday, our keymaster server
was relocated to a new physical location. This planned move took only
a couple of hours and was completed successfully without impacting
operations, thanks to the fully buffered nature of our proxy network.
The only effect was a brief gap in our keyrate graphing during the
move, and a surge once the keymaster was restarted.
http://stats.distributed.net/keyrate.php?project_id=25

2006-11-26

bovine [26-Nov-2006 @ 21:18]

Filed under: clients @ 21:18 +00:00

:: 26-Nov-2006 21:18 GMT (Sunday) ::

Several new client versions have been moved from the pre-release page
to the official release page. This new release features a new
optimized OGR core that offers improved performance on AMD processors:

* Windows 32bit [x86/Zipped] v2.9013.498
* Windows 32bit [x86/Installer] v2.9013.498
* Linux [x86/ELF] v2.9013.498
* PC-DOS, MS-DOS [x86] v2.9013.498
* NetBSD [MIPSEL/ELF] v2.9013.498
* FreeBSD [4.x/x86/ELF] v2.9013.498
* FreeBSD [5.x/x86/ELF] v2.9013.498
* FreeBSD [6.x/x86/ELF] v2.9013.498
* Solaris/SunOS [x86] v2.9013.498

Download links for all supported platforms can be found at
http://www1.distributed.net/download/clients.php

2006-11-14

chrisj [14-Nov-2006 @ 11:39]

Filed under: stats @ 11:39 +00:00

:: 14-Nov-2006 11:39 GMT (Tuesday) ::

Stats are back up again.

Apologies for the extended down-time, folks.

2006-11-05

chrisj [05-Nov-2006 @ 17:07]

Filed under: stats @ 17:07 +00:00

:: 05-Nov-2006 17:07 GMT (Sunday) ::

As you’ve no doubt all noticed, stats has gone down. Again.

It looks like Fritz is having some more drive troubles. We’re working as
fast as we can to get the box back online and stable again.

As usual, all work is being logged, and will be credited when the site is back
online again.

Apologies for the extended down-time.

2006-10-30

chrisj [30-Oct-2006 @ 16:08]

Filed under: stats @ 16:08 +00:00

:: 30-Oct-2006 16:08 GMT (Monday) ::

At the risk of sounding like a broken record, stats are down again. More
information when we know more.

2006-09-30

chrisj [30-Sep-2006 @ 12:05]

Filed under: stats @ 12:05 +00:00

:: 30-Sep-2006 12:05 GMT (Saturday) ::

Fritz is back up again. We’re in the process of catching the database up now.

Thanks for your patience.

2006-09-28

chrisj [28-Sep-2006 @ 15:44]

Filed under: stats @ 15:44 +00:00

:: 28-Sep-2006 15:44 GMT (Thursday) ::

It looks like statsbox has suffered another drive failure. Unfortunately, it’s
looking like it will be a few days before anyone can go out and diagnose it.

We apologise for the downtime again. All work will be credited as soon as
statsbox is back online again.

2006-09-25

chrisj [25-Sep-2006 @ 11:52]

Filed under: stats @ 11:52 +00:00

:: 25-Sep-2006 11:52 GMT (Monday) ::

Statsbox appears to be down again. More information when we know more.

2006-09-06

nugget [06-Sep-2006 @ 15:52]

Filed under: stats @ 15:52 +00:00

:: 06-Sep-2006 15:52 GMT (Wednesday) ::

Statsbox recovery is proceeding as expected. We’ve got the hardware all sorted
out and now we’re just doing one last quick backup of the PostgreSQL database
prior to starting the rebuild of the raid10 array (currently degraded).

Thanks again for your patience while we replace yet another of these old SATA
drives.
