staff blogs

distributed.net staff keep (relatively) up-to-date logs of their activities in .plan files. These were traditionally available via finger, but we've put them on the web for easier consumption.

2006-01-13

decibel [13-Jan-2006 @ 11:58]

Filed under: stats @ 11:58 +00:00

:: 13-Jan-2006 11:58 GMT (Friday) ::

OK, Firefox is out-foxing me, I guess…

If I enter http://midasnetworks.com/ into the URL field, it pulls the website
up just fine. But clicking on that link is actually broken. Wee.

Anyway, the working URL is http://www.midasnetworks.com/

decibel [13-Jan-2006 @ 11:39]

Filed under: stats @ 11:39 +00:00

:: 13-Jan-2006 11:39 GMT (Friday) ::

Fritz has been successfully moved to its new home. Since we wanted to minimize
downtime, we used a fast transport protocol (http://lnk.nu/decibel.org/7m1.jpg).
Luckily, the MTU on this protocol was plenty large to allow transporting fritz
without the need to fragment it (http://lnk.nu/decibel.org/7m2.jpg). It’s
doubtful that other transport protocols could have handled this
(http://lnk.nu/slacker.com/7m3).

In any case, thanks again to http://midasnetworks.com for providing fritz with
a home in Austin!

2006-01-12

decibel [12-Jan-2006 @ 15:32]

Filed under: stats @ 15:32 +00:00

:: 12-Jan-2006 15:32 GMT (Thursday) ::

http://stats.distributed.net will be moving to http://midasnetworks.com in
approximately 2 hours. If everything goes well, downtime should only be 45-60
minutes.

bovine [12-Jan-2006 @ 04:37]

Filed under: stats @ 04:37 +00:00

:: 12-Jan-2006 04:37 GMT (Thursday) ::

There will be a planned stats outage starting around Jan 12 23:00 UTC,
as we relocate the server to a new hosting facility. We hope to bring
stats back online within a couple of hours. There will be a follow-up
plan announcement once service has been restored.

2005-12-29

nugget [29-Dec-2005 @ 17:03]

Filed under: stats @ 17:03 +00:00

:: 29-Dec-2005 17:03 GMT (Thursday) ::

You’ve probably noticed that statsbox is offline yet again. We don’t know
what’s happened to it. Unfortunately, the two people who have access to the
data center are both unavailable due to the holidays and we’re not sure when
we’ll be able to get in there to take a look.

In related news, we’re planning to relocate statsbox to a new (better) location
sometime in January.

2005-12-08

decibel [08-Dec-2005 @ 01:43]

Filed under: stats @ 01:43 +00:00

:: 08-Dec-2005 01:43 GMT (Thursday) ::

Stats are back online again. Sorry for all the recent downtime.

2005-12-07

decibel [07-Dec-2005 @ 15:07]

Filed under: stats @ 15:07 +00:00

:: 07-Dec-2005 15:07 GMT (Wednesday) ::

Fritz is looking happy again. I’m running a vacuum of the entire database to
make sure PostgreSQL is happy as well. Once that’s done I’ll turn stats back
on.

2005-12-06

decibel [06-Dec-2005 @ 19:05]

Filed under: stats @ 19:05 +00:00

:: 06-Dec-2005 19:05 GMT (Tuesday) ::

*sigh*

Got a background fsck failure on /usr which I wasn’t able to handle remotely.
My attempt ended up knocking the box off the net, so we’re now stuck until
someone can get to the console, which might well be tomorrow. Oops.

Sorry for the continued delay…

decibel [06-Dec-2005 @ 13:46]

Filed under: stats @ 13:46 +00:00

:: 06-Dec-2005 13:46 GMT (Tuesday) ::

Replacement drives are finally here. We’re working on getting a backup before
doing the RAID rebuild, which is why stats are down. They should hopefully be
back up in time for statsrun.

2005-11-30

decibel [30-Nov-2005 @ 00:21]

Filed under: stats @ 00:21 +00:00

:: 30-Nov-2005 00:21 GMT (Wednesday) ::

Well… when it rains…

Nov 30 05:39:02 fritz kernel: twa0: INFO: (0x04: 0x000b): Rebuild started: unit=1
Nov 30 05:48:01 fritz kernel: twa0: ERROR: (0x04: 0x0026): Drive ECC error reported: port=5, unit=1
Nov 30 05:48:01 fritz kernel: twa0: ERROR: (0x04: 0x002d): Source drive error occurred: unit=1, port=5
Nov 30 05:48:01 fritz kernel: twa0: ERROR: (0x04: 0x0004): Rebuild failed: unit=1
Nov 30 05:48:01 fritz kernel: twa0: ERROR: (0x04: 0x0002): Degraded unit: unit=1, port=3
Nov 30 05:51:47 fritz kernel: twa0: INFO: (0x04: 0x000b): Rebuild started: unit=1

In plain English… another drive has failed. I’ve heard it’s common for drives
from the same manufacturing run to all fail at the same time; I guess this is
proof.
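For anyone who wants to pick out this kind of event automatically, here is a
minimal sketch in Python that parses the kernel lines quoted above and lists
the ports named in ERROR messages. The regular expression and the names
parse_line and error_ports are illustrative assumptions based only on the six
lines in this post. This is not a tool we actually run on fritz, and other twa
messages may not fit the pattern.

```python
import re

# Assumed layout, inferred only from the six log lines quoted in this post:
#   <date> <host> kernel: <dev>: <LEVEL>: (<code>): <message>: <key=value, ...>
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) kernel: "
    r"(?P<dev>\w+): (?P<level>INFO|ERROR): \((?P<code>[^)]*)\): "
    r"(?P<message>[^:]+): (?P<details>.*)$"
)


def parse_line(line):
    """Return a dict describing one log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    event = match.groupdict()
    # Turn "port=5, unit=1" into {"port": 5, "unit": 1}.
    event["details"] = {
        key: int(value)
        for key, value in re.findall(r"(\w+)=(\d+)", event["details"])
    }
    return event


def error_ports(lines):
    """Return the sorted list of ports mentioned in ERROR lines."""
    ports = set()
    for line in lines:
        event = parse_line(line)
        if event and event["level"] == "ERROR" and "port" in event["details"]:
            ports.add(event["details"]["port"])
    return sorted(ports)


LOG = """\
Nov 30 05:39:02 fritz kernel: twa0: INFO: (0x04: 0x000b): Rebuild started: unit=1
Nov 30 05:48:01 fritz kernel: twa0: ERROR: (0x04: 0x0026): Drive ECC error reported: port=5, unit=1
Nov 30 05:48:01 fritz kernel: twa0: ERROR: (0x04: 0x002d): Source drive error occurred: unit=1, port=5
Nov 30 05:48:01 fritz kernel: twa0: ERROR: (0x04: 0x0004): Rebuild failed: unit=1
Nov 30 05:48:01 fritz kernel: twa0: ERROR: (0x04: 0x0002): Degraded unit: unit=1, port=3
Nov 30 05:51:47 fritz kernel: twa0: INFO: (0x04: 0x000b): Rebuild started: unit=1
"""

if __name__ == "__main__":
    print(error_ports(LOG.splitlines()))  # prints [3, 5]
```

The sketch only reports which ports show up in error lines; deciding which
drive to pull still takes a human reading the messages.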

I’m going to turn stats back on again, but I highly recommend you not make any
changes to team or participant information until this is all cleared up. It is
very possible that we will end up losing the entire array again, which right
now would mean reverting to a backup that could be days (or possibly even
weeks, depending on how long this takes) old.

We’ve already RMA’d two 200G drives. Once those come back it shouldn’t be much
of an issue for us to deal with drive failures, since we’ll have some spares
on-hand. I’m also going to set up replication of critical data so that even if
we do lose the database again, loss of user-modified data should be minimal.

Thanks for your patience.
