staff blogs

distributed.net staff keep (relatively) up-to-date logs of their activities in .plan files. These were traditionally available via finger, but we've put them on the web for easier consumption.

2006-09-06

decibel [06-Sep-2006 @ 12:39]

Filed under: stats @ 12:39 +00:00

:: 06-Sep-2006 12:39 GMT (Wednesday) ::

We’ll be doing some maintenance on fritz this afternoon, so stats will be
offline for a few hours. Sorry for any inconvenience.

2006-05-28

bovine [28-May-2006 @ 02:27]

Filed under: Uncategorized @ 02:27 +00:00

:: 28-May-2006 02:27 GMT (Sunday) ::

Mailing lists are operational again. We’ve been experiencing some
problems with our mailing list server over the past few weeks, but
it should be working properly now!

2006-04-18

bovine [18-Apr-2006 @ 06:05]

Filed under: clients @ 06:05 +00:00

:: 18-Apr-2006 06:05 GMT (Tuesday) ::

Several new v2.9011 and v2.9012 clients have been moved to the
official release page. This includes clients for these platforms:

– Solaris/SunOS [x86] v2.9012.497
– OpenBSD [AMD64/ELF] v2.9011.496
– OpenBSD [x86/ELF] v2.9011.496
– FreeBSD [AMD64/ELF] v2.9011.496
– FreeBSD [ELF/x86] v2.9012.497
– Linux [x86/ELF] v2.9012.497
– OS/2 [x86] v2.9012.497
– Mac OS X/Darwin [x86] v2.9012.497
– Mac OS X/Darwin [PPC/OS X] v2.9012.497
– PC-DOS, MS-DOS [x86] v2.9012.497
– Windows 32bit [x86/Zipped] v2.9012.497
– Windows 32bit [x86/Installer] v2.9012.497

Download links are on http://www1.distributed.net/download/clients.php,
and http://www.distributed.net/download/updates.php summarizes which
platforms have been updated.

2006-04-12

decibel [12-Apr-2006 @ 15:51]

Filed under: stats @ 15:51 +00:00

:: 12-Apr-2006 15:51 GMT (Wednesday) ::

As nugget mentioned yesterday, we think we’ve discovered why drives
keep dropping out of the array. Nugget tried to fix the problem, but it looks
like he was unsuccessful, since the array is back in degraded mode.

Rather than continue without stats while we try to fix this, we’re going to
turn them back on and switch to nightly backups for now. We could end up
losing some user changes if we lose another drive in the array, but
hopefully that won’t happen…

2006-04-11

nugget [11-Apr-2006 @ 16:36]

Filed under: stats @ 16:36 +00:00

:: 11-Apr-2006 16:36 GMT (Tuesday) ::

I made good progress on statsbox today, and I think we’ve finally found the
fundamental problem that keeps taking drive 8 offline. Each of the 9
drive bays in fritz’s case has a small hotswap backplane board, which
connects to the drive’s SATA and power connectors on the front and to
the case power supply and SATA cable on the back. It looks like tension
from the cable bundle for the last three bays has been pulling down on
those three cables and loosening the connection between each SATA cable
and its backplane board. The cable connections for all three bays are
really, really loose, and bay 7 even has broken plastic.

Here’s the guts of the machine, if you want to see what I mean:
http://slacker.com/photos/computers/inside

And here’s a closeup of the last three bay connectors (this is logically
“upside-down” from the first picture, looking at the back of the left-most
drive bays, under the optical drive):
http://slacker.com/photos/computers/fritzbackplane

Since we’re only using 8 of the 9 bays, I shuffled the drives around to
avoid the worst connector, and I also re-routed the cables so that they’d
be pulled up instead of from below, to best compensate for the looseness.
I’m talking with the vendor to see about replacing the dodgy backplane
boards. Since each bay has its own board, I’m optimistic that we’ll be
able to buy just three of them for cheap and hook them up.

I also hooked up our new 3Ware battery backup unit to the 9550SX RAID
controller. This thing had been on backorder for months, and they’re only
now hitting the marketplace. The battery has to run a 24-hour test, but
after that we’ll finally be able to turn on write caching, which should
really speed things up.

I still need to swap out the two failed drive fans in the front of the
case, too, but I no longer think they’re a factor in the crashing.

The RAID10 volume is rebuilding and so far no drives have dropped offline:

Unit  UnitType  Status      %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
---------------------------------------------------------------------------
u0    RAID-1    OK          -      -       279.387   OFF    OFF      OFF
u1    RAID-10   REBUILDING  68     64K     558.762   OFF    OFF      OFF

Port   Status    Unit  Size       Blocks
-------------------------------------------
p0     OK        u0    279.46 GB  586072368
p1     OK        u0    279.46 GB  586072368
p2     OK        u1    232.88 GB  488397168
p3     OK        u1    186.31 GB  390721968
p4     OK        u1    186.31 GB  390721968
p5     OK        u1    232.88 GB  488397168
p6     OK        u1    186.31 GB  390721968
p7     DEGRADED  u1    186.31 GB  390721968

Name  OnlineState  BBUReady  Status   Volt  Temp  Hours  LastCapTest
--------------------------------------------------------------------
bbu   On           No        Testing  OK    OK    0      xx-xxx-xxxx

Thanks again for your patience during the significant downtime we’ve
had recently. I’m hopeful that we’ve figured it out and will be able
to stabilize things very soon.

2006-03-28

decibel [28-Mar-2006 @ 02:33]

Filed under: stats @ 02:33 +00:00

:: 28-Mar-2006 02:33 GMT (Tuesday) ::

Another of the original drives in fritz has died. Fortunately there was no data
corruption like last time, but we’ve decided to keep stats offline until we can
get a replacement installed. I’m not sure exactly when that will happen,
since I’m currently eight time zones away from the machine, but I would expect
it to be this week.

2006-03-27

nugget [27-Mar-2006 @ 11:54]

Filed under: stats @ 11:54 +00:00

:: 27-Mar-2006 11:54 GMT (Monday) ::

I’ve got statsbox back up and online, and the RAID10 volume is
currently rebuilding. It looks like the drive tray fans on
drives 7 and 8 have stopped working, which may be the source
of the problem. All those SATA drives are crammed in close
together, and perhaps the drive weirdness we’ve seen lately is
the result of heat issues from the failing fans.

I’ve got the stats website shut off for now while the volume
rebuilds, so that Decibel has a chance to nose around and make
sure that all the data looks sane.

EDIT: crap, drive 7 just disconnected itself again.

2006-03-25

decibel [25-Mar-2006 @ 06:44]

Filed under: stats @ 06:44 +00:00

:: 25-Mar-2006 06:44 GMT (Saturday) ::

Stats are currently down, and I’m unable to ssh into the box. Since I’m in
Belgium right now, there’s not much I can do, but someone in the States should
be up and able to look at it in the next few hours.

2006-03-24

decibel [24-Mar-2006 @ 09:31]

Filed under: stats @ 09:31 +00:00

:: 24-Mar-2006 09:31 GMT (Friday) ::

I’ll be updating PostgreSQL on stats shortly; there will be a brief outage.

2006-03-09

bovine [09-Mar-2006 @ 23:47]

Filed under: keyservers @ 23:47 +00:00

:: 09-Mar-2006 23:47 GMT (Thursday) ::

We have had to take the proxy1.madsn.wi.us.proxy.distributed.net proxy
offline and out of the round-robin DNS for a while. If you have
hard-coded its name or IP address into your configuration, you will
want to update your INI files.
