distributed.net staff keep (relatively) up-to-date logs of their activities in .plan files. These were traditionally available via finger, but we've put them on the web for easier consumption.
Well, after nearly 12 days we’ve managed to get the old keymaster hardware back online. The problem ended up being some failing RAM, though the symptoms did not immediately point to that. Although we are currently still operating on the old hardware, we are planning to transition to newer hardware tomorrow. The newer hardware was shipped last week during the outage and is still in the process of being racked and powered up.
We do not expect tomorrow’s transition off the old hardware to be externally visible since our proxy network will have caught up and buffered enough work for the brief outage. Stats will have a multi-day gap during the time we were offline, but we expect that the statistics totals for 2015-07-22 will reflect a significantly higher-than-normal amount of work due to all of the backlogged data that had been buffered.
Additional measures are being taken to ensure that an outage of this duration does not happen again.
Thanks again for your patience!
We are experiencing another hardware issue with our central keymaster, and we do not yet have an estimate for it to be brought back online. However, our proxy network will continue to buffer completed work until the keymaster can be brought back online. Stats will also be delayed until that occurs. More information will be posted as it becomes available. Thanks for your patience!
We experienced some hardware issues with our keymaster server today, possibly due to a mechanical issue. It had a few hours of downtime on 2015-05-16, causing work to become backlogged and buffered on our keyserver network. As a result, the stats for that day will be significantly lower than usual, but tomorrow’s stats should include all of that backlogged work. Newly submitted work is continuing to be processed without delay at this point.
Although the keymaster is back online for the moment, we are continuing to do remote hardware diagnostics. It seems likely, however, that we will need to replace a system case fan. Thanks for your patience and support!
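To see why a short outage moves work from one day's stats to the next, here is a minimal sketch. It assumes, purely for illustration, that each work unit is credited to the date on which the keymaster receives it rather than the date it was completed. The function name and the sample numbers are hypothetical and are not taken from our stats code.

```python
from collections import Counter
from datetime import date

def daily_totals(units):
    """Sum work units per day, keyed by the date the keymaster received them.

    `units` is an iterable of (completed_on, received_on, count) tuples.
    Crediting by `received_on` is an assumption made for this illustration.
    """
    totals = Counter()
    for _completed_on, received_on, count in units:
        totals[received_on] += count
    return dict(totals)

# Hypothetical numbers: 100 units were finished on the 16th, but 60 of them
# were still buffered on the proxies and only reached the keymaster on the 17th.
outage_day = date(2015, 5, 16)
next_day = date(2015, 5, 17)
sample = [
    (outage_day, outage_day, 40),  # delivered on the day it was completed
    (outage_day, next_day, 60),    # buffered, delivered a day late
    (next_day, next_day, 100),     # normal work on the following day
]
totals = daily_totals(sample)
```

With these made-up figures the outage day shows 40 units and the following day shows 160, even though the same amount of work was completed on both days.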
We’re currently in the process of doing some backend database, webserver, and OS upgrades to our stats servers, so we’ve temporarily disabled access to them until we’re done. Once we’ve verified that everything appears to be working, we’ll bring the stats pages back online. No project data will be lost during this maintenance, and we’ll restart stats processing once it’s done. Thanks for your patience.
Update: This maintenance has now been completed and everything is back to normal.
two rackmount servers, one rackmount UPS, and two dozen donuts being transported to a new home
standard and deviation, before the move
As previously announced, we had to move our stats servers to a new hosting location due to changes at our old hosting provider. Logistically it was quickest and easiest for us to simply physically move the two servers to their new location in Houston, so we completed that move today. Zone file entries have been updated, so http://stats.distributed.net/ should now take you to the new location once any cached DNS entries have expired.
The downtime was a little longer than planned because of an unexpected routing failure that began at our old location before our planned start of the move. We also waited until stats had fully processed all of its backlog before officially publicizing its new IP address at its new location.
Part of the transport occurred by private plane, and although the outbound flight was tracked on FlightAware, the return flight was unfortunately not publicly visible due to an air traffic control limitation.
No dnetc work submitted during the downtime was lost, and our stats pages should now fully reflect the last day of work.
Our stats server is currently offline as we prepare it for relocation. We expect it to be back online tomorrow, 2014-05-11, at about 22:00 UTC, but it is possible that we may need to delay its restart time.
During the downtime we will be physically relocating our servers 160 miles, from Austin, TX to a new hosting location in Houston, TX offered by FlightAware. Part of the transport will be conducted by private plane, which might be visible on the FlightAware flight tracking website around 2014-05-11 at 17:00 UTC.
We’d like to thank Midas Green Tech in Austin, TX for the use of their hosting facilities over the years. Their business is transitioning away from physical hosting and towards virtual hosting, so our move was an unfortunate necessity. If you are in need of any cloud hosted services, we encourage you to check out their offerings and mention that you heard about them from distributed.net.
As always, all dnetc work transmitted during the downtime will be fully counted once our stats website is brought back online.
We’re very close to being able to declare OGR-27 finished, but we have just a few more stubs remaining before final completion. The previous stubspaces 27.1, 27.2, and 27.3 are now officially complete. We expect the last results in 27.4 to be received some time in the next two weeks, and then hope to begin sending out the first OGR-28 stubs within a few hours after that.
As a reminder, no new dnetc client binaries or configuration will be necessary since all existing OGR-27 (OGR-NG) clients are already capable of working on OGR-28 once we begin sending out those workunits.
Look forward to us making another blog post containing some final statistics about OGR-27 within the next two weeks!
(edit: a previous version of this post had some overly optimistic dates)
If you have tried accessing www.distributed.net recently, you might have noticed that it was offline for nearly a full week. We unfortunately suffered from multiple hard disk failures just a couple of days apart on the RAID10 array of the machine that serves as our primary webserver, DNS server, and mail server. Ultimately, the disk array was unrecoverable and it will need to be recreated with new hard drives that we’ve just purchased. Fortunately, we have pretty recent backups of everything important so we’ve begun bringing the website and blogs back online using alternate hosting.
There are still a few minor things offline (bugzilla, email fetch/flush, email support, speeds database, project graphs) but we hope to have those all back online soon. We’re also taking this opportunity to improve our server infrastructure to minimize problems in the future. Of course during the website outage, the keymaster, keyserver network, and stats server continued to function and process blocks without interruption, so we expect that most of our participants were not noticeably impacted. Keep on crunching!
As you might remember, one year ago we participated in World IPv6 day by permanently turning on IPv6 access for our main website (www.distributed.net). Today, we took the next step towards embracing IPv6 by adding AAAA records to our proxy keyserver DNS addresses and by releasing our first set of IPv6-capable dnetc clients on our pre-release download page, as version v2.9111.520. No additional client configuration changes should be needed for the client to utilize IPv6 once your OS has been properly configured.
At the moment, only one of our keyservers is configured to handle IPv6 connections and our pre-release page only has IPv6 clients for Solaris/SunOS, but we will be adding more over the next few days. Updated personal proxies that support IPv6 will also be made available in the coming weeks. We welcome bug reports on bugs.distributed.net if you encounter any connectivity problems with our new releases.
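For anyone curious whether a keyserver name hands out IPv6 addresses, the split between A (IPv4) and AAAA (IPv6) results can be sketched as follows. The helper name is hypothetical, and the sample addresses come from the reserved documentation ranges rather than from any real keyserver.

```python
import ipaddress

def split_by_family(addresses):
    """Group textual IP addresses into IPv4 (A) and IPv6 (AAAA) lists."""
    v4, v6 = [], []
    for text in addresses:
        ip = ipaddress.ip_address(text)  # raises ValueError on malformed input
        (v6 if ip.version == 6 else v4).append(str(ip))
    return v4, v6

# Placeholder addresses from the documentation-only ranges, not real servers.
v4, v6 = split_by_family(["192.0.2.10", "2001:db8::10", "198.51.100.7"])
has_ipv6 = bool(v6)
```

In practice the input list would come from a DNS lookup, for example the address field of each result returned by Python's `socket.getaddrinfo`.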
Although Earth Day was a couple of weeks ago, we’ve been diligently “recycling” RC5-72 blocks (i.e., re-distributing previously issued but unreturned work assignments) since last November. We realize that distributing these old blocks has been inconvenient for some of our users because the fragments are much smaller, causing more network traffic and slightly lower keyrates, and we have appreciated everyone’s patience and understanding. Fortunately, we are very close to resuming work on fresh RC5-72 subspaces and will be able to send out larger, unfragmented blocks within the next day or so!
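The idea behind recycling can be sketched in a few lines: keep track of which blocks were issued and which came back, then re-queue the ones that are old enough and still missing. This is a simplified illustration, not the keymaster's actual bookkeeping. The function name, the block identifiers, and the 90-day cutoff are all hypothetical.

```python
from datetime import date, timedelta

def blocks_to_recycle(issued, returned, today, max_age=timedelta(days=90)):
    """Return IDs of blocks issued more than `max_age` ago and never returned.

    `issued` maps block ID -> date issued; `returned` is a set of block IDs.
    The 90-day cutoff is an arbitrary value chosen for this example.
    """
    return sorted(
        block_id
        for block_id, issued_on in issued.items()
        if block_id not in returned and today - issued_on > max_age
    )

issued = {
    "block-a": date(2011, 1, 5),
    "block-b": date(2011, 1, 5),
    "block-c": date(2012, 4, 20),  # issued recently, still within the cutoff
}
returned = {"block-a"}
recycle = blocks_to_recycle(issued, returned, today=date(2012, 5, 6))
```

Here only `block-b` is selected: `block-a` was returned, and `block-c` has not been outstanding long enough to be reissued.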
Keep on crunching!