distributed.net staff keep (relatively) up-to-date logs of their activities in .plan files. These were traditionally available via finger, but we've put them on the web for easier consumption.
:: 17-Jan-1999 12:18 (Sunday) ::
Rewrote S-box 3 and slightly modified S-boxes 1, 2, 6 and 8 for
DES MMX.
This should give a 4% speedup.
:: 11-Jan-1999 13:58 (Monday) ::
DES MMX version 2 is a goer. The code now passes the
test keys.
Not as fast as I had hoped but a definite improvement.
90 clocks per key on a PII.
:: 14-Dec-1998 00:44 (Monday) ::
The new DES MMX bitslice driver code has been started.
I converted the existing GAS s-boxes to NASM macros and modified
them to the new interface.
:: 29-Nov-1998 22:52 (Sunday) ::
The RC5 keyrate plots at http://www.distributed.net/statistics/ and
http://www.distributed.net/statistics/rc5-64/ now have regression
lines plotted.
The regressions are for both linear and exponential growth using
daily block counts from 2 March 1998 to the last stats run and
exclude DES-II contests and various outliers.
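As a rough illustration of the two regression types mentioned above, here is a minimal Python sketch. The original plotting code is not shown in this log, so everything here is an assumption: the function names are hypothetical, the exponential fit is done as a least-squares line through ln(count) (one common way to do it, not necessarily the one used for the plots), and the sample counts are made up purely to exercise the code.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def exponential_fit(xs, ys):
    """Fit y = A * exp(r*x) by least squares on ln(y); returns (A, r)."""
    ln_a, r = linear_fit(xs, [math.log(y) for y in ys])
    return math.exp(ln_a), r


def without_days(days, counts, excluded):
    """Drop the samples whose day index is in `excluded` (e.g. outliers)."""
    kept = [(d, c) for d, c in zip(days, counts) if d not in excluded]
    return [d for d, _ in kept], [c for _, c in kept]


if __name__ == "__main__":
    # Made-up daily block counts for illustration only, NOT real statistics.
    days = list(range(8))
    counts = [100, 110, 121, 400, 146, 161, 177, 195]
    xs, ys = without_days(days, counts, excluded={3})  # day 3 is the outlier
    a, b = linear_fit(xs, ys)
    big_a, r = exponential_fit(xs, ys)
    print(f"linear:      blocks = {a:.1f} + {b:.2f} * day")
    print(f"exponential: blocks = {big_a:.1f} * exp({r:.4f} * day)")
```

Fitting on ln(count) weights relative rather than absolute errors, which is usually what you want for growth curves, but it does require every count to be positive.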
:: 27-Nov-1998 01:55 (Friday) ::
Did some investigation of integer bitslice last night and it looks
like it is no real win over BrydDES. Based on clock counts from
s-boxes 1 and 4 it would be only 5% faster, and then only if hand
assembly can do 10% better than an optimized compile under DJGPP.
For integer work it looks like it is best to concentrate on BrydDES
for now. First step is to figure out how it works.
On the MMX front, here is how I came up with a possible 50% improvement.
On a P5 MMX, the s-boxes average 45 clocks per box. There are 8 boxes
per round and an equivalent of 9.348 (of 16) rounds done for each
“slice” of 64 keys.
To those 45 clocks must be added some setup (basically a = e ^ k)
and cleanup which amount to 16 clocks if they can’t be paired. As
these are all loads and stores they are all U pipe and will not pair
unless the s-boxes are rewritten to accommodate it.
So we have a clock count per key of (45 + 16) * 8 * 9.348 / 64 = 71.28.
At 200MHz this gives 2806 kkeys/s. We currently get 1876 kkeys/s.
2806/1876 = 1.50. Q.E.D.
If the s-boxes can be re-written it might go as high as 60%, assuming
half of those 16 instructions pair.
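The arithmetic above can be checked with a few lines of Python. This is only a sketch: the constants are the figures quoted in this entry, the function names are made up, and the 12-clock overhead used for the "half paired" case is my reading of it (8 of the 16 instructions forming 4 pairs, saving 4 clocks), which happens to reproduce the 60% figure.

```python
# Figures quoted in the entry above (P5 MMX at 200 MHz).
SBOX_CLOCKS = 45          # average clocks per s-box
BOXES_PER_ROUND = 8
EFFECTIVE_ROUNDS = 9.348  # equivalent rounds (of 16) done per slice
KEYS_PER_SLICE = 64       # keys tested per slice
CLOCK_HZ = 200e6
CURRENT_KKEYS = 1876      # rate of the existing core, in kkeys/s


def clocks_per_key(overhead_clocks):
    """Clocks per key given the setup/cleanup overhead per s-box."""
    per_box = SBOX_CLOCKS + overhead_clocks
    return per_box * BOXES_PER_ROUND * EFFECTIVE_ROUNDS / KEYS_PER_SLICE


def kkeys_per_second(overhead_clocks):
    """Projected key rate in kkeys/s at CLOCK_HZ."""
    return CLOCK_HZ / clocks_per_key(overhead_clocks) / 1000


def speedup(overhead_clocks):
    """Projected rate relative to the current core."""
    return kkeys_per_second(overhead_clocks) / CURRENT_KKEYS


if __name__ == "__main__":
    # 16 = nothing pairs; 12 = ASSUMED overhead when half the instructions pair.
    for label, overhead in (("unpaired", 16), ("half paired", 12)):
        print(f"{label}: {clocks_per_key(overhead):.2f} clocks/key, "
              f"{kkeys_per_second(overhead):.0f} kkeys/s, "
              f"x{speedup(overhead):.2f}")
```

Note that if pairing instead halved the overhead to 8 clocks, the same formula would give a larger gain than 60%, so the 12-clock reading seems the more likely one.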
This all relates to the P5 MMX as the clock counts are deterministic.
On the out of order processors (PPro, PII, K6 and K6-2) the
improvement may vary.
:: 17-Oct-1998 12:01 (Saturday) ::
From 12 Oct 1998 to 8 Nov 1998:
At home looking after “Edward David Charles Ford”, his mother and
his brother. Edward was born at 15:03 on 11 Oct 1998 local time
(05:03 UTC) weighing 3.81 kg (8lb 6.5oz).
It is unlikely that much distributed.net work will get done over
this period.
:: 09-Oct-1998 20:42 (Friday) ::
Work in progress:
1. Adapting RC5 MMX core to K6-2 (possible 30% speedup)
2. Modifying RC5 MMX core to remove first and last rounds (2% speedup)
3. Integrating new Alpha RC5 core into the client (50% speedup)
4. Rewrite of DES bitslice driver code
5. Rewrite of DES MMX driver code (possible 50% speedup)
6. Rewrite of DES MMX S-boxes for the new driver interface
7. Automating “Time to completion” and creating web page
8. Placing regression lines (linear and exponential growth) on plots
9. Investigate FPU or MMX use for RC5 on K6
10. Investigate feasibility of x86 integer bitslice DES core
11. Make BrydDES core thread safe and investigate using more bits
:: 28-Sep-1998 07:14 (Monday) ::
Cores and their integration, x86 and Alpha
That's all I do, folks.