»home  »about  »amiga  »amstrad cpc  »vic-20
...as dug by cool cats  contact | home « amiga « distributed.net

Amiga distributed.net RC5-64 cores!

bovine @ warp nine! distributed.net's "project bovine" was the main attempt at completing an rsa laboratories secret key challenge by brute-force decrypting a message encoded with the RC5 encryption algorithm and a 64 bit key. With 2^64 keys to test, it was important that the testing cores for each processor architecture supported by the project software were as fast as possible, and it was with this goal in mind that I set about creating the most efficient Motorola 680x0 RC5-64 tester that i could...

distributed.net

Checking all 2^64 keys would be much too big a job for any one computer to attempt, no matter how fast or powerful. 2^64 is two multiplied by itself 64 times, which works out at 18,446,744,073,709,551,616 or over eighteen quintillion. For comparison, this is roughly the age of the universe - in minutes.

distributed.net's aim was to complete this mammoth task by combining the power of many tens of thousands of smaller computers, with each one working on a separate, more manageable chunk of the overall task at the same time. A small client program was installed on each computer taking part in the effort and the internet was then used to coordinate and distribute work and results. Together, these computers formed a distributed network of machines cooperating to achieve the same goal.

On 14th July 2002 the correct decryption key was finally found by a "relatively characterless PIII-450 in Tokyo". The key was 0x63DE7DC154F4D039 and decoded the secret message "The unknown message is: some things are better left unread". Click here to read the full press release of the discovery (3Kb) .

The RC5 Algorithm

The RC5 algorithm was designed by Ron Rivest and is ©RSA. It is a block cipher with variable key and word lengths and a variable number of encryption rounds. This particular competition uses a variant of RC5 with a 32 bit word length, 12 encryption rounds and an 8 byte (64 bit) key, denoted as "RC5-32/12/8".

describes the algorithm in full. However, the distributed.net client software does not require a full implementation of this standard but merely a function that will encrypt a single block of data (64 bits in this case) using a given key and compare the encrypted output with a known value. The return code of the core function then indicates if the key was the correct one or not.

Applying the definition and restrictions of the RC5-64 contest to the reference implementation given in the RFC, the core function may be described as shown in Table 1 below.

Table 1: Base RC5-64 core algorithm

u32 A, B;
u32 i, j, k;
u32 P = 0xb7e15163;
u32 Q = 0x9e3779b9;
u32 S[26];

// initialise the expanded key table S[]
S[0] = P;
for (i = 1; i < 26 ;i++) {
    S[i] = S[i-1] + Q;
}

// mix in the secret key
A = 0; B = 0;
i = 0; j = 0;
for (k = 0; k < 26*3; k++) {
    A = ROTL(S[i] + A + B, 3);
    S[i] = A;
    B = ROTL(L[j] + A + B, A + B);
    L[j] = B;
    i = (i + 1) % 26;
    j = (j + 1) % 2;
}

// perform the encryption
A = P[0] + S[0];
B = P[1] + S[1];
for (i = 1; i <= 12; i++) {
    A = A ^ B;
    A = ROTL(A, B) + S[2*i];
    B = B ^ A;
    B = ROTL(B, A) + S[(2*i)+1];
}

// check the result
return ((A == C[0] && B == C[1]) ? 1 : 0);
 

Notes:

  • u32 defines an unsigned 32 bit integer type
  • P[], L[] and C[] are arrays of u32's with two elements
  • P[] is the plaintext to encrypt
  • L[] is the encryption key to encrypt P[] with
  • C[] is the ciphertext which the correct key will encrypt P[] to
  • ROTL(v,s) performs a 32 bit left circular shift of v by s places


 

Optimisations

The reference RC5 core function described above will work, but its performance can be drastically improved by making a few simple observations and applying the optimisations these imply. Examining the reference algorithm, we can make the following changes to improve performance:

  • The key expansion loop can be expanded into 3 separate loops, removing the expensive modulo-26 operations
  • L[] can be held as two u32 variables L0 and L1, removing the modulo-2 operations
  • Observing that the expanded key table initialisation sets each S[n] = P+Q*n, the initialisation loop can be combined with the first key expansion loop
  • The final key expansion loop and the encryption loop can be combined

Applying these optimisations results in an algorithm as shown in Table 2 below. There are also further (smaller) algorithmic optimisations to be made as well as any machine or CPU specific gains that may be possible, but these are left as an exercise for the reader! Hints: (i) B is not required; (ii) the loops can be unrolled; (iii) what does it mean if eA does not match C[0]?; (iv) the last element of S[] won't be needed for the last stage most of the time.

Table 2: Partially optimised RC5-64 core algorithm

u32 A, B, eA, eB;
u32 i, j;
u32 P = 0xb7e15163;
u32 Q = 0x9e3779b9;
u32 S[26];

// combined S[] initialisation and first key expansion loops
A = 0; B = 0;
j = P;
for (i = 0; i < 26; ) {
    A = ROTL(j + A + B, 3);
    S[i++] = A;
    B = ROTL(L0 + A + B, A + B);
    L0 = B;

    j += Q;

    A = ROTL(j + A + B, 3);
    S[i++] = A;
    B = ROTL(L1 + A + B, A + B);
    L1 = B;

    j += Q;
}

// second key expansion loop
for (i = 0; i < 26; ) {
    A = ROTL(S[i] + A + B, 3);
    S[i++] = A;
    B = ROTL(L0 + A + B, A + B);
    L0 = B;

    A = ROTL(S[i] + A + B, 3);
    S[i++] = A;
    B = ROTL(L1 + A + B, A + B);
    L1 = B;
}

// combined 3rd key expansion and encryption loops
eA = P[0] + S[0];
eB = P[1] + S[1];

for (i = 2; i < 26; ) {
    A = ROTL(S[i++] + A + B, 3);
    B = ROTL(L0 + A + B, A + B);
    L0 = B;

    eA = eA ^ eB;
    eA = ROTL(eA, eB) + A;

    A = ROTL(S[i++] + A + B, 3);
    B = ROTL(L1 + A + B, A + B);
    L1 = B;

    eB = eB ^ eA;
    eB = ROTL(eB, eA) + A;
}

// check the result
return ((eA == C[0] && eB == C[1]) ? 1 : 0);


 

680x0 Cores

Taking the general algorithmic optimisations described in the previous section and combining them with CPU specific optimisations, I developed several RC5-64 core functions for the Motorola MC68000 series of processors. These CPUs are found in all "Classic" Amiga computers as well as several types of older Apple Macintosh and UNIX workstation.

Benchmark Results

Using the test harness I have measured the performance of the various 680x0 RC5-64 cores I have developed on a number of different configurations of Amiga computer. The results obtained are shown in Table 3 below.

Table 3: 680x0 core benchmark results

reference original new
keyrate efficiency keyrate efficiency keyrate efficiency gain
cpu 7.14Mhz 68000 241 34 560 78 579 81 +3.4%
7.14Mhz 68010 257 36 580 81 608 85 +4.8%
14Mhz 68020 NF 658 47 2,551 182 4,406 314 +72.7%
14Mhz 68020 1,209 86 4,107 293 5,056 361 +23.1%
50Mhz 68030 NF 796 16 3,134 63 9,869 197 +214.9%
50Mhz 68030 4,490 90 14,847 297 19,195 384 +29.3%
25Mhz 68040 4,773 191 21,476 859 22,664 906 +5.5%
28Mhz 68040 8,320 297 24,545 876 25,902 925 +5.5%
40Mhz 68040 8,320 208 34,362 859 36,263 907 +5.5%
50Mhz 68060 26,481 530 118,4632,369 151,4843,030 +27.9%
66Mhz 68060 35,326 530 157,9452,369 201,9703,030 +27.9%

Notes:

  • the "reference" core is an optimised C implementation of the RC5-64 core compiled (with maximum speed optimisations) using v6.59 of the SAS/C compiler package for the Amiga
  • the "original" core is the the RC5-64 core implementation found in v2.7100.413 of the the distributed.net Amiga client program
  • the "new" core is one of the 680x0 RC5-64 core implementations developed by myself
  • the "new" core tested is the one that is best optimised for the CPU being used
  • "keyrate" is the number of RC5-64 keys that can be checked each second by the given core on the given CPU
  • "efficiency" is a measure of how efficient the core/CPU combination is at checking RC5-64 keys in terms of keys checked per second per CPU Mhz
  • "gain" is the percentage difference in keyrates between the core and the "original" core. This is the important figure!
  • keyrates are calculated by measuring the length of time it takes to check 100,000 keys with the benchmark program getting a guaranteed 100% of CPU time for the duration of the test
  • the overhead imposed by the benchmark harness is negligible
  • a cpu marked "NF" is using No Fastram, i.e.: only the very slow "chip" memory of the Amiga is being used


 

Benchmark Discussion

The majority of the keyrate gain demonstrated by the new 68020 and 68030 cores comes from reducing the size and rearranging the structure of the core code to make more efficient use of the tiny 256 byte level 1 cache found on these processors. This is clearly demonstrated by the results for the 68020 and 68030 machines using only the very slow "chip" memory of the Amiga (i.e.: basic machines with no expansion memory added) as the keyrate gain with the new core is much larger for these machines than for the otherwise identical machines using faster memory.

The 68000, 68010, 68040 and 68060 machines use practically the same new core, yet the 68060 shows over 20% more improvement over the original core than the others. Why? Instructions in the core are arranged to make maxiumum use of the 68060's superscalar execution features (two largely independent integer execution pipelines), but the other chips do not have this ability so therefore do not benefit from the optimisations. The gain they do experience is due to other optimisations in the core.

At just over 3,000K/s/Mhz the 68060 was one of the most efficient CPUs at RC5-64 key checking; for comparison, an Intel Pentium manages less than half this efficiency! The 68060 compares favourably to the more modern PowerPC, Athlon and Pentium III architectures in terms of efficiency, but of course these processors run at much higher clock rates than the 68060 and so their actual attained keyrate was proportionally much greater.

These new core implementations were integrated into the Amiga version of the RC5-64 client program as of v2.7100.418 BETA-2. Real life core performance in the client was not far off the rates under ideal conditions given in the table above.

Fun Statistics

  • 15,268,315,356,922,380,288 keys tested
  • 1,726 days elapsed
  • 102,385,059,633 keys tested per second
  • equivalent to 864,278 50Mhz 68060 machines using the original core
  • ...or 675,880 using the new core
  • to check these keys one 50Mhz 68060 machine would take 4,103,009 years
  • ...or only 3,206,325 years using the new core
  • 2,044,528,782,668,201,984 (13.391%) keys tested by all Amiga teams
  • 2,044,429,986,072,363,008 (13.390%) keys tested by the main Amiga team
  • 11,574,649,368,346,624 (0.076%) keys tested by AmigaOS machines
  • 3,738,792,385,052,672 (0.024%) keys tested by 680x0 processors
  • 811,382,672,982,016 (0.005%) keys tested by my machines (0.040% of team total)

Related Links

Related Downloads

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License. My RC5-64 source code is free to download, modify and redistribute for non commercial purposes.
License: Creative Commons Attribution-Noncommercial-Share Alike 3.0

this is girv

... compu-fiddler from northern ireland, with a liking for sci-fi, cycling and old computers ...
 
» would you like to know more?

find me elsewhere

johngirvin.com
rain miles count double
wee doors banging
girv.in
 
» would you like to see more?

girv's links

Vi IMproved abime.net - amiga addicts sanctuary
amiga.org - the site for Amiga support and news Get Thunderbird - Reclaim your inbox!
This site uses the PNG Image Format Lemon Amiga - Games, Community & Nostalgia
» would you like to add more?
 
Valididate HTML  Validate CSS © copyright 1996-2010 john girvin, all rights reserved