Masters Project - cryptanalysis on PS3

Investigation into how Linux on the PS3 might lead to homebrew development.

Moderators: cheriff, emoon

Post Reply
kroe
Posts: 6
Joined: Mon Apr 02, 2007 12:22 pm

Masters Project - cryptanalysis on PS3

Post by kroe »

I am looking for a topic for my CS masters project. Given the very impressive folding@home performance of the PS3, I am curious to see how well it would do at attacking cryptographic algorithms. It looks like a huge leap in performance vs typical desktop computers which have been the standard for evaluating the cost/feasability of cryptanalysis.

I don't have a PS3, and since I don't play games I am not planning to buy one (would love to get one just to play with and run linux on, but that's not in the cards right now for financial reasons).

Does anyone have a PS3 running linux they would be willing to give me a shell account on?

Thanks,
-Ken
rapso
Posts: 140
Joined: Mon Mar 28, 2005 6:35 am

Post by rapso »

the cell is powerfull in float computing, numbercrunching for cryptographic algorithms is integer math, I guess the PS3 won't be as impressive in this like in folding@home.
ldesnogu
Posts: 94
Joined: Sat Apr 17, 2004 10:37 pm

Post by ldesnogu »

Indeed the SPU is quite limited wrt integer math: you only get 16x16 -> 32 bit multiply, which is certainly killer for many cryptographic algorithms...

You should read available documentation about SPU instruction set before considering the PS3 :)
jimparis
Posts: 1145
Joined: Fri Jun 10, 2005 4:21 am
Location: Boston

Post by jimparis »

This paper talks about the performance of some CBE crypto stuff.
ralferoo
Posts: 122
Joined: Sat Mar 03, 2007 9:14 am
Contact:

Post by ralferoo »

rapso wrote:the cell is powerfull in float computing, numbercrunching for cryptographic algorithms is integer math, I guess the PS3 won't be as impressive in this like in folding@home.
I'd disagree to an extent. Floating point operations aren't actually as quick as integer anyway (6 cycles compared to 4 for floats), also doing any trig will require a library and so will be quite slow. Double-precision floats are a complete joke. You're limited to the very basics (add, subtract, multiply, divide), there isn't even a compare (you have to subtract, and check the MSB of the result) and every double-precision operation causes lots of pipeline stalls regardless of whether another instruction is waiting for the results or not.

The integer multiplication stuff is limiting (only 16x16->32 is supported and taking 7 cycles) although I've been doing the maths for fixed point 7.48 mutliplications and reckon I can get that down to about 40 cycles (although I haven't had time to actually implement it yet!) but the main advantage of integer is that it's very fast! With a well designed algorithm, you'll be fine.

Bear in mind that to some degree, the cycles taken by an instruction don't need to be limiting as you can just unroll your loop some more and calculate more results at once. You really should be reading Appendix B on the Cell BE Handbook for all this kind of information rather than listening to me though!

However, abck to cryptanalysis, you almost certainly won't need floats and doubtful that you'll need to be multiplying more than 16 bit integers anyway (probably the hardest is a multiplcation over GF(255) ). I'd suggest looking at e.g. parallel the DES crackers where they re-write the algorithm in terms of 1-bit registers and then everything can be written in terms of basic bitwise operations. They then use the register width to provide vectorised computation of 32 keys at a time. Obviously, this approach is limited on a 386 machine because there are so few registers that memory transfers happen a lot and because the registers are only 32 bit. On a cell with 127 128-bit registers, this approach could really fly. I'd almost be tempted to benchmark it myself, but I've lost interest in DES now anyway. You'll find some of SBE instructions are ideal for things like AES (shufb can be used to perform the shuffle stage in one instruction for instance) and the libssl guys have already ported AES to cell I beleive.

The only major thing I'd suggest is that you really do need to get your own PS3. Do you really want to risk something critical like your masters on whether you have access to someone else's box when they decide to play a game or switch the machine off? Even for someone who predominantly uses Linux on their PS3, you're not going to find many of them who won't mind the SPUs being hogged full time... After all, if their not actuall using the machine themselves why would they want it kicking out all that heat and using up all that electricity?
ldesnogu
Posts: 94
Joined: Sat Apr 17, 2004 10:37 pm

Post by ldesnogu »

ralferoo wrote:However, abck to cryptanalysis, you almost certainly won't need floats and doubtful that you'll need to be multiplying more than 16 bit integers anyway (probably the hardest is a multiplcation over GF(255) )
Number theoritic based cryptographic algorithms, such as RSA, need fast integer multiplications.

I find the numbers in the IBM "tech report" quoted by jimp strange, if not doubtful: RSA is heavily based on modular exponentiations of big numbers, which more or less is limited by integer multiplication.

Given that one SPU can only do 4 x 16x16->32 per cycle, while an Opteron can issue a 64x64->128 every other cycle (according to http://swox.com/doc/x86-timing.pdf), I would expect an Opteron to be more than twice faster at the same clock speed...

I need to check that doing the work myself :)
rapso
Posts: 140
Joined: Mon Mar 28, 2005 6:35 am

Post by rapso »

ralferoo wrote:
rapso wrote:the cell is powerfull in float computing, numbercrunching for cryptographic algorithms is integer math, I guess the PS3 won't be as impressive in this like in folding@home.
I'd disagree to an extent. Floating point operations aren't actually as quick as integer anyway (6 cycles compared to 4 for floats), also doing any trig will require a library and so will be quite slow. Double-precision floats are a complete joke.
we're talking about single-floats when we talk about the Cell power, in the other case the cell wont be faster than a core duo.
You're limited to the very basics (add, subtract, multiply, divide), there isn't even a compare (you have to subtract, and check the MSB of the result) and every double-precision operation causes lots of pipeline stalls regardless of whether another instruction is waiting for the results or not.
so we are glad that everybody is already aware of that and does use simple floats, right?
However, abck to cryptanalysis, you almost certainly won't need floats and doubtful that you'll need to be multiplying more than 16 bit integers anyway (probably the hardest is a multiplcation over GF(255) ). I'd suggest looking at e.g. parallel the DES crackers where they re-write the algorithm in terms of 1-bit registers and then everything can be written in terms of basic bitwise operations. They then use the register width to provide vectorised computation of 32 keys at a time. Obviously, this approach is limited on a 386 machine because there are so few registers that memory transfers happen a lot and because the registers are only 32 bit. On a cell with 127 128-bit registers,
you seem to have no clue about modern CPU architecture beside cell's. Modern x64 CPUs have twice the register count, you can use the general purpose registers, MMX-registers and SSE-registers, additionally those CPUs work out of order, to acomplish this, they've virtual registers and register renaming e.g. a P4 has 208registers at all.
although 8SPUs (or 6SPUs that u can get on ps3Linux) have a nice power, I doubht they could beat a modern quadcore cpu in integer performance.
ralferoo
Posts: 122
Joined: Sat Mar 03, 2007 9:14 am
Contact:

Post by ralferoo »

How about you try and be constructive?

You never once mentioned single-precision and claimed that cell is much faster at floating point than integer. This is basically wrong. There is _one_ case that FP is faster than integer, multiplication, and that's not usually needed for cryptoanalysis.

Who the hell cares about core duo performance? The OP certainly never mentioned it. Oh, and for your information, it appears to be you that's got no clue about modern processors - a x64 machine has 15 64-bit registers. Counting registers that you cannot directly access to work around problems in a CISC architecture do not count in my book as you cannot directly access them; as I said originally, you need to store anything else in memory. BTW, I explicitly mentioned 386 processors when talking about 32-bit.

I suggest you try looking in newsgroups for something like alt.advocacy.xbox360 as you seem intent in comparing everything to core duo processors instead of actually offering the OP some useful advice.
rapso
Posts: 140
Joined: Mon Mar 28, 2005 6:35 am

Post by rapso »

ralferoo wrote:How about you try and be constructive?
I just told my opinion, that the cell wont be as powerfull on integermath as on floatmath like folding@home, what's so damn wrong that ?
You never once mentioned single-precision and claimed that cell is much faster at floating point than integer.
I never said double either.
btw. everyone knows what float/double as datatypes mean, please get used to that.
Who the hell cares about core duo performance?
maybe the person who writes his masterproject and could compare the cell to other architectures before he starts working on it.
The OP certainly never mentioned it. Oh, and for your information, it appears to be you that's got no clue about modern processors - a x64 machine has 15 64-bit registers.
that's what I told you, twice the count of the 386 which had 8, additionaly 16SSE and 8MMX registers.
Counting registers that you cannot directly access to work around problems in a CISC architecture do not count in my book as you cannot directly access them;
another proof of your small knowledge about modern cpus. It's used in PowerPC, Alpha, Itanium etc. not a CISC issue. (that's one reason why the Cell-PPU has less generalpurpose performance than a PowerPC 1.25GHz )
I suggest you try looking in newsgroups for something like alt.advocacy.xbox360 as you seem intent in comparing everything to core duo processors instead of actually offering the OP some useful advice.
open your mind, Cell is not everything...
ldesnogu
Posts: 94
Joined: Sat Apr 17, 2004 10:37 pm

Post by ldesnogu »

Hey what about stopping giving numbers and try to code, that would be more fruitful. We could for instance try to beat IBM numbers :)
rapso
Posts: 140
Joined: Mon Mar 28, 2005 6:35 am

Post by rapso »

ldesnogu wrote:Hey what about stopping giving numbers and try to code, that would be more fruitful. We could for instance try to beat IBM numbers :)
yeah, we should beat the s**t out of IBM :)


btw. @kroe
IBM offers free shell accounts at http://www-128.ibm.com/developerworks/linux/openpower/
unfortunatelly just for PowerPC, former they had AMD (and I think Sparcs as well), but it's a long time since I tried it, anyway, you could try them to check out the PowerPC performance if nobody gives you a ps3-shell-account, they are the most cell-alike CPUs you can get instead.
ralferoo
Posts: 122
Joined: Sat Mar 03, 2007 9:14 am
Contact:

Post by ralferoo »

blah ... open your mind, Cell is not everything...
When have I ever said it was? The OP asked a simple question about programming under Cell and you replied with core duo this and core duo that.

Over the years, I've programmed in assembler and C on many different architectures including PICs, Z80, 6502, 680x0, 32016, MIPS, 8088, 80186, 80386, x64 and now Cell. I really don't have an axe to grind. I have my favourites, sure, but I develop on whatever I get paid to develop on at the time.

That said, I got a PS3 with my own money specifically to look at the Cell processor because it sounded cool and something I'd like to know more about. The more I look at it, the more I realise what a clean design it is and how well implemented and open it is. The design philosophy seems to have been "how can we implement X really fast" where X is one of MPEG decoding, ciphers, vector transformations, etc,.. IBM have then taken a massive step back from the problem and thought how best to make it generic so that Y and Z can all be implemented efficiently as well. Good examples are the shufb instruction, addx, sumb, etc. all of which are designed around making a very specific task very fast but are also generic enough to be useful for all sorts of purposes.

I'm not bashing Intel chips here nor have I even said anything negative about them (although I could easily jibe that the only good features in modern Intel processors are the ones copied from AMD); I'm merely pointing out features on the processor the original OP asked about that would help in the area the original OP asked about. It's you that seems to have the "this processor is better than that" fanboy mentality.
rapso
Posts: 140
Joined: Mon Mar 28, 2005 6:35 am

Post by rapso »

ralferoo wrote:
blah ... open your mind, Cell is not everything...
When have I ever said it was?
I compared cell and core duo and u said:
comparing everything to core duo processors
The OP asked a simple question about programming under Cell and you replied with core duo this and core duo that.
I just said it's not as powerfull with integer as with float. then you started to with ur 386-20years-ago-blah, so I told you how the present technology is. then you started your CISC-I-dont-care-about-blah.

rapso wrote:I guess the PS3 won't be as impressive in this like in folding@home.
ralferoo wrote:I'd disagree...
ralferoo wrote:fanboy mentality.
laichung
Posts: 123
Joined: Fri May 06, 2005 2:02 pm

Post by laichung »

Folding@ team already explained clearly the pros and cons of CELL with Folding@Home.

In my opinion, you can only take advantage if your code is best fit with the SPU and vector programming. Otherwise CELL is only a G5 PowerPC processor. So I think directly compare CELL with x86 based CPU is meaningless.
kroe
Posts: 6
Joined: Mon Apr 02, 2007 12:22 pm

Post by kroe »

Wow... I was hoping to get a reply or two... you guys are great, and obviously have an understanding of CPU architecture and microcode that is way beyond mine.

This is a masters project not a PH D thesis, breaking major new ground is not a requirement. I just thought that given the success of the Cell at folding @ home and IBM's published cryptographic performance (showing 1-2x the performance of a typical x86 cpu per SPU) it would be interesting to see how it would do at cryptanalysis compared to x86 and x64 systems.

I want to focus on running some comparison tests and cost/performance comparisons vs commodity PC hardware (which is generally what is used to estimate the cost/feasability of cryptanalysis). While the Cell isn't going to be a true disruptive technology that blows cryptographic algorithms out of the water, I think it does have the potential to significantly alter the cost/feasability of breaking algorithms that were already on the edge.

Since I am focusing mostly on the comparison I was thinking that rather than try to write my own super-optimized code with no Cell experience I would be better off using the crypto libraries provided by IBM. I would implement some sort of attack (not even sure what yet... depends on choice of algorithm to attack) using their libraries wherever possible. I'd love to write my own from scratch and have it perform better than IBMs, but think that I would have little chance of doing so without a huge time investment which would distract me from the overall goal of comparing the PS3/Cell to commodity PC hardware for cryptanalysis.

Have any of you had a chance to explore the crypto libraries that appear to be bundled with the IBM Cell SDK?
Post Reply