SHA1 Attack Program

Discuss the development of new homebrew software, tools and libraries.


adresd
Posts: 43
Joined: Sat Jan 17, 2004 11:32 am

SHA1 Attack Program

Post by adresd »

Have been playing around with the SHA1 attack program and optimisation of the searching.

Currently using a hashlist of 2354 and dic size of 2322 (as an example).

Main changes are:
Removal of duplicates from dictionary on load.
Searchtable for the hash checking, cutting down search-space time (see the sketch below).
Moving SHA1 calc to minimise time spent doing it.
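For illustration, the searchtable idea is essentially: sort the NID list once at load time, then check each computed hash with a binary search instead of a linear scan. A minimal sketch with hypothetical names (not the actual code from attack.c):

Code: Select all

#include <stdint.h>
#include <stdlib.h>

/* Comparison function for 32-bit NIDs. */
static int cmp_nid(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a;
    uint32_t y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Sort the NID list once at load time... */
void sort_nids(uint32_t *nids, size_t count)
{
    qsort(nids, count, sizeof(uint32_t), cmp_nid);
}

/* ...then each candidate hash costs only a binary search. */
int nid_in_list(const uint32_t *nids, size_t count, uint32_t candidate)
{
    return bsearch(&candidate, nids, count, sizeof(uint32_t), cmp_nid) != NULL;
}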

Thanks to gorim for helping with performance checking of the code.

Here is the result: attack.c [Updated]
------------------------------------------------------

Code: Select all

Edited - see below for newer version of code
------------------------------------------------------
This code also outputs all found function names into an XML file, which makes collecting them and checking for collisions later easy.


currently testing using following script:
------------------------------------------------------
#!/bin/bash
echo ------BUILDING--------
gcc attack.c -O3 -o attack
echo ------PROCESSING------
./attack sce hashlist.hsh dictionary.dic
------------------------------------------------------
which searches using the prefix 'sce', with words like 'kernel' etc. in the dic file.

One final note: add '0x00000000' and '0xffffffff' entries to your hash file. This fixes a segfault some people reported.

tested on cygwin here.

Hope some of you can have some fun with this :))

Any comments/suggestions/improvements welcome.

Next step for me is looking at optimising the dictionaries for each step.
Last edited by adresd on Tue Jun 14, 2005 12:44 am, edited 4 times in total.
Guest

Post by Guest »

Ok, here is a time comparison with the original.

Using a dictionary of 25 words, and a hash/nid list of 2355, here is the comparison:

original: 298.4 sec
optimized: 3.2 sec

basically, the original is 100 times slower than the one posted here.

It is worth noting that nearly all of the speedup came from optimizing the check of the computed hash against the list of 2500 or so NIDs. That function alone consumed almost 90% of the execution time; now it is virtually nonexistent in the profile.
Last edited by Guest on Wed Jun 01, 2005 5:05 am, edited 1 time in total.
mrbrown
Site Admin
Posts: 1537
Joined: Sat Jan 17, 2004 11:24 am

Post by mrbrown »

Great stuff, adresd and gorim. Also, I've been looking a bit into doing a BOINC server/client for this, and this seems like a good base to start from.
Last edited by mrbrown on Wed Jun 01, 2005 11:46 pm, edited 1 time in total.
adresd
Posts: 43
Joined: Sat Jan 17, 2004 11:32 am

Post by adresd »

Thanks for the performance comparison gorim.

It's meant to give people a bit of a better start, so we aren't just wasting time.
Last edited by adresd on Wed Jun 01, 2005 9:55 pm, edited 1 time in total.
Guest

Post by Guest »

Is a distributed system really necessary any more ? ;) One fast Pentium 3.0ghz running for a day or much quicker should knock it all out. ;)
mrbrown
Site Admin
Posts: 1537
Joined: Sat Jan 17, 2004 11:24 am

Post by mrbrown »

Really? Hmm, if that's the case then I'll not worry about it :P. Also it seems the SHA1 algo itself could use some platform-specific optimization, no?

I'd still like to improve this program for stuff like state saving/restoring, multithreading, and perhaps custom rules? It would be nice to have a list of verbs or smaller words to try as prefixes and then the rest of "normal" words.
mrbrown
Site Admin
Posts: 1537
Joined: Sat Jan 17, 2004 11:24 am

Post by mrbrown »

Also, what optimization flags are you guys using? Drakonite said he got the best results using '-O2 -fomit-frame-pointer', so I use '-O2 -fomit-frame-pointer -march=pentium4' at work. Is -O3 (which turns on aggressive function inlining) worse than -O2 here?
Guest

Post by Guest »

Well, I tested with -O2 -fomit-frame-pointer as well as -O3.

At this point, there is virtually no difference brought about by optimization flags... at most a 0.1-0.2 sec difference.

The best I could get from optimization was turning on the Mac PPC-specific flags -fast -mcpu=7450, which gave a 0.5 sec improvement. That isn't bad... 0.5 out of 3.2 is still 16% or so, and it would add up on a large dictionary.

Drak probably saw an improvement on the original code because it was so slow that anything the compiler did for optimization would help. ;)
Guest

Post by Guest »

gorim wrote:Using a dictionary of 25 words, and a hash/nid list of 2355, here is the comparison:

original: 298.4 sec
optimized: 3.2 sec

basically, the original is 100 times slower than the one posted here.
I forgot to mention this test was run on a 1.0GHz Apple PowerBook, and the timings were quite consistent across runs.

Those of you who have 3.0Ghz x86 boxes should have no need of a distributed system. :)

Optimization flags may still matter. As I said, I could get a 16% improvement using them on top of the base optimized code, though it's harder to expect more (and actually 16% is damned good) at this point because the code is already about as optimized as it can get. But one needs to play around with much more than just -O2/-O3, which are now indistinguishable from each other.

As I implied before, I am thinking the next best improvement can come from some possible vectorization in SHA1Transform() [focused on the ROL() routine which is executed 3-4 times for each R()], which now dominates the execution profile of the code. The current implementation is about as optimized as one can get without resorting to vectorization, and even then it doesn't lend itself well to such.
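For context, the rotate in question looks roughly like this in the common public-domain SHA-1 implementations (a sketch of the style, not necessarily the exact macros in attack.c):

Code: Select all

#include <stdint.h>

/* 32-bit rotate-left: the operation that dominates the SHA1Transform() profile. */
#define ROL(value, bits) (((value) << (bits)) | ((value) >> (32 - (bits))))

/* A representative round macro in the usual style: v,w,x,y,z are the
   working state and data is one word of the message schedule. Each
   round applies ROL() to the state (and once more when expanding the
   schedule), which is why ROL() shows up so often in the profile. */
#define R0(v, w, x, y, z, data) \
    ((z) += (((w) & ((x) ^ (y))) ^ (y)) + (data) + 0x5A827999u + ROL((v), 5), \
     (w) = ROL((w), 30))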

Currently dictionaries and hash/nid lists should fit comfortably in L1 cache. For systems where they do not, some prefetching may be in order so as to keep the CPU busy.

At best, I reckon another 30%, if lucky, in overall program execution time. Compared to the amount of time it might take to write, test, debug, and profile such code, it may not be a worthwhile time investment.

Those of you who do have 3.0Ghz systems at your disposal, try things out and let us know. 100 times speedup may be plenty. ;)
MrSiir[S]
Posts: 32
Joined: Tue Sep 14, 2004 11:08 am

Post by MrSiir[S] »

I am trying the program with the dictionary from djhuevo's original program, and it always fails for me. I compiled it on Linux with GCC 3.4.1; here is the result:

Code: Select all

./attack sceHttp sceHttp.hsh sceHttp.dic 
SHA1 hash dictionary attack v1.0 by adresd
based on the original by djhuevo
Reading hash file 'sceHttp.hsh'
Reading dictionary file 'sceHttp.dic'

prefix          : 'sceHttp'
hash count      : 46
dictionary words: 145

Sorting Hashlist
Building Search Tree
Segmentation fault (core dumped)
P.S.: Sorry for my poor English :-)
mrbrown
Site Admin
Posts: 1537
Joined: Sat Jan 17, 2004 11:24 am

Post by mrbrown »

There's currently a bug in fillsearchtable() as it's building the sorted hash table.

Add 0x00000000 to the top of your hash list, then add 0xffffffff to the bottom of your hash list to fix it.
Drakonite
Site Admin
Posts: 990
Joined: Sat Jan 17, 2004 1:30 am

Post by Drakonite »

gorim wrote:Is a distributed system really necessary any more ? ;) One fast Pentium 3.0ghz running for a day or much quicker should knock it all out. ;)
Er... I swear I figured out that the dictionary I had from mrb (for net stuff) would take over a hundred years with the old attack.c, so even if the new attack.c is over a hundred times faster, we'd still be looking at over a year to exhaust that word space. Increase the dictionary size to make it a decent attack against all the possible function names and we are again looking at a huge amount of time; a distributed attack is the only real way to go IMO.
Shoot Pixels Not People!
Makeshift Development
mrbrown
Site Admin
Posts: 1537
Joined: Sat Jan 17, 2004 11:24 am

Post by mrbrown »

If the SHA1 algorithm is CPU optimized and the search parallelized (for folks with multiple-CPU boxes) I'm sure we can shave a lot more time off of that.
Guest

Post by Guest »

Drakonite wrote:Er... I swear I figured out that the dictionary I had from mrb (for net stuff) would take over a hundred years with the old attack.c, so even if the new attack.c is over a hundred times faster, we'd still be looking at over a year to exhaust that word space.
I was under the impression that the new attack code could go through the dictionary in 12 hours or so on a 1Ghz box, based on adresd's statement he stopped his at 80% completion after 8-9 hours.

Does anyone else have any actual numbers on current runtimes with specific dictionary sizes on specific systems ?
Increase the dictionary size to make it a decent attack against all the possible function names and we are again looking at a huge amount of time; a distributed attack is the only real way to go IMO.
Well, the key thing is finding the smart way to do it in a distributed manner.
Guest

Post by Guest »

mrbrown wrote:If the SHA1 algorithm is CPU optimized and the search parallelized (for folks with multiple-CPU boxes) I'm sure we can shave a lot more time off of that.
I found some SHA1 code that looks a bit more optimized, including some hand-tuned assembly for PPC. I will be testing that out tonight.

It should be easy to code it so that multiple threads (1/cpu) can be fed strings to try...
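A rough sketch of that idea with POSIX threads, splitting the top-level dictionary words across workers; the worker body is a placeholder for the existing search loop (hypothetical names, not code from attack.c):

Code: Select all

#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 2   /* one thread per CPU, say */

/* One slice of the search space: a contiguous range of top-level words. */
struct work {
    int first_word;     /* index of the first top-level dictionary word */
    int last_word;      /* one past the last word handled by this thread */
};

/* Placeholder: the real program would run its search loop restricted to
   top-level words [first_word, last_word). */
static void *worker(void *arg)
{
    struct work *w = (struct work *)arg;
    printf("searching top-level words %d..%d\n", w->first_word, w->last_word - 1);
    return NULL;
}

int main(void)
{
    int dict_words = 400;               /* example dictionary size */
    pthread_t threads[NUM_THREADS];
    struct work slices[NUM_THREADS];
    int per_thread = dict_words / NUM_THREADS;

    for (int i = 0; i < NUM_THREADS; i++) {
        slices[i].first_word = i * per_thread;
        slices[i].last_word = (i == NUM_THREADS - 1) ? dict_words : (i + 1) * per_thread;
        pthread_create(&threads[i], NULL, worker, &slices[i]);
    }
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);
    return 0;
}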
zigzag
Posts: 129
Joined: Wed Jan 26, 2005 2:11 pm

Post by zigzag »

What is the goal here?
Guest

Post by Guest »

zigzag wrote:What is the goal here?
Very simple: within PSP modules, functions are identified by a 32-bit unique identifier, known as a NID.

These NIDs are actually partial values of the SHA1 hash of the character string that contains the "name" of that function.

The function name, which is often descriptive enough to understand a fair amount of what the function does, is the unknown quantity.

The task here is to reverse-map the known NID to the unknown function name by brute-forcing dictionary word sequences into possible name strings, which are then hashed with SHA1 and compared to the list of currently known NIDs.

There is one minor drawback to this: many false positives are generated as well, so each hit requires a judgement call on its true likelihood.
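To make the forward direction concrete, here is a small sketch using OpenSSL's SHA1(). Exactly which 32 bits of the digest form the NID (here: the first four bytes read little-endian) is an assumption for illustration only:

Code: Select all

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <openssl/sha.h>    /* link with -lcrypto */

/* Hash a candidate name and keep 32 bits of the digest.
   ASSUMPTION: the NID is the first four digest bytes, little-endian. */
static uint32_t name_to_nid(const char *name)
{
    unsigned char digest[SHA_DIGEST_LENGTH];
    SHA1((const unsigned char *)name, strlen(name), digest);
    return (uint32_t)digest[0]
         | ((uint32_t)digest[1] << 8)
         | ((uint32_t)digest[2] << 16)
         | ((uint32_t)digest[3] << 24);
}

int main(void)
{
    /* Example candidate name; a hit would match one of the known NIDs. */
    printf("0x%08x\n", (unsigned)name_to_nid("sceHttpInit"));
    return 0;
}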
steddy
Posts: 139
Joined: Mon Apr 04, 2005 3:53 am

Post by steddy »

Am I missing something?

I have a hash count of 13 and a dictionary of 151 words.

It takes over 5 seconds to process each word. This is going to take forever.

Compiled with cygwin and it did report some warnings:

Code: Select all

attack.c:48:65: warning: backslash and newline separated by space
attack.c:53:77: warning: backslash and newline separated by space
attack.c: In function `load_dictionary':
attack.c:248: warning: passing arg 1 of `strcmp' from incompatible pointer type
attack.c:252: warning: passing arg 1 of `strlen' from incompatible pointer type
attack.c:254: warning: passing arg 2 of `strcpy' from incompatible pointer type
attack.c:528:2: warning: no newline at end of file
Have I screwed up?

Steddy
pixel
Posts: 791
Joined: Fri Jan 30, 2004 11:43 pm

Post by pixel »

Add -mno-cygwin to compilation maybe...
pixel: A mischievous magical spirit associated with screen displays. The computer industry has frequently borrowed from mythology. Witness the sprites in computer graphics, the demons in artificial intelligence and the trolls in the marketing department.
steddy
Posts: 139
Joined: Mon Apr 04, 2005 3:53 am

Post by steddy »

pixel wrote:Add -mno-cygwin to compilation maybe...
Thanks, but I get the same warnings and the same speed. How long should it take to check each word on a P4 3.2GHz with 2GB of memory?

I also got a false positive with such a small dataset. What is the statistical likelihood of a false find like this?

Cheers
Steddy
adresd
Posts: 43
Joined: Sat Jan 17, 2004 11:32 am

Post by adresd »

False positives very much depend on your dictionary.
For that reason just throwing a BIG dictionary at it is a very bad idea.
The same goes for a badly picked one.

The run that I stopped at 80% was on a dictionary of 394 words, running on a 1GHz machine.

I would say that about 400 words would take 12-13 hours at 1GHz, hence about 4-5 hours at 3.2GHz. 500 words is more than enough to make significant inroads into the NID list.

The other point is, the word it displays as 'checking' is a top-level word, so it means it is checking everything starting with that word. It's more a progress marker than anything, so you can see how far down the dic it has reached.
steddy
Posts: 139
Joined: Mon Apr 04, 2005 3:53 am

Post by steddy »

gorim wrote:Using a dictionary of 25 words, and a hash/nid list of 2355, here is the comparison:

original: 298.4 sec
optimized: 3.2 sec
3.2 seconds for a dictionary of 25 words seems orders of magnitude faster than the results you or I are getting.

I think a big improvement would be for the code to concatenate any combination of up to 3 strings onto the end of the prefix. Also, to try capitalizing the words in this order automatically (see the sketch after the list):

word
Word
WORD
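A tiny sketch of generating those three variants (hypothetical helper, not part of attack.c; assumes words shorter than 64 characters):

Code: Select all

#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Produce the three capitalization variants of a dictionary word:
   word, Word, WORD. */
static void make_variants(const char *word, char out[3][64])
{
    size_t len = strlen(word);

    strcpy(out[0], word);                          /* word */

    strcpy(out[1], word);                          /* Word */
    out[1][0] = (char)toupper((unsigned char)out[1][0]);

    for (size_t i = 0; i <= len; i++)              /* WORD (copies the '\0' too) */
        out[2][i] = (char)toupper((unsigned char)word[i]);
}

int main(void)
{
    char v[3][64];
    make_variants("http", v);
    printf("%s %s %s\n", v[0], v[1], v[2]);        /* http Http HTTP */
    return 0;
}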

Steddy
adresd
Posts: 43
Joined: Sat Jan 17, 2004 11:32 am

Post by adresd »

That's because the default checks 4 words, or 5 if compiled for it.

Hence
25*25*25*25 = 390,625

versus

151*151*151*151 = 519,885,601

or for me
400*400*400*400 = 25,600,000,000

That's why it's orders of magnitude slower, and why dictionary size is so critical :)
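In other words, the candidates are the prefix followed by every ordered choice of dictionary words at that depth; a stripped-down sketch of why the counts above are words^4 (hypothetical names, not the actual attack.c loop):

Code: Select all

#include <stdio.h>

/* Stand-in for the SHA-1 + searchtable lookup step. */
static void check_candidate(const char *name)
{
    (void)name;
}

/* Four nested loops over the dictionary: runs check_candidate() words^4
   times, e.g. 25^4 = 390,625 or 400^4 = 25,600,000,000 candidates. */
static void search4(const char *prefix, const char *dict[], int words)
{
    char name[256];
    for (int a = 0; a < words; a++)
        for (int b = 0; b < words; b++)
            for (int c = 0; c < words; c++)
                for (int d = 0; d < words; d++) {
                    snprintf(name, sizeof name, "%s%s%s%s%s",
                             prefix, dict[a], dict[b], dict[c], dict[d]);
                    check_candidate(name);
                }
}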
steddy
Posts: 139
Joined: Mon Apr 04, 2005 3:53 am

Post by steddy »

Ahhhhh.. that explains it.

Sorry for extending this thread unnecessarily with dumb statements in that case.

Steddy
adresd
Posts: 43
Joined: Sat Jan 17, 2004 11:32 am

Post by adresd »

No problem steddy, I am sure those questions would have come up sooner or later :)


NOTE: for other people, I have updated the code in the first post with my latest version. Some minor changes to the searchtable and rejection handling; nothing critical, but it helps :)
adresd
Posts: 43
Joined: Sat Jan 17, 2004 11:32 am

Post by adresd »

Here is a simple, easy way to view collisions in the results file.


Copy this:

Code: Select all

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
  <xsl:template match="/">
    <TABLE STYLE="border:0px solid black; width=100%">
      <TR STYLE="font-size:12pt; font-family:Verdana; font-weight:bold; text-decoration:underline">
        <TD STYLE="background-color:lightgrey">Nid</TD>
        <TD>Function Name</TD>
      </TR>
      <xsl:for-each select="NIDFUNCLIST/FUNC" order-by="NID">
        <TR STYLE="font-family:Verdana; font-size:12pt; padding:5px 6px">
          <TD STYLE="background-color:lightgrey"><xsl:value-of select="NID"/></TD>
          <TD><xsl:value-of select="NAME"/></TD>
        </TR>
      </xsl:for-each>
    </TABLE>
  </xsl:template>
</xsl:stylesheet>
to a file called 'resultsdisplay.xsl'

Now add:

Code: Select all

<?xml version="1.0"?>
<?xml:stylesheet type="text/xsl" href="resultsdisplay.xsl" ?>
to the top of the XML file. Also make sure the closing element is intact if you interrupted the run, so it is valid XML.

Now open the XML file in a web browser of your choice to view the NIDs found, sorted by NID, making collision checking easy :)

Have fun
djhuevo
Posts: 47
Joined: Thu Mar 10, 2005 3:50 pm

Post by djhuevo »

gorim wrote:Ok, here is a time comparison with the original.
Using a dictionary of 25 words, and a hash/nid list of 2355, here is the comparison:

original: 298.4 sec
optimized: 3.2 sec
O___o

You are comparing against the first, buggy version released earlier on the ps2reality.net forums.

That code has a horrible problem as stated in http://www.ps2reality.net/forum/viewtop ... c&start=15
and http://forums.ps2dev.org/viewtopic.php?t=1759 and fixed in
http://pspdev.ofcode.com as "Attack 0.1r2".

I have optimized Attack to be faster than it currently is, stripping out some parts of the SHA1 algorithm used. But it could be even faster if you tweak some things in asm, especially for little-endian machines.

Code: Select all

hashes: 47
dict words: 86
combinations: 4 words

Attack 0.1r2: 100 seconds
adresd Attack: 103 seconds
Attack 0.2: 47 seconds

Code: Select all

hashes: 47
dict words: 37
combination: 5 words

adresd Attack: 129 seconds
Attack 0.2: 58 seconds
Attack 0.2 can be found at http://pspdev.ofcode.com/index.php/attack-02
surviving in the land of the trolldev
adresd
Posts: 43
Joined: Sat Jan 17, 2004 11:32 am

Post by adresd »

Please note, djhuevo, that I was not the one who posted performance figures. I was not after the fastest, merely something a lot faster than the one I had.

Congrats on getting it running faster.

Most benefits now will either be in the SHA1 algo or in dic selection, imho.

Thanks for sharing your 0.2 version with the forums :)
Guest

Post by Guest »

The competition is on!!!! Those look like some good numbers to beat! :)
*rushes back to tuning SHA1*

(I finally figured out why the ASM code I found wasn't compiling. I hope to have a library of different SHA1 modules available for ppl to swap in and out soon...)
djhuevo
Posts: 47
Joined: Thu Mar 10, 2005 3:50 pm

Post by djhuevo »

Well, this is not a competition; this is so ppl can do the thing faster every time.

Talking about sharing: ppl here are talking about a 2000+ NID list..
Would anybody be kind enough to send it to me, to publish in the API documentation project? :P (I only have 1013)
surviving in the land of the trolldev