IBM XL C compiler slower than gcc C compiler?

Investigation into how Linux on the PS3 might lead to homebrew development.

Moderators: cheriff, emoon

kroe
Posts: 6
Joined: Mon Apr 02, 2007 12:22 pm

Post by kroe »

I am a bit baffled by what I am seeing switching between the IBM XL C compiler and the GCC C compiler.

The code I am running is heavily optimized using the SPU C intrinsics. Almost all operations are done through intrinsics. I know this doesn't leave the compiler much to do, but the difference I am seeing makes no sense to me.

With the GCC compiler I am getting roughly triple the performance out of the six SPUs as I can get running on the two cores of my Athlon 64 x2 3800+; it takes 47 seconds on the Cell with the app compiled through gcc. I read that the IBM XL compiler produces considerably faster binaries, so I tried it.... 118 seconds for the exact same code on the exact same target PS3.

Since the code was so laden with intrinsics I figured that it would be about the same with both compilers, with the IBM compiler having the advantage since they designed the processor, wrote the intrinsics, and wrote the compiler.

Any ideas why I am not seeing the expected results?

I am compiling through Eclipse using SDK 2.1 on my Athlon 64 running Fedora Core 6 and am running on my PS3 using SDK 2.1 running Fedora Core 6.

Thanks,
-Ken
ps2devman
Posts: 259
Joined: Mon Oct 09, 2006 3:56 pm

Post by ps2devman »

I may be wrong, but the final speed of compiled code depends on how well the compiler understands the "time dependencies" between instructions.

Sometimes you start an instruction and its result isn't available immediately. So you can start another instruction that uses otherwise idle parts of the processor, instead of just waiting with a nop (or having the processor stall without any compile-time warning).

The result is horrible code, very hard for a human to follow, because most instructions end up in a different order from the source, but it runs very, very fast.

I guess two teams writing a compiler won't have the same level of understanding of these optimization techniques...

Also, using direct assembly code in C may be a way to disable these automatic optimizations...
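
To make the "time dependencies" point concrete, here is a minimal sketch in plain C (no SPU intrinsics, so it builds with any compiler). The function names are hypothetical and are not from the original poster's code. The first loop forms one long dependency chain; the second keeps four independent accumulators, so consecutive adds do not have to wait on one another.

```c
#include <stddef.h>

/* One accumulator: every add depends on the result of the previous
   add, so the additions form a single serial chain. */
float sum_chained(const float *x, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += x[i];
    return acc;
}

/* Four independent accumulators: the four adds inside one iteration
   do not depend on each other, so they can overlap in the pipeline. */
float sum_interleaved(const float *x, size_t n)
{
    float a0 = 0.0f, a1 = 0.0f, a2 = 0.0f, a3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a0 += x[i];
        a1 += x[i + 1];
        a2 += x[i + 2];
        a3 += x[i + 3];
    }
    for (; i < n; i++)      /* leftover elements when n is not a multiple of 4 */
        a0 += x[i];
    return (a0 + a1) + (a2 + a3);
}
```

Because floating-point addition is not associative, the two versions can differ in the last bits for general inputs, which is one reason a compiler will usually not make this change on its own without relaxed floating-point options. How much the second form actually gains depends on the target and on how well each compiler schedules the loop.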
ldesnogu
Posts: 94
Joined: Sat Apr 17, 2004 10:37 pm

Post by ldesnogu »

ps2devman is right: with intrinsics (unlike inline assembly) the compiler can still schedule the code.
You should post your code on the IBM Cell forum; the xlc compiler team will surely do its best to beat gcc ;-)
Laurent
demosuzki
Posts: 1
Joined: Sat Aug 25, 2007 10:39 pm
Location: dublin , ireland

Post by demosuzki »

take a look at the spu_timing utility to examine the generated code.

off the top of my head the process is
in the link stage you add a -s to the linker options and then run the command spu_timing on the output. <modulename>.s files.
(I'll check this...I can't right now)

the output is an ascii file with the assembly and a nice graph of the cycle counts for each pipeline, in which the dependency stalls are shown.
it's then possible to rearrange (manually) your intrinsics to see if you can remove the stalls and make more efficient use of the pipelines.

my experience with some code i have optimised is that xlc was faster (30%) than gcc, but i guess it all depends on context.

/ds
ldesnogu
Posts: 94
Joined: Sat Apr 17, 2004 10:37 pm

Post by ldesnogu »

demosuzki wrote:
off the top of my head the process is
in the link stage you add a -s to the linker options and then run the command spu_timing on the output. <modulename>.s files.

No, passing -s to the linker instructs it to remove symbols (strip).

If you want an assembly file, replace -c with -S (capital S).
For instance, gcc -S foo.c will create foo.s.
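
As a small illustration of the -c versus -S difference, here is a hypothetical foo.c (the function is made up for this example). The commands in the comment use plain gcc as in the line above; on the Cell SDK the SPU cross-compiler would be used instead, and the exact driver name there is an assumption about the install.

```c
/* foo.c: hypothetical file, used only to show the -S workflow.
 *
 *   gcc -O2 -c foo.c   produces foo.o (object code)
 *   gcc -O2 -S foo.c   produces foo.s (assembly text, nothing is assembled)
 *
 * The foo.s file is the kind of input a tool such as spu_timing
 * reads when it is built with the SPU compiler.
 */
int dot4(const int *a, const int *b)
{
    /* Four independent multiplies followed by adds: a handy shape
       for seeing how the compiler orders instructions in foo.s. */
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}
```

Comparing the foo.s produced by two compilers, or by two optimization levels of the same compiler, shows directly how differently each one ordered the instructions.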
Laurent
vi_vid
Posts: 4
Joined: Wed May 30, 2007 8:53 pm
Location: Russia

Post by vi_vid »

XLC has 6 (or 7???) optimization levels, GCC has 4.

to dump commented asm code in GCC, i usually use
--save-temps --verbose-asm
cellrb.blogspot.com|cellperformance.com