Problem compiling ps2ftpd with latest IOP compiler

dlanor
Posts: 258
Joined: Thu Oct 28, 2004 6:28 pm
Location: Stockholm, Sweden

Post by dlanor »

As part of the LaunchELF project we have experimented with adding a built-in ps2ftpd.irx. This can be activated by programming a launch key to use the pseudo file MISC/PS2NET, and then using that key as a command. This causes the launching of various networking modules (if not already launched), including ps2ftpd. This all works fine with some versions of the ps2ftpd irx, but there is a problem in compiling that module properly.

If we compile it using v2.8.1 of the IOP compiler, then it works fine.
If we compile it using v3.2.2 of the IOP compiler, then transfers to memory card will malfunction, crashing somewhere in an MCMAN exception (debugging precisely where/how it happens is hard).

The file Rules.make of the ps2ftpd project indicates that an attempt has been made to adapt it to the new compiler, but this has not been fully successful (as evident from the above).

However, if I edit Rules.make so as to remove the option "-O2" from IOP_CFLAGS, then the irx compiled with v3.2.2 does work correctly. This indicates that the cause of the problem is some of the new optimization defaults in the new version of the IOP compiler.

But turning off all optimization to get around this is too high a price to pay, when it causes (in this case) a binary to grow from 25KB to 41KB. That's an increase of 64% which I find unacceptable.

So here's my real question:
Is there any way to find out exactly which optimization causes this problem?
And if so, is there some way to disable only that optimization, while still keeping the "-O2" option?

Best regards: dlanor
pixel
Posts: 791
Joined: Fri Jan 30, 2004 11:43 pm

Post by pixel »

Optimisation differences between 2.8.1 and 3.2.2 are quite heavy, and they tend to expose certain categories of latent bugs, in particular a missing volatile keyword. If you have several threads (and I guess that's the case) accessing the same variable, that variable should be declared volatile. That's about the only big problem I can see with code that works with 2.8.1 and no longer works with 3.2.2.
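
To illustrate the volatile point, here is a minimal C sketch. It is not taken from ps2ftpd; the names transfer_done, mark_transfer_done and wait_transfer_done are hypothetical. Without the qualifier, an optimizing compiler may read the flag once and keep it in a register, so the polling loop might never see an update made from another thread or an interrupt handler. Note that volatile only forces the re-read; it is not a replacement for proper synchronization primitives.

```c
#include <assert.h>

/* Hypothetical flag shared between a worker thread (or interrupt handler)
 * and the code that waits for it. "volatile" forces a real memory read on
 * every access instead of reusing a value cached in a register. */
static volatile int transfer_done = 0;

/* Called from the other context when the work has finished. */
void mark_transfer_done(void)
{
    transfer_done = 1;
}

/* Polls the flag at most max_spins times.
 * Returns 1 if the flag was seen set, 0 if the spin budget ran out. */
int wait_transfer_done(unsigned long max_spins)
{
    assert(max_spins > 0);
    while (max_spins-- > 0) {
        if (transfer_done)
            return 1;
    }
    return 0;
}
```

If the flag were declared as a plain int, the loop body could legally be reduced to a single test before the loop, which is the kind of change that shows up only at higher optimization levels.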
pixel: A mischievous magical spirit associated with screen displays. The computer industry has frequently borrowed from mythology. Witness the sprites in computer graphics, the demons in artificial intelligence and the trolls in the marketing department.

Post by dlanor »

With a single client accessing ps2ftpd there should only be one active thread for the server (maybe 2 if it spawns when invoked, but the 2nd one won't be active, just listening), and I fail to see why this should cause a crash when trying to transfer stuff to memory card, though it causes no problem when transferring stuff to HDD partitions...

So no, I don't think it's so simple as a misdeclared variable... :(

But if that should be the problem, is there no way to get back the 'old' default behaviour by specifying some compilation flag, as we can do to avoid some other optimizations (e.g. "-fno-builtin")?

Also, remember that if I remove "-O2" then it does work fine, though at the cost of a 64% larger binary. Surely there must be some way to block only the relevant optimization. Is there any list of individually blockable optimizations, so I can try them one by one to find the culprit?

Best regards: dlanor

Post by pixel »

First, try -O1. Then read through the ee-gcc manual and look for the list of individual optimisation flags, which is a very large bunch of -f flags. I'd still recommend trying to find the error, though.
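
One way to organise the flag-by-flag search is to generate one trial build per disabled optimization. The C sketch below only formats the flag strings; trial_count and format_trial are hypothetical helpers, and the flag list is an illustrative subset of long-standing GCC "-fno-" options, not the complete set for any one version. Whether a given compiler release accepts each name should be checked against its manual.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Sample of individually switchable GCC optimizations, written in their
 * disabling "-fno-" form. ASSUMPTION: illustrative subset only; check each
 * name against the manual for the compiler version actually in use. */
static const char *const candidate_flags[] = {
    "-fno-defer-pop",
    "-fno-thread-jumps",
    "-fno-delayed-branch",
    "-fno-strength-reduce",
    "-fno-cse-follow-jumps",
    "-fno-gcse",
    "-fno-schedule-insns",
    "-fno-schedule-insns2",
    "-fno-strict-aliasing",
};

/* Number of trial builds, one per candidate flag. */
size_t trial_count(void)
{
    return sizeof candidate_flags / sizeof candidate_flags[0];
}

/* Writes the optimization flags for trial number `index` into buf: the base
 * level (e.g. "-O2") followed by exactly one disabled optimization.
 * Returns the length written, or -1 if index is out of range or buf is
 * too small. */
int format_trial(size_t index, const char *base_level, char *buf, size_t size)
{
    int n;

    assert(base_level != NULL && buf != NULL);
    if (index >= trial_count())
        return -1;
    n = snprintf(buf, size, "%s %s", base_level, candidate_flags[index]);
    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}
```

Each resulting string would replace the plain "-O2" in IOP_CFLAGS for one build. If the crash disappears in exactly one trial, the flag disabled in that trial is the prime suspect; if it disappears in none, the responsible optimization is probably not one of those listed.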

Post by dlanor »

pixel wrote: "First, try -O1."

Ok, I will, but even if it works that is no solution, as it doesn't reveal the reason for the problem. I will do it only in the hope of later finding a list of which optimizations differ between -O1 and -O2, as that will help me eliminate some possible causes.

pixel wrote: "Then read through the ee-gcc manual and look for the list of individual optimisation flags, which is a very large bunch of -f flags."

I realize it will be a huge job, but this sort of thing is worth the effort of testing them one by one. After all, it is not just this one project that is affected. People everywhere are complaining of odd behaviour of various PS2SDK modules when compiled with v3.2.2.

I'm sure most (all?) of those problems can be fixed by properly adapting the CFLAGS of those project files, so as to force the new compiler to behave more like the old one in some crucial aspects. But first we need to know exactly what flags are needed.

pixel wrote: "I'd still recommend trying to find the error, though."

I'll try that too, but I strongly doubt that there is any distinct error. At least not one I can identify without having a full list of the optimization differences between v3.2.2 and v2.8.1, since one or more of those differences is what triggers the bug.

I'll try to dig into the GCC documents and sources to see what I can find in the way of flag lists, and then start testing my way through that multitude of flags...

Best regards: dlanor

Post by pixel »

Having a CFLAGS workaround for a bug in the software isn't such a good idea. Finding the exact bug shouldn't be that difficult, though it might come from very obscure problems around the ps2ftpd code. Again, finding the exact bug in the code is way better than working around it.
Drakonite
Site Admin
Posts: 990
Joined: Sat Jan 17, 2004 1:30 am

Post by Drakonite »

Everything is being compiled with -g 0 right? I seem to recall this problem showing up at the time of the switch to gcc3.x but maybe I'm wrong
Shoot Pixels Not People!
Makeshift Development

Post by dlanor »

Drakonite wrote: "Everything is being compiled with -g 0 right?"

If you mean "-G0", then yes. The following line is straight from Rules.make of ps2ftpd in CVS:

IOP_CFLAGS  := $(CFLAGS_TARGET) -O2 -G0 -c $(IOP_INCS) $(IOP_CFLAGS)

Removing "-O2" from that line eliminates the bug, but at the unacceptable cost of bloating the code by an extra 64%. That's why I insist on searching for a way to block just the optimization(s) responsible for the problem.

Edit: the line quoted above is also found in Makefile.iopglobal of the ps2sdk release package, so it's bound to be used by lots of projects.

Drakonite wrote: "I seem to recall this problem showing up at the time of the switch to gcc3.x but maybe I'm wrong"

You're probably right, though it may now be impossible to pinpoint exactly which version of the compiler was the first to change the critical optimization. (Especially before we've identified which optimization it is...)

I still haven't had time to dig into the GCC docs and make a thorough step-by-step test of the possible variations, but that still remains my plan.

Best regards: dlanor
BraveDog
Posts: 29
Joined: Thu Dec 30, 2004 1:16 am
Location: Cleveland

Post by BraveDog »

You can browse through the newer released versions of gcc and look for Optimization bugs that were fixed. Example:
http://gcc.gnu.org/gcc-3.3/changes.html

Here is a bug that has 'incorrect code for inlining of memcpy under -O2'
http://gcc.gnu.org/PR8634

EDIT
Also found one that is MIPS-specific:
http://gcc.gnu.org/PR9496

I'm not saying that is the problem, just things to look into.

Post by dlanor »

Thanks a lot BraveDog, this is just the sort of help I needed.

That memcpy stuff in particular looks promising, as the text clearly states that this bug is present in all versions from 3.2 through 3.3, and it is something we must block in any case, even if it is not the cause of the particular bug I'm investigating.
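
A quick way to test the memcpy suspicion independently of ps2ftpd is a small self-check built with the same compiler and flags. The sketch below is an assumption-laden example, not a reproduction of PR8634: count_memcpy_mismatches is a hypothetical helper that simply compares memcpy against a plain byte loop over a range of lengths and buffer offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LEN    64   /* largest copy length tried */
#define MAX_OFFSET 8    /* number of buffer offsets tried */

/* Copies every (offset, length) combination with memcpy and with a plain
 * byte loop, and returns how many combinations gave different results.
 * A nonzero result would point at a miscompiled or broken memcpy. */
int count_memcpy_mismatches(void)
{
    unsigned char src[MAX_LEN + MAX_OFFSET];
    unsigned char via_memcpy[MAX_LEN + MAX_OFFSET];
    unsigned char via_loop[MAX_LEN + MAX_OFFSET];
    size_t off, len, i;
    int mismatches = 0;

    /* Fill the source with a recognisable, non-constant pattern. */
    for (i = 0; i < sizeof src; i++)
        src[i] = (unsigned char)(i * 7 + 1);

    for (off = 0; off < MAX_OFFSET; off++) {
        for (len = 0; len <= MAX_LEN; len++) {
            assert(off + len <= sizeof src);
            memset(via_memcpy, 0, sizeof via_memcpy);
            memset(via_loop, 0, sizeof via_loop);

            memcpy(via_memcpy + off, src + off, len);
            for (i = 0; i < len; i++)
                via_loop[off + i] = src[off + i];

            if (memcmp(via_memcpy, via_loop, sizeof via_loop) != 0)
                mismatches++;
        }
    }
    return mismatches;
}
```

A result of zero at both -O0 and -O2 does not prove memcpy is innocent, since the failing case may depend on how the call is used in the real code, but a nonzero result at -O2 only would be strong evidence.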

Edit:
I have confirmed that the bug occurs identically if "-O2" is replaced by "-O1" (I should have mentioned this earlier, but forgot)
I have also tested that -fno-inline has no effect whatever on the bug.

I'm now again at a loss for how to proceed, as the official GCC manual does not contain any list of which optimizations are turned on by "-O1" or "-O2". It only contains some very general statements about the kind of reasoning behind the inclusion of various types of optimization, without actually identifying any real cases (like "without performing any optimizations that take a great deal of compilation time" and similar nonsense...).

I suppose I'll have to go to the source code of the compiler itself...
Those lists have to exist somewhere, and I WILL find them, no matter where I have to search.

Best regards: dlanor
MrHTFord
Posts: 35
Joined: Tue Feb 10, 2004 2:04 am
Location: England

Post by MrHTFord »

Hi Dlanor,

Welcome to the toolchain hacking club.

http://gcc.gnu.org/onlinedocs/gcc-3.3.1 ... %20Options

Should be very applicable to 3.2.2.

Enjoy. If you isolate the function that gets miscompiled, you can get GCC to output RTL (register transfer language) at each stage of the optimization process and then find out from that what exactly goes wrong. Then the hardcore fun of tracking down why it goes wrong begins!

Enjoy your stay.

Edit: Look for "-dletters" on this page to see how to get RTL outputs from GCC:

http://gcc.gnu.org/onlinedocs/gcc-3.3.1 ... %20Options
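
Following the suggestion above, the suspect function can be moved into a file of its own together with a self-check, so that the RTL dumps stay small enough to read. In this sketch fill_pattern is a hypothetical stand-in, not code from ps2ftpd, and the compiler name iop-gcc in the comment is an assumption about the local toolchain; "-S" and the "-d" dump letters are documented GCC options.

```c
#include <assert.h>
#include <stddef.h>

/* Possible build line for the dumps (assumed compiler name, adjust as needed):
 *   iop-gcc -O2 -G0 -S -da isolated.c
 * "-da" requests all the RTL dumps described under "-dletters" in the manual. */

/* Hypothetical stand-in for the function suspected of being miscompiled. */
void fill_pattern(unsigned char *dst, size_t len, unsigned char seed)
{
    size_t i;

    assert(dst != NULL || len == 0);
    for (i = 0; i < len; i++)
        dst[i] = (unsigned char)(seed + i);
}

/* Returns 1 if fill_pattern produced the expected bytes, 0 otherwise.
 * Running this at -O0 and at -O2 shows whether the optimizer changes
 * the observable behaviour. */
int fill_pattern_ok(void)
{
    unsigned char buf[16];
    size_t i;

    fill_pattern(buf, sizeof buf, 0xF8);
    for (i = 0; i < sizeof buf; i++) {
        if (buf[i] != (unsigned char)(0xF8 + i))
            return 0;
    }
    return 1;
}
```

Comparing the dumps from a working and a failing flag combination for such a small file narrows down the pass at which the generated code first diverges.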

Post by dlanor »

MrHTFord wrote: "Hi Dlanor, Welcome to the toolchain hacking club."

Thanks. I haven't actually hacked much of it yet of course, but I hope to contribute something on these IOP issues.

MrHTFord wrote: "http://gcc.gnu.org/onlinedocs/gcc-3.3.1 ... %20Options Should be very applicable to 3.2.2."

Indeed. That is exactly the kind of list needed. But silly me made the mistake of downloading the same docs for v3.2.3 instead, thinking they were closer to what we use, and in that version of the docs there are no such lists. (The flags are only named there, but not grouped by how they are affected by -O1 or -O2.) So thanks a lot for pointing me to this version instead.

MrHTFord wrote: "Enjoy. If you isolate the function that gets miscompiled, you can get GCC to output RTL (register transfer language) at each stage of the optimization process and then find out from that what exactly goes wrong. Then the hardcore fun of tracking down why it goes wrong begins!"

Well, the absolute 'why' may be too elusive. I'll be happy if I can just find a reliable way to eliminate the bug. Anything more is a bonus.

MrHTFord wrote: "Enjoy your stay. Edit: Look for "-dletters" on this page to see how to get RTL outputs from GCC: http://gcc.gnu.org/onlinedocs/gcc-3.3.1 ... %20Options"

Thanks, I'll try that.

Best regards: dlanor