TI mode in gcc broken.

Post by **pixel** » Fri Apr 15, 2005 12:19 am

Okay, this is reported by shazz.

After some investigation, I conclude that TI mode support with our current gcc patches is broken when it comes to casting, in both optimized and non-optimized builds. Here is a basic example:

Original C code:

Code:

#include <tamtypes.h>

u32 foo();

u128 bar() {
    return foo();
}

u128 foobar() {
    return 5;
}

Disassembly of the above compilation using -O2:

Code:

00000000 <bar>:
   0:   27bdfff0        addiu   sp,sp,-16
   4:   ffbf0000        sd      ra,0(sp)
   8:   0c000000        jal     0 <bar>
                        8: R_MIPS_26    foo
   c:   00000000        nop
  10:   dfbf0000        ld      ra,0(sp)
  14:   0000102d        move    v0,zero
  18:   03e00008        jr      ra
  1c:   27bd0010        addiu   sp,sp,16

00000020 <foobar>:
  20:   700014a9        por     v0,zero,zero
  24:   03e00008        jr      ra
  28:   24020005        li      v0,5
  2c:   00000000        nop

Same, without any -O:

Code:

00000000 <bar>:
   0:   27bdffe0        addiu   sp,sp,-32
   4:   ffbf0010        sd      ra,16(sp)
   8:   ffbe0000        sd      s8,0(sp)
   c:   0c000000        jal     0 <bar>
                        c: R_MIPS_26    foo
  10:   03a0f02d        move    s8,sp
  14:   0002103c        dsll32  v0,v0,0x0
  18:   0002183e        dsrl32  v1,v0,0x0
  1c:   0060102d        move    v0,v1
  20:   0000102d        move    v0,zero
  24:   03c0e82d        move    sp,s8
  28:   dfbf0010        ld      ra,16(sp)
  2c:   dfbe0000        ld      s8,0(sp)
  30:   03e00008        jr      ra
  34:   27bd0020        addiu   sp,sp,32

00000038 <foobar>:
  38:   27bdfff0        addiu   sp,sp,-16
  3c:   ffbe0000        sd      s8,0(sp)
  40:   03a0f02d        move    s8,sp
  44:   700014a9        por     v0,zero,zero
  48:   24020005        li      v0,5
  4c:   03c0e82d        move    sp,s8
  50:   dfbe0000        ld      s8,0(sp)
  54:   03e00008        jr      ra
  58:   27bd0010        addiu   sp,sp,16

In both cases, the "foobar" function returns the correct value in $v0, while the "bar" function returns 0, whatever the result of "foo" is. Worse: I believe the way it does so is wrong, since it uses a 64-bit move to clear $v0, potentially leaving garbage in the upper 64 bits of $v0.
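For reference, C requires the u32 result of foo() to be zero-extended across all 128 bits when bar() returns it. Below is a minimal host-side sketch of the expected result. It assumes a 64-bit GCC or Clang, where `unsigned __int128` stands in for the `u128` of tamtypes.h; the `host_` names and the test value are made up for illustration and are not part of the PS2 toolchain.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: unsigned __int128 on a 64-bit host stands in for u128. */
typedef unsigned __int128 host_u128;

/* Hypothetical foo(): the top bit is set so that an accidental sign
   extension would show up as well. */
uint32_t host_foo(void) { return 0x80000005u; }

/* Same shape as bar() in the example above. */
host_u128 host_bar(void) { return host_foo(); }

/* Returns 1 when the widening matches the C rules: the low 32 bits are
   preserved and every bit above them is zero. */
int host_bar_is_correct(void)
{
    host_u128 r = host_bar();
    uint64_t lo = (uint64_t)r;
    uint64_t hi = (uint64_t)(r >> 64);
    return lo == 0x80000005u && hi == 0;
}

/* Convenience wrapper for a quick check at startup. */
void host_bar_check(void) { assert(host_bar_is_correct()); }
```

Judging by the disassembly above, the same check on the patched compiler should fail for bar(), since $v0 is overwritten with zero after the call.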

Has anyone experienced this in the past? Can any of you check this with the previous toolchain? I can't run such a test at the moment.

Post by **petrs** » Sat Dec 16, 2006 9:20 am

Quite a long time has passed since your discovery. Has anything changed since then?

Post by **ragnarok2040** » Sun Oct 04, 2009 4:29 am

Sorry about dragging up such an old topic, but I've been testing this every now and then trying to fix it in gcc, but no luck yet. I did find kind of a workaround for this case. If you use a 16-byte union like this:

Code:

typedef union {
    u8   byte[16];
    u16  hword[8];
    u32  word[4];
    u64  dword[2];
    u128 qword;
} UQWORD;

Then you can assign the return value of foo directly to word[0] like this:

Code:

u128 bar() {
    UQWORD ret;
    ret.word[0] = foo();
    return ret.qword;
}

and bar will return foo()'s value without too much overhead...

Code:

00000008 <bar>:
   8:	27bdffe0 	addiu	sp,sp,-32
   c:	ffbf0010 	sd	ra,16(sp)
  10:	0c000000 	jal	0 <foo>
  14:	00000000 	nop
  18:	afa20000 	sw	v0,0(sp)
  1c:	dfbf0010 	ld	ra,16(sp)
  20:	7ba20000 	lq	v0,0(sp)
  24:	03e00008 	jr	ra
  28:	27bd0020 	addiu	sp,sp,32
  2c:	00000000 	nop

Post by **dlanor** » Sun Oct 04, 2009 9:04 am

ragnarok2040 wrote: Sorry about dragging up such an old topic, but I've been testing this every now and then trying to fix it in gcc, but no luck yet. I did find kind of a workaround for this case. If you use a 16-byte union like this:

There are various methods to make a function return what we want, not least the possibility of forcing the result by hand-optimized asm (the only truly certain way). But this does not really address the true problem here, which is that the compiler is inherently bugged in how it translates basic typecasting needs into real assembly code.

There are other examples of a similar nature that do not involve any function return values, but merely normal variable to variable assignments where explicit typecasting written by a C coder has no effect at all. Like typecasting a u64 to u128, and finding that the compiled code completely ignores the upper 64 bits, instead of properly zeroing them in the transfer to a 128 bit unsigned variable, which thus retains its original upper bits after the transfer.
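One way to sidestep the u64 to u128 cast in plain C is to store both halves by hand through a union, in the spirit of the UQWORD workaround earlier in the thread. This is only a sketch: `host_u128`, `host_uqword` and `widen_u64` are hypothetical names, `unsigned __int128` on a 64-bit host stands in for `u128`, and whether the patched compiler handles this correctly in every context has not been verified here.

```c
#include <stdint.h>

/* Assumption: unsigned __int128 on a 64-bit host stands in for u128. */
typedef unsigned __int128 host_u128;

/* Cut-down version of the UQWORD union from earlier in the thread. */
typedef union {
    uint64_t  dword[2];
    host_u128 qword;
} host_uqword;

/* Widen by storing both halves explicitly, so that no u64 -> u128 cast
   is left for the compiler to translate. Assumes a little-endian
   target (dword[0] is the low half), as on the EE and on x86-64. */
host_u128 widen_u64(uint64_t x)
{
    host_uqword q;
    q.dword[0] = x;   /* low 64 bits */
    q.dword[1] = 0;   /* high 64 bits, cleared by hand */
    return q.qword;
}
```

Because every byte of the union is written before it is read, the result does not depend on what the destination held before.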

I think that one root of this problem is that the compiler itself is not properly 'aware' of whether or not various EE-specific instructions affect 64 or 128 register bits, and tends to assume that either unsigned zero extension or sign-bit extension will normally affect the upper bits, even when that is not the case.

Still, the original example is quite stunning in its total compiler stupidity, since there is no excuse whatever, no matter what extension assumptions apply, for zeroing the lower bits of the value returned by 'bar'.

It is almost as if some coder had been interrupted in designing how an implied typecast to 128 bits should be done, leaving only the parts already completed and intended to clear the upper bits (but affecting the entire value), and forgetting to later add any parts for preserving the original lower bits...

Best regards: dlanor

Post by **ragnarok2040** » Mon Oct 05, 2009 10:00 am

Yeah, I've run into that 64-bit typecasting problem myself. Doing direct assignments with implicit typecasting to the other types in the union seems to work fine as long as the qword has been initialized to 0. For negative values, I can always store the value, pnor the register with the zero register, and load it back to sign extend.
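For the negative-value case, the sign extension can also be spelled out in C instead of going through the register trick. The sketch below is an illustration under assumptions: `widen_s32` and the `host_` types are hypothetical names, `unsigned __int128` on a 64-bit host stands in for `u128`, and the target is little-endian.

```c
#include <stdint.h>

/* Assumption: unsigned __int128 on a 64-bit host stands in for u128. */
typedef unsigned __int128 host_u128;

/* Word-oriented subset of the UQWORD union. */
typedef union {
    uint32_t  word[4];
    host_u128 qword;
} host_uqword32;

/* Sign-extend a 32-bit value to 128 bits by filling the three upper
   words by hand. word[0] is the lowest word on a little-endian target. */
host_u128 widen_s32(int32_t x)
{
    /* All-ones fill for negative values, zero fill otherwise. */
    uint32_t fill = (x < 0) ? 0xffffffffu : 0u;
    host_uqword32 q;
    q.word[0] = (uint32_t)x;
    q.word[1] = fill;
    q.word[2] = fill;
    q.word[3] = fill;
    return q.qword;
}
```

Writing all four words also removes the need to initialize the qword to 0 first.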

I did write some assembly by hand in case I run into a situation where it's needed, :D. I also noticed that if I just insert an empty asm("" : "=r" (ret) : "r" (foo())); into bar(), it gets rid of the move instruction and the value of v0 is left unmodified, allowing it to be passed through. That might leave random bits behind in v0, since foo() just loads 5 as an immediate into v0.

Post by **dlanor** » Tue Oct 06, 2009 3:57 am

ragnarok2040 wrote: Yeah, I've run into that 64-bit typecasting problem myself. Doing direct assignments with implicit typecasting to the other types in the union seems to work fine as long as the qword has been initialized to 0. For negative values, I can always store the value, pnor the register with the zero register, and load it back to sign extend.

For an assembly programmer there is always something we can do about it, with direct control of low-level code and ability to "jr $ra" whenever we feel all is right. But even for us it can sometimes be hard to realize that such special measures are needed. And without that realization, we too are stuck with the errors. Hopefully they will then show up in debugging, so we can use asm methods to eliminate them, but with bad luck we don't get any clear indication of these bugs, which then remain and cause the program to misbehave in the future...

ragnarok2040 wrote: I did make some handwritten assembly just in case I come into a situation where it's needed, :D. I also noticed if I just insert an empty asm("" : "=r" (ret) : "r" (foo())); into bar(), it gets rid of the move instruction and the value of v0 is left unmodified allowing it to be passed. That might cause random bits in v0 to be left behind since foo() just loads 5 immediately into v0.

True, but once we are aware of such a problem we'll always manage to fix it properly, after some tries at least. The real problem is for the cases when we are not aware that such bugs are involved.
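One way to become aware of the bug early is a small self-test run at startup. The sketch below is hypothetical: `ti_widening_selftest` and the probe variables are invented names, and `unsigned __int128` on a 64-bit host stands in for `u128`. Note that the check itself relies on 128-bit shifts and narrowing casts, which a broken TI mode implementation might also get wrong, so a passing result on the affected toolchain would still deserve a look at the disassembly.

```c
#include <stdint.h>

/* Assumption: unsigned __int128 on a 64-bit host stands in for u128. */
typedef unsigned __int128 host_u128;

/* volatile keeps the compiler from folding the checks at build time. */
static volatile uint32_t probe32 = 0x80000005u;
static volatile uint64_t probe64 = 0x8000000000000005ull;

/* Returns 0 when both widenings behave, otherwise a bitmask:
   bit 0 = u32 -> u128 is wrong, bit 1 = u64 -> u128 is wrong. */
int ti_widening_selftest(void)
{
    int failed = 0;
    host_u128 a = ~(host_u128)0;   /* start with every bit set */
    host_u128 b = ~(host_u128)0;

    a = probe32;   /* must clear bits 32..127 */
    b = probe64;   /* must clear bits 64..127 */

    if ((uint64_t)a != 0x80000005u || (uint64_t)(a >> 64) != 0)
        failed |= 1;
    if ((uint64_t)b != 0x8000000000000005ull || (uint64_t)(b >> 64) != 0)
        failed |= 2;
    return failed;
}
```

A program could call this once during initialization and refuse to continue, or log a warning, when the result is non-zero.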

This is bad enough for us who do know how to use asm to get around such problems, and have some understanding of how these things happen, but consider how bad it will be for those many C coders who lack experience and/or knowledge of asm. They have very little chance to understand what is happening when this kind of bug strikes, and even less chance to do anything effective to counter it.

Best regards: dlanor