Error using VFPU matrix 3?

Criptych
Posts: 64
Joined: Sat Sep 12, 2009 5:18 am

Post by Criptych »

I have the following code - which works, that's not the problem. The problem is, it didn't work when I used VFPU matrix 3 instead of matrix 2. Is there any known problem with it?

Code:

   for (i = 0; i < mdl->numObjects; ++i)
   {
      pl2Object *obj = mdl->objects[i];

      pl2Vertex *vert = &(obj->verts[0]);
      pgeVertTNV *tvert = &(obj->tverts[0]);

      for (j = 0; j < obj->numVerts; ++j)
      {
         ScePspFVector4 v = { vert->x,  vert->y,  vert->z,  1 };
         ScePspFVector4 n = { vert->nx, vert->ny, vert->nz, 0 };

         __asm__ volatile(
            "vzero.t C200\n"           // C200 = new vertex
            "vzero.t C210\n"           // C210 = new normal
            "ulv.q   C220, 0 + %0\n"   // C220 = original vertex
            "ulv.q   C230, 0 + %1\n"   // C230 = original normal
            :: "m"(v), "m"(n));

         for (k = 0; k < 3; ++k)
         {
            if (vert->bones[k] == 0xff) break;

            __asm__ volatile(
               "ulv.q   C000,  0 + %0\n"     // M000 = transform matrix
               "ulv.q   C010, 16 + %0\n"
               "ulv.q   C020, 32 + %0\n"
               "ulv.q   C030, 48 + %0\n"
               "lv.s    S110,  0 + %1\n"     // S110 = weight
               "vtfm4.q C100, M000, C220\n"  // C100 = [ transform matrix ] [ original vertex ]
               "vscl.t  C100, C100, S110\n"  // add weighted value to new vertex
               "vadd.t  C200, C200, C100\n"
               "vtfm4.q C100, M000, C230\n"  // C100 = [ transform matrix ] [ original normal ]
               "vscl.t  C100, C100, S110\n"  // add weighted value to new normal
               "vadd.t  C210, C210, C100\n"
               : : "m"(tfm[vert->bones[k]]), "m"(vert->weight[k]));
         }

         __asm__ volatile(
            "vdot.t     S000, C210, C210\n"  // normalize new normal
            "vrsq.s     S000, S000\n"
            "vscl.t     C210[-1:1,-1:1,-1:1], C210, S000\n"
            "sv.s       S200, 0 + %0\n"      // store new vertex
            "sv.s       S201, 0 + %1\n"
            "sv.s       S202, 0 + %2\n"
            "sv.s       S210, 0 + %3\n"      // store new normal
            "sv.s       S211, 0 + %4\n"
            "sv.s       S212, 0 + %5\n"
            : "=m" (tvert->x),  "=m" (tvert->y),  "=m" (tvert->z),
              "=m" (tvert->nx), "=m" (tvert->ny), "=m" (tvert->nz));

         tvert->u  = vert->u;
         tvert->v  = vert->v;

         ++vert; ++tvert;
      }
   }
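
For comparing results against the VFPU path, here is a rough plain-C sketch of the per-vertex blend the inner loop computes. The types Vec3 and Mat4 and the functions transform and blend are made up for this illustration and are not the pl2 or pspsdk definitions. It assumes an m[row][column] layout with r = M * v, which may or may not match how vtfm4.q reads M000, and it leaves out the final normalisation of the normal (vdot/vrsq/vscl above).

```c
#include <assert.h>
#include <stddef.h>

#define MAX_INFLUENCES 3     /* the loop above runs k = 0..2 */
#define NO_BONE        0xff  /* sentinel value stored in vert->bones[] */

/* Hypothetical types for illustration only. */
typedef struct { float x, y, z; } Vec3;
typedef struct { float m[4][4]; } Mat4;   /* assumed m[row][column] */

/* r = M * (v.x, v.y, v.z, w), keeping only x, y and z. */
static Vec3 transform(const Mat4 *t, Vec3 v, float w)
{
    Vec3 r;
    r.x = t->m[0][0] * v.x + t->m[0][1] * v.y + t->m[0][2] * v.z + t->m[0][3] * w;
    r.y = t->m[1][0] * v.x + t->m[1][1] * v.y + t->m[1][2] * v.z + t->m[1][3] * w;
    r.z = t->m[2][0] * v.x + t->m[2][1] * v.y + t->m[2][2] * v.z + t->m[2][3] * w;
    return r;
}

/* Sum of weight[k] * (tfm[bones[k]] * v) over the active bones.
 * Pass w = 1 for positions and w = 0 for normals. */
static Vec3 blend(const Mat4 *tfm, const unsigned char *bones,
                  const float *weight, Vec3 v, float w)
{
    Vec3 sum = { 0.0f, 0.0f, 0.0f };
    int k;

    assert(tfm != NULL && bones != NULL && weight != NULL);

    for (k = 0; k < MAX_INFLUENCES; ++k) {
        Vec3 t;
        if (bones[k] == NO_BONE) break;   /* no more influences */
        t = transform(&tfm[bones[k]], v, w);
        sum.x += t.x * weight[k];
        sum.y += t.y * weight[k];
        sum.z += t.z * weight[k];
    }
    return sum;
}
```

Calling blend once with w = 1 for the position and once with w = 0 for the normal mirrors the two vtfm4.q/vscl.t/vadd.t groups in the asm.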
Last edited by Criptych on Tue Oct 20, 2009 7:34 am, edited 1 time in total.
crazyc
Posts: 408
Joined: Fri Jun 17, 2005 10:13 am

Post by crazyc »

Did you test this on a psp-1000 by chance?
Criptych
Posts: 64
Joined: Sat Sep 12, 2009 5:18 am

Post by Criptych »

crazyc wrote: Did you test this on a psp-1000 by chance?
No, only a 2000. I don't have a 1000. (Might be a good idea to get one for testing, though.)
crazyc
Posts: 408
Joined: Fri Jun 17, 2005 10:13 am

Post by crazyc »

There was an issue with ulv.q corrupting an FPU register, but I think it was fixed after the 1000. You could probably account for it by adding a clobber constraint. (something like $fp0 with C000, $fp1 with C100 I think)
Criptych
Posts: 64
Joined: Sat Sep 12, 2009 5:18 am

Post by Criptych »

crazyc wrote: There was an issue with ulv.q corrupting an FPU register, but I think it was fixed after the 1000.
I'd heard about that, and I'll try to remove them before releasing for compatibility. But yes, my first thought was some other function had corrupted the values - then I remembered that it doesn't call any other functions, especially not within the critical loop (numVerts) where the values are needed. It's just really weird that by changing only the matrix number it went from "not working" to "no problem." :-\
crazyc wrote: You could probably account for it by adding a clobber constraint. (something like $fp0 with C000, $fp1 with C100 I think)
Way over my head, I've only ever used assembly for minimal VFPU access, and I'm still learning most of the opcodes. :P How do I add a "clobber constraint"? Is it an extra argument/modifier to the instruction?

Bugger. I tried changing the 2s back to 3s and it still works... maybe it was only a typo that got worked out when I rewrote my function to use matrix 2. Sorry for wasting your time, crazyc. orz

At least now I know it can be done with 3 matrices instead of 4.
Last edited by Criptych on Tue Oct 20, 2009 11:26 am, edited 1 time in total.
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

Yeah, ulv.q corrupts FPU registers; I tested it on my old PSP-1000. If you need to stay compatible, you have no choice but to avoid ulv.q.

Can you be more precise about which instructions are giving bad results?
crazyc
Posts: 408
Joined: Fri Jun 17, 2005 10:13 am

Post by crazyc »

Criptych wrote: Way over my head, I've only ever used assembly for minimal VFPU access, and I'm still learning most of the opcodes. :P How do I add a "clobber constraint"? Is it an extra argument/modifier to the instruction?
A clobber constraint is an extra parameter added to the inline asm block that tells gcc that a register's contents will be destroyed. Something like this, in your first block:

Code:

         __asm__ volatile(
            "vzero.t C200\n"           // C200 = new vertex
            "vzero.t C210\n"           // C210 = new normal
            "ulv.q   C220, 0 + %0\n"   // C220 = original vertex
            "ulv.q   C230, 0 + %1\n"   // C230 = original normal
            :: "m"(v), "m"(n):"$fp2");
to indicate $fp2 contents should not be assumed to be preserved. I don't know if this will work as I've never tried it.
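
To show only where the clobber list sits, here is a minimal sketch that builds with GCC on any target, since it uses the portable "memory" clobber instead of a register name. The function names pass_through and self_check are made up for the illustration; which register names a given psp-gcc accepts is a separate, target-specific question.

```c
#include <assert.h>

/*
 * GCC extended asm has up to four colon-separated parts:
 *   __asm__ volatile ( "template" : outputs : inputs : clobbers );
 * The clobber list names what the template changes behind the
 * compiler's back: register names (target-specific) or "memory".
 */
static int pass_through(int x)
{
    /* Empty template, so no instruction is emitted. "+r"(x) says x is
     * read and written in a register; "memory" is used here only to
     * show the position of the clobber list after the third colon. */
    __asm__ volatile("" : "+r"(x) : : "memory");
    return x;
}

/* Usage example: the value survives the asm statement unchanged. */
static void self_check(void)
{
    assert(pass_through(42) == 42);
}
```

A register clobber goes in the same slot, as a quoted name in place of "memory".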
Criptych
Posts: 64
Joined: Sat Sep 12, 2009 5:18 am

Post by Criptych »

I'll give it a shot. Is this register corruption the only reason not to use ulv.q/usv.q? From what I read in the diggins they aren't significantly slower than the aligned version; how do they compare to lv.s/sv.s * 4?
crazyc
Posts: 408
Joined: Fri Jun 17, 2005 10:13 am

Post by crazyc »

Criptych wrote: I'll give it a shot. Is this register corruption the only reason not to use ulv.q/usv.q?
I believe so.
Criptych wrote: From what I read in the diggins they aren't significantly slower than the aligned version; how do they compare to lv.s/sv.s * 4?
I don't know. I haven't done much with the VFPU.
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

Criptych wrote: I'll give it a shot. Is this register corruption the only reason not to use ulv.q/usv.q? From what I read in the diggins they aren't significantly slower than the aligned version; how do they compare to lv.s/sv.s * 4?
The issue is only with ulv.q, not with usv.q. You have the choice between lv.q (which needs 16-byte alignment) and lv.s (if you work in 2D, for instance). As far as I remember, I added an option to psp-gcc to set the stack alignment to whatever you want. It is the same option you'll find for x86. I don't remember the name :/ as I did it a long time ago.
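
As a rough illustration of the alignment requirement, the sketch below forces a 4-float vector onto a 16-byte boundary with GCC's aligned attribute and checks a pointer before it would be handed to an aligned quad load. Vec4A, is_aligned16 and first_lane are hypothetical names, not pspsdk ones, and whether a stack local really ends up aligned on a given psp-gcc build depends on the stack-alignment support mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 4-float vector forced onto a 16-byte boundary (GCC attribute). */
typedef struct { float x, y, z, w; } __attribute__((aligned(16))) Vec4A;

/* Nonzero when p sits on a 16-byte boundary, as an aligned quad load expects. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Stand-in for a routine that would issue lv.q: check before loading. */
static float first_lane(const Vec4A *v)
{
    assert(is_aligned16(v));
    return v->x;
}
```

With an assert like this in debug builds, a misaligned pointer is caught at the call site instead of turning into a bad load.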
Criptych
Posts: 64
Joined: Sat Sep 12, 2009 5:18 am

Post by Criptych »

hlide wrote: The issue is only with ulv.q, not with usv.q. You have the choice between lv.q (which needs 16-byte alignment) and lv.s (if you work in 2D, for instance).
Okay, I've changed my code to use only lv.q, and it seems to be working.
hlide wrote: As far as I remember, I added an option to psp-gcc to set the stack alignment to whatever you want. It is the same option you'll find for x86. I don't remember the name :/ as I did it a long time ago.
Could it be "-mpreferred-stack-boundary"?

@crazyc: Adding clobber constraints gave me several errors like "error: unknown register name '$fp2' in 'asm.'" But since I'm not using ulv.q anymore, I wouldn't worry about it.
Last edited by Criptych on Sun Nov 01, 2009 6:04 pm, edited 1 time in total.
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

Criptych wrote:
hlide wrote: The issue is only with ulv.q, not with usv.q. You have the choice between lv.q (which needs 16-byte alignment) and lv.s (if you work in 2D, for instance).
Okay, I've changed my code to use only lv.q, and it seems to be working.
hlide wrote: As far as I remember, I added an option to psp-gcc to set the stack alignment to whatever you want. It is the same option you'll find for x86. I don't remember the name :/ as I did it a long time ago.
Could it be "-mpreferred-stack-boundary"?
Exactly. As far as I remember, I made a patch to add it for MIPS/Allegrex, as it was specific to x86. I think Heimdall uses it for his mingw version of psp-gcc. Not sure about the pspdev.org SVN version.
Last edited by hlide on Wed Oct 21, 2009 7:10 am, edited 1 time in total.
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

Criptych wrote: @crazyc: Adding clobber constraints gave me several errors like "error: unknown register name '$fp2' in 'asm.'" But since I'm not using ulv.q anymore, I wouldn't worry about it.
If I'm not wrong, it should be '$fpr2' or '$f2'. And it only works for save-callee registers, which are $f20-$f31, so registers $f0-$f19 cannot be clobbered this way.
Criptych
Posts: 64
Joined: Sat Sep 12, 2009 5:18 am

Post by Criptych »

hlide wrote: If I'm not wrong, it should be '$fpr2' or '$f2'. And it only works for save-callee registers, which are $f20-$f31, so registers $f0-$f19 cannot be clobbered this way.
Okay, it accepts "$f2." But does that last part mean you can't use clobber constraints or that their values won't get clobbered? If the former, it sounds like crazyc's idea won't work. :(
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

Criptych wrote: Okay, it accepts "$f2." But does that last part mean you can't use clobber constraints or that their values won't get clobbered? If the former, it sounds like crazyc's idea won't work. :(
I tried to use this feature in my old yapse4psp (PSX emulator) and found out it only works on save-callee registers. The purpose was to prevent some C functions called by dynarec code from using the $t0-$t9 registers, as I wanted them for virtual registers mapped to PSX registers.
Criptych
Posts: 64
Joined: Sat Sep 12, 2009 5:18 am

Post by Criptych »

hlide wrote: I tried to use this feature in my old yapse4psp (PSX emulator) and found out it only works on save-callee registers. The purpose was to prevent some C functions called by dynarec code from using the $t0-$t9 registers, as I wanted them for virtual registers mapped to PSX registers.
I'm not sure I understand what you mean by a "save-callee" register. Are you saying the values in other registers get clobbered anyway?
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

Criptych wrote: I'm not sure I understand what you mean by a "save-callee" register. Are you saying the values in other registers get clobbered anyway?
OK. A caller is the code which calls a function; the callee is the code of the called function. Locals in a function are allocated in save-callee registers (more often written "callee-saved") if possible, otherwise on the stack. That means your function has a prologue to save those registers and an epilogue to restore them. In the body you can change the values of these registers freely because they are LOCAL.

For MIPS, the save-callee registers are $s0-$s8 (there are some special registers too, but let us keep it simple). The other registers, unless they are fixed registers (that is, not used by gcc), are considered save-caller registers: you NEED to save them BEFORE any call to a function if you want to restore them AFTER the call. The fact is I never saw gcc do that automatically; that is, clobbering them doesn't enforce a save/restore of those registers BEFORE/AFTER a call. We could at least expect gcc not to use those clobbered registers, but in practice it does use them even after they are clobbered. Well, my function was calling some inline functions, so maybe those inline functions defeated the purpose of the clobbering.

The same goes for the FPRs: some are save-callee registers, others are save-caller registers.

So, since clobber constraints seem to be effective only for save-callee registers, I could not use them.
Criptych
Posts: 64
Joined: Sat Sep 12, 2009 5:18 am

Post by Criptych »

Okay, I understand the concept but I'd never heard that term used before. Thanks. :)
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

Criptych wrote: Okay, I understand the concept but I'd never heard that term used before. Thanks. :)
This is a technical term. Just google it and you'll find a ton.
davidgf
Posts: 21
Joined: Mon Aug 31, 2009 10:05 pm

Post by davidgf »

Sorry for reopening this thread, but I can't find anything on Google about this.
Where exactly did you find out about the register corruption? I'm interested in this.
If I don't use matrix 0 there shouldn't be any problem, should there?

After some tests on a FAT PSP I noticed some bugs that look like register corruption, and I use ulv a lot.

Thanks!