[VFPU] lvl.q/lvr.q bug : they screw up a FPU reg !

hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

I have two PSPs: an old fat PSP and a slim PSP.

I ran a test program through PSPLINK to investigate a curious bug with lvl.q/lvr.q.

What is the bug?

Executing lvl.q or lvr.q on a vector register clobbers an FPU register: if the VFPU register number is N, FPU register $fN is overwritten with 0.0 (at least, that is the value I get on my fat PSP). This is a major issue for callee-saved FPU registers, which must hold the same value when a function returns as when it was entered, especially when the callee never uses those callee-saved FPU registers itself.

On my slim PSP I don't see this hardware issue, so I guess Sony corrected it.

Here is a piece of my program to test this bug:

Code:

...

float vec4[4] __attribute__((aligned(16))) = { 1.0, 2.0, 3.0, 4.0 };
float expected_value = -1.0;

template< int m, int c >
float vfpu_test_lvlq(float *vec)
{
    float res = expected_value;

    asm volatile
    (
        ".set    push             " "\n"
        ".set    noreorder        " "\n"
        "mov.s   $f%2, %0         " "\n"
        "lvl.q   $%2,12(%1)       " "\n" // lvl.q Cmc0.q, 12(%1)
        "mov.s   %0, $f%2         " "\n"
        ".set    pop              " "\n"
        : "+f"(res) : "r"(vec), "i"(m*4+c) : "memory"
    );

    return res;
}

template< int m, int c >
float vfpu_test_lvrq(float *vec)
{
    float res = expected_value;

    asm volatile
    (
        ".set    push             " "\n"
        ".set    noreorder        " "\n"
        "mov.s   $f%2, %0         " "\n"
        "lvr.q   $%2, 0(%1)       " "\n" // lvr.q Cmc0.q, 0(%1)
        "mov.s   %0, $f%2         " "\n"
        ".set    pop              " "\n"
        : "+f"(res) : "r"(vec), "i"(m*4+c) : "memory"
    );

    return res;
}

int main(int argc, char *argv[])
{

    printf("\n$f0 = %f, %f", vfpu_test_lvlq<0, 0>(vec4), vfpu_test_lvrq<0, 0>(vec4));
    printf("\n$f1 = %f, %f", vfpu_test_lvlq<0, 1>(vec4), vfpu_test_lvrq<0, 1>(vec4));
    printf("\n$f2 = %f, %f", vfpu_test_lvlq<0, 2>(vec4), vfpu_test_lvrq<0, 2>(vec4));
    printf("\n$f3 = %f, %f", vfpu_test_lvlq<0, 3>(vec4), vfpu_test_lvrq<0, 3>(vec4));
    printf("\n$f4 = %f, %f", vfpu_test_lvlq<1, 0>(vec4), vfpu_test_lvrq<1, 0>(vec4));
    printf("\n$f5 = %f, %f", vfpu_test_lvlq<1, 1>(vec4), vfpu_test_lvrq<1, 1>(vec4));
    printf("\n$f6 = %f, %f", vfpu_test_lvlq<1, 2>(vec4), vfpu_test_lvrq<1, 2>(vec4));
    printf("\n$f7 = %f, %f", vfpu_test_lvlq<1, 3>(vec4), vfpu_test_lvrq<1, 3>(vec4));
    printf("\n$f8 = %f, %f", vfpu_test_lvlq<2, 0>(vec4), vfpu_test_lvrq<2, 0>(vec4));
    printf("\n$f9 = %f, %f", vfpu_test_lvlq<2, 1>(vec4), vfpu_test_lvrq<2, 1>(vec4));
    printf("\n$f10= %f, %f", vfpu_test_lvlq<2, 2>(vec4), vfpu_test_lvrq<2, 2>(vec4));
    printf("\n$f11= %f, %f", vfpu_test_lvlq<2, 3>(vec4), vfpu_test_lvrq<2, 3>(vec4));
    printf("\n$f12= %f, %f", vfpu_test_lvlq<3, 0>(vec4), vfpu_test_lvrq<3, 0>(vec4));
    printf("\n$f13= %f, %f", vfpu_test_lvlq<3, 1>(vec4), vfpu_test_lvrq<3, 1>(vec4));
    printf("\n$f14= %f, %f", vfpu_test_lvlq<3, 2>(vec4), vfpu_test_lvrq<3, 2>(vec4));
    printf("\n$f15= %f, %f", vfpu_test_lvlq<3, 3>(vec4), vfpu_test_lvrq<3, 3>(vec4));
    printf("\n$f16= %f, %f", vfpu_test_lvlq<4, 0>(vec4), vfpu_test_lvrq<4, 0>(vec4));
    printf("\n$f17= %f, %f", vfpu_test_lvlq<4, 1>(vec4), vfpu_test_lvrq<4, 1>(vec4));
    printf("\n$f18= %f, %f", vfpu_test_lvlq<4, 2>(vec4), vfpu_test_lvrq<4, 2>(vec4));
    printf("\n$f19= %f, %f", vfpu_test_lvlq<4, 3>(vec4), vfpu_test_lvrq<4, 3>(vec4));
    printf("\n$f20= %f, %f", vfpu_test_lvlq<5, 0>(vec4), vfpu_test_lvrq<5, 0>(vec4));
    printf("\n$f21= %f, %f", vfpu_test_lvlq<5, 1>(vec4), vfpu_test_lvrq<5, 1>(vec4));
    printf("\n$f22= %f, %f", vfpu_test_lvlq<5, 2>(vec4), vfpu_test_lvrq<5, 2>(vec4));
    printf("\n$f23= %f, %f", vfpu_test_lvlq<5, 3>(vec4), vfpu_test_lvrq<5, 3>(vec4));
    printf("\n$f24= %f, %f", vfpu_test_lvlq<6, 0>(vec4), vfpu_test_lvrq<6, 0>(vec4));
    printf("\n$f25= %f, %f", vfpu_test_lvlq<6, 1>(vec4), vfpu_test_lvrq<6, 1>(vec4));
    printf("\n$f26= %f, %f", vfpu_test_lvlq<6, 2>(vec4), vfpu_test_lvrq<6, 2>(vec4));
    printf("\n$f27= %f, %f", vfpu_test_lvlq<6, 3>(vec4), vfpu_test_lvrq<6, 3>(vec4));
    printf("\n$f28= %f, %f", vfpu_test_lvlq<7, 0>(vec4), vfpu_test_lvrq<7, 0>(vec4));
    printf("\n$f29= %f, %f", vfpu_test_lvlq<7, 1>(vec4), vfpu_test_lvrq<7, 1>(vec4));
    printf("\n$f30= %f, %f", vfpu_test_lvlq<7, 2>(vec4), vfpu_test_lvrq<7, 2>(vec4));
    printf("\n$f31= %f, %f", vfpu_test_lvlq<7, 3>(vec4), vfpu_test_lvrq<7, 3>(vec4));
    printf("\n");

    sceKernelExitGame();

    return 0;
}

...
Normally we should get -1.0 for every FPU register (as is the case on the slim PSP), but on the fat PSP I get 0.0 (no idea where this value comes from).
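
For reading the output, a small helper along these lines could list which registers lost the -1.0 sentinel. It is only a sketch: `fpu_index` and `clobbered_registers` are hypothetical names that are not part of the test program, and `readings` is assumed to hold the 32 values from one column of the printout, in register order.

```cpp
#include <vector>

// Same mapping the test uses for its "i"(m*4+c) operand:
// template arguments <m, c> select register number N = m*4 + c.
constexpr int fpu_index(int m, int c) { return m * 4 + c; }

// Hypothetical helper: returns every register number whose reading no
// longer matches the sentinel written before lvl.q/lvr.q executed.
std::vector<int> clobbered_registers(const float (&readings)[32], float sentinel)
{
    std::vector<int> bad;
    for (int n = 0; n < 32; ++n)
        if (readings[n] != sentinel)
            bad.push_back(n);
    return bad;
}
```

On an unaffected unit the returned list would be empty; on an affected one it would contain every N that was tested.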
adrahil
Posts: 274
Joined: Thu Mar 16, 2006 1:55 am

Post by adrahil »

Strange problem indeed...
crazyc
Posts: 408
Joined: Fri Jun 17, 2005 10:13 am

Post by crazyc »

It might have something to do with the way those opcodes are encoded.

Code:

# float vfpu_test_lvlq<0, 0>(float *)
               lwc1    $f0, -0x7FF0($gp)
               mov.s   $f0, $f0
               ldc1    $f0, 0xC($a0)
               mov.s   $f0, $f0
               jr      $ra
               nop
# End of function vfpu_test_lvlq<0,0>(float *)

# float vfpu_test_lvrq<0, 0>(float *)
               lwc1    $f0, -0x7FF0($gp)
               mov.s   $f0, $f0
               ldc1    $f0, 2($a0)
               mov.s   $f0, $f0
               jr      $ra
               nop
# End of function vfpu_test_lvrq<0,0>(float *)

# float vfpu_test_lvq<0, 0>(float *)
               lwc1    $f0, -0x7FF0($gp)
               mov.s   $f0, $f0
               ldc2    $0, 0($a0)
               mov.s   $f0, $f0
               jr      $ra
               nop
# End of function vfpu_test_lvq<0,0>(float *)
ldc1 may be executed along with lvl.q.
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

crazyc wrote: ldc1 may be executed along with lvl.q.
Not only lvl.q but also lvr.q, since they share the same opcode (bit 1 of the offset tells whether it is lvl or lvr, and bit 0 of the offset tells whether it is a column or row register). So yes, that's a perfect explanation! Sony simply forgot to "disable" the FPU register write-back circuitry for ldc1. A very stupid error indeed.
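
To make the shared encoding concrete, here is a rough sketch that pulls apart the two instruction words implied by crazyc's disassembly. The value 0x35 is the standard MIPS LDC1 major opcode. The words 0xD480000C and 0xD4800002 are hand-assembled from `ldc1 $f0, 0xC($a0)` and `ldc1 $f0, 2($a0)`, so they are a reconstruction and not a dump of the actual binary. All helper names are made up for illustration, and the meaning of the two low offset bits is taken from the description above, not from any official documentation.

```cpp
#include <cstdint>

// MIPS I-type load layout: opcode[31:26] base[25:21] ft[20:16] imm[15:0].
constexpr std::uint32_t kOpcodeLdc1 = 0x35; // standard MIPS LDC1 major opcode

constexpr std::uint32_t opcode(std::uint32_t w)   { return w >> 26; }
constexpr std::uint32_t base_reg(std::uint32_t w) { return (w >> 21) & 0x1F; }
constexpr std::uint32_t ft_field(std::uint32_t w) { return (w >> 16) & 0x1F; }
constexpr std::uint32_t imm16(std::uint32_t w)    { return w & 0xFFFF; }

// Per the thread: bit 1 of the offset distinguishes lvl.q (0) from lvr.q (1).
constexpr bool is_lvr(std::uint32_t w) { return (imm16(w) & 2) != 0; }

// Per the thread: bit 0 of the offset selects a column or row register.
// Which value means which is not stated there, so only the raw bit is exposed.
constexpr bool column_row_bit(std::uint32_t w) { return (imm16(w) & 1) != 0; }

// Offset field with the two flag bits masked off (sign extension ignored).
constexpr std::uint32_t byte_offset(std::uint32_t w) { return imm16(w) & 0xFFFC; }

// Hypothetical encoder, only used to build example words.
constexpr std::uint32_t encode(std::uint32_t base, std::uint32_t ft, std::uint32_t imm)
{
    return (kOpcodeLdc1 << 26) | ((base & 0x1F) << 21) | ((ft & 0x1F) << 16) | (imm & 0xFFFF);
}

// lvl.q with register 0 at 12($a0), seen by a plain MIPS disassembler as ldc1 $f0, 0xC($a0).
static_assert(encode(4, 0, 0xC) == 0xD480000Cu, "lvl.q word");
static_assert(opcode(0xD480000Cu) == kOpcodeLdc1, "decodes as ldc1");
static_assert(!is_lvr(0xD480000Cu), "bit 1 clear: lvl.q");

// lvr.q with register 0 at 0($a0), seen as ldc1 $f0, 2($a0).
static_assert(encode(4, 0, 2) == 0xD4800002u, "lvr.q word");
static_assert(is_lvr(0xD4800002u) && byte_offset(0xD4800002u) == 0, "bit 1 set: lvr.q");
```

The point of the sketch is that the 5-bit field holding the register number sits exactly where ldc1 keeps its ft operand, which would fit the observation that register N in the test corresponds to $fN being overwritten.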