VFPU math lib

MrMr[iCE] · Post by **MrMr[iCE]** » Wed Jan 24, 2007 9:35 pm

I've been working on a general purpose math lib to replace most of libm's implementations of major math functions. I'm almost done with it but here's a taste of what to expect, in cycle times:

Code:

let v = 0.4f

- sinf(v) = 0.389418, cycles: 856
- vfpu_sinf(v) = 0.389418, cycles: 160

- cosf(v) = 0.921061, cycles: 990
- vfpu_cosf(v) = 0.921061, cycles: 154

- tanf(v) = 0.422793, cycles: 1632
- vfpu_tanf(v) = 0.422793, cycles: 121

- asinf(v) = 0.411517, cycles: 1265
- vfpu_asinf(v) = 0.411517, cycles: 154

- acosf(v) = 1.159279, cycles: 1433
- vfpu_acosf(v) = 1.159280, cycles: 107

- atanf(v) = 0.380506, cycles: 692
- vfpu_atanf(v) = 0.380506, cycles: 126

- sinhf(v) = 0.410752, cycles: 1898
- vfpu_sinhf(v) = 0.410752, cycles: 356

- coshf(v) = 1.081072, cycles: 1885
- vfpu_coshf(v) = 1.081072, cycles: 246

- tanhf(v) = 0.379949, cycles: 1525
- vfpu_tanhf(v) = 0.379949, cycles: 208

- expf(v) = 1.491825, cycles: 1351
- vfpu_expf(v) = 1.491824, cycles: 212

- logf(v) = -0.916291, cycles: 1409
- vfpu_logf(v) = -0.916292, cycles: 210

- fabsf(v) = 0.400000, cycles: 7
- vfpu_fabsf(v) = 0.400000, cycles: 93

- sqrtf(v) = 0.632456, cycles: 40
- vfpu_sqrtf(v) = 0.632455, cycles: 240

- powf(v, v) = 0.693145, cycles: 3488
- vfpu_powf(v, v) = 0.693145, cycles: 412

The only clear losers in my lib are fabsf and sqrtf, since the floating-point ops on the Allegrex already handle those. I'll probably axe them and let the Allegrex deal with those, since it seems to do well in that area.

My lib will have other functions not in libm, though: some related to vector operations (like vfpu_normalize_vector), some matrix ops (like vfpu_perspective_matrix; libpspgum_vfpu didn't have a proper asm version at the time), and some quaternion math as well. Stay tuned as I make progress.

As I said, this is a work in progress, so don't expect much help from me using it =) Here's the link; you should know what to do with the files: http://mrmrice.fx-world.org/libpspmath.zip

Tinnus · Post by **Tinnus** » Thu Jan 25, 2007 12:30 am

I think it would be useful to add functions like sinfd, sinft and sinfq (for all functions) to speed up calculations when the user wants to (and can) calculate 2, 3 or 4 values at the same time. That should be faster than calling the function multiple times :)

I know the main purpose is to just replace libm, but why not enable a little (more) optimization as well? :)

hlide · Post by **hlide** » Thu Jan 25, 2007 1:53 am

Well, if you were on #pspdev, you would probably know that enabling more optimization requires a gcc that is "aware" of vfpu registers (single or vector ones). Without that, sinfd/t/q variants would not really be efficient and would be very cumbersome to use.

MrMr[iCE] · Post by **MrMr[iCE]** » Fri Jan 26, 2007 4:54 am

v2 is up for grabs

http://mrmrice.fx-world.org/files/libpspmath_v2.zip

This one adds quite a few quaternion functions, check pspmath.h for the full list. atan2f is available as well.

to install:

Code:

make
make install

make install will copy the lib and header files into the proper SDK folders, so just add #include <pspmath.h> to your source and -lpspmath to the LIBS line in your makefile.

cools · Post by **cools** » Fri Jan 26, 2007 1:02 pm

This is pretty cool! It can make using the vfpu a lot easier for people who don't know how to use it...

MrMr[iCE] · Post by **MrMr[iCE]** » Fri Jan 26, 2007 5:12 pm

v3 is up now

http://mrmrice.fx-world.org/files/libpspmath_v3.zip

This adds one new function: vfpu_ease_in_out. It is mainly for animation control, when you want to smoothly interpolate between 2 points with an acceleration and deceleration curve.
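To picture the kind of curve meant here, a plain C stand-in using the classic smoothstep polynomial. This is not the library's formula, and the prototype of vfpu_ease_in_out is not shown in this thread, so check pspmath.h for the real thing; ease_in_out_ref and animate are hypothetical names.

```c
/* Reference ease-in/ease-out curve (smoothstep), NOT the lib's implementation.
   t is clamped to [0, 1]; the output starts slowly, speeds up, then slows. */
static float ease_in_out_ref(float t)
{
    if (t < 0.0f) t = 0.0f;
    if (t > 1.0f) t = 1.0f;
    return t * t * (3.0f - 2.0f * t);
}

/* Interpolate between two positions using the eased parameter. */
static float animate(float from, float to, float t)
{
    return from + (to - from) * ease_in_out_ref(t);
}
```

Feeding a linear 0..1 time value through animate() gives motion that accelerates out of the start point and decelerates into the end point.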

I've included a sample in the zip that shows how to use the quaternion functions, including how to interpolate between 2 quaternions, how to make a random 3D starfield, and some other features, like the vfpu random number generator.

enjoy =)

Bytrix · Post by **Bytrix** » Mon Jan 29, 2007 8:01 pm

Great work MrMr[iCE]. It'll be interesting to see how these functions speed up my games. Especially in collision detection.

pegasus2000 · Post by **pegasus2000** » Fri Feb 02, 2007 4:23 am

MrMr[iCE] wrote:v3 is up now

http://mrmrice.fx-world.org/files/libpspmath_v3.zip

This adds one new function: vfpu_ease_in_out. It is mainly for animation control, when you want to smoothly interpolate between 2 points with an acceleration and deceleration curve.

I've included a sample in the zip that shows how to use the quaternion functions, including how to interpolate between 2 quaternions, how to make a random 3D starfield, and some other features, like the vfpu random number generator.

enjoy =)

What is the license? BSD?

Xfacter · Post by **Xfacter** » Wed Feb 28, 2007 10:24 am

Great lib, but there's a problem with vfpu_atan2f: an x and y of zero will cause a crash (divide by zero). Most libraries I've seen return 0 in this case, so it's an easy fix.

hlide · Post by **hlide** » Wed Feb 28, 2007 4:13 pm

Xfacter wrote:Great lib, but there's a problem with vfpu_atan2f: an x and y of zero will cause a crash (divide by zero). Most libraries I've seen return 0 in this case, so it's an easy fix.

Add something like "vcmp.s NS, S000; vcmovt.s S000, S000[0], 0" at the end of the function, and any situation where a NaN would occur would turn the result into 0.0. You may need to change S000 to whichever vfpu register holds the result in this function.

KickinAezz · Post by **KickinAezz** » Sun Jun 10, 2007 4:14 am

http://mrmrice.fx-world.org/files/libpspmath_v4.zip -<<< Is also available

Version 4!??

hlide · Post by **hlide** » Thu Jun 14, 2007 8:41 pm

KickinAezz wrote:http://mrmrice.fx-world.org/files/libpspmath_v4.zip -<<< Is also available

Version 4!??

I have the libpspmath v4 given by MrMrIce the last time I "saw" him. So yes, the last version is 4.

Art · Post by **Art** » Mon Jun 18, 2007 1:26 pm

This is a noob question, but if you define floats in your app, and perform
math operations on them, does that automatically mean that the FPU is used
to calculate them in the PSP?

Raphael · Post by **Raphael** » Mon Jun 18, 2007 4:59 pm

Yes. Else it would have to be done in software emulation (as IS done with doubles) and this would be hella slow.
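One practical consequence, sketched in C: an unsuffixed literal such as 0.5 is a double, so mixing it into a float expression promotes the whole operation to double (software emulation on the PSP, per the above). The macro and function names here are made up for illustration.

```c
/* 0.5 is a double literal: x is promoted and the multiply is done in double. */
#define HALF_SLOW(x) ((x) * 0.5)

/* 0.5f is a float literal: the multiply stays in float. */
#define HALF_FAST(x) ((x) * 0.5f)

/* Keeps the whole computation in single precision. */
static float halve(float x)
{
    return HALF_FAST(x);
}
```

The same applies to libm calls: sinf(x) stays in float, while sin(x) converts to double and back.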

Art · Post by **Art** » Mon Jun 18, 2007 7:11 pm

Thanks, I suspected so.

Xfacter · Post by **Xfacter** » Mon Jul 02, 2007 8:02 am

I don't know if you meant to do this or what, but vfpu_logf is actually natural log... Here's log:

Code:

float vfpu_log(float x) {
    float result;
    __asm__ volatile (
        "mtv     %1, S000\n"
        "vcst.s  S001, VFPU_LOG2TEN\n"
        "vrcp.s  S001, S001\n"
        "vlog2.s S000, S000\n"
        "vmul.s  S000, S000, S001\n"
        "mfv     %0, S000\n"
        : "=r"(result) : "r"(x));
    return result;
}

Edit: Oops, don't mind me. Guess I don't know much about libm's naming scheme (generally when I see "log" I assume log10). Sorry!

hlide · Post by **hlide** » Thu Aug 30, 2007 2:16 pm

DELETED

hlide · Post by **hlide** » Thu Aug 30, 2007 2:17 pm

Xfacter wrote: Edit: Oops, don't mind me. Guess I don't know much about libm's naming scheme (generally when I see "log" I assume log10). Sorry!

log(x) = log10(x) = log2(x)/log2(10);

this is what the code is doing, so there's no problem here.

Xfacter · Post by **Xfacter** » Thu Aug 30, 2007 2:37 pm

hlide wrote:
Xfacter wrote: Edit: Oops, don't mind me. Guess I don't know much about libm's naming scheme (generally when I see "log" I assume log10). Sorry!
log(x) = log10(x) = log2(x)/log2(10);

this is what the code is doing, so there's no problem here.

That's what the revised code I posted does; the one in the library is natural log.

headness13 · Post by **headness13** » Sun Oct 21, 2007 6:25 pm

very nice library, but there are a few functions missing, like:

rotate_matrixX(M44, angle)
rotate_matrixY(M44, angle)
rotate_matrixZ(M44, angle)
rotate_matrixXYZ(M44, anglex, angley, anglez)
multiply_matrix(M44, M44, M44)
vector_cross_multiply (for FVector3 and FVector4)

Could you help me with these? I can write them in C, but asm is so much faster, and speed is an issue for what I'm doing.
Thanks if you decide to help me, MrMr[iCE].
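Until asm versions turn up, a plain C reference is handy to test any later vfpu version against. The sketch below covers the cross product and the 4x4 multiply from the list. Vec3 and Mat4 are hypothetical stand-ins for the SDK types, the row/column layout is an assumption, and the function names are invented.

```c
/* Hypothetical stand-ins for the SDK's vector and matrix types. */
typedef struct { float x, y, z; } Vec3;
typedef struct { float m[4][4]; } Mat4;   /* indexed as m[row][col] */

/* out = a x b (cross product). */
static void vec3_cross(Vec3 *out, const Vec3 *a, const Vec3 *b)
{
    Vec3 r;
    r.x = a->y * b->z - a->z * b->y;
    r.y = a->z * b->x - a->x * b->z;
    r.z = a->x * b->y - a->y * b->x;
    *out = r;   /* the temporary lets out alias a or b */
}

/* out = a * b for 4x4 matrices. */
static void mat4_multiply(Mat4 *out, const Mat4 *a, const Mat4 *b)
{
    Mat4 r;
    int i, j, k;
    for (i = 0; i < 4; i++) {
        for (j = 0; j < 4; j++) {
            float sum = 0.0f;
            for (k = 0; k < 4; k++)
                sum += a->m[i][k] * b->m[k][j];
            r.m[i][j] = sum;
        }
    }
    *out = r;   /* the temporary lets out alias a or b */
}
```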

snowsquirrel · Post by **snowsquirrel** » Sun Apr 06, 2008 12:27 am

hello,

I am interested in the vector processing functions in the lib. But I notice that they take v4's and not v3's. To use them with v3's, do I have to copy x, y, z between my v3 and a v4? Or do most people just use v4's for everything, ignoring the 'w' when only xyz are needed?

~S

hlide · Post by **hlide** » Sun Apr 06, 2008 1:34 am

well, as an example :

Code:

void vfpu_add_vector(ScePspFVector4 *vout, ScePspFVector4 *va, ScePspFVector4 *vb) {
   __asm__ volatile (
       "lv.q    C000, %1\n"
       "lv.q    C010, %2\n"
       "vadd.t  C020, C000, C010\n"
       "sv.q    C020, %0\n"
       : "+m"(*vout): "m"(*va), "m"(*vb));
}

vadd.t adds two 3D vectors, not 4D vectors.

There is a reason why we use ScePspFVector4 instead of ScePspFVector3: "lv.q ...". The vfpu can read a 4-component vector in one instruction instead of 4 single "lv.s".

But it cannot load or store a 2D or 3D vector with a single instruction.

But if you really want a ScePspFVector3 that badly, you need to convert the function into:

Code:

void vfpu_add_vector3(ScePspFVector3 *vout, ScePspFVector3 *va, ScePspFVector3 *vb) {
   __asm__ volatile (
       // operands: %0-%2 = vout x/y/z, %3-%5 = va x/y/z, %6-%8 = vb x/y/z
       //"lv.q    C000, %1\n"
       "lv.s    S000, %3\n"
       "lv.s    S001, %4\n"
       "lv.s    S002, %5\n"

       //"lv.q    C010, %2\n"
       "lv.s    S010, %6\n"
       "lv.s    S011, %7\n"
       "lv.s    S012, %8\n"

       "vadd.t  C020, C000, C010\n"
       //"sv.q    C020, %0\n"
       "sv.s    S020, %0\n"
       "sv.s    S021, %1\n"
       "sv.s    S022, %2\n"
       : "+m"(vout->x), "+m"(vout->y), "+m"(vout->z)
       : "m"(va->x), "m"(va->y), "m"(va->z),
         "m"(vb->x), "m"(vb->y), "m"(vb->z));
}

snowsquirrel · Post by **snowsquirrel** » Sun Apr 06, 2008 1:50 am

Yes, I have looked at the code, but it really means nothing to me.

I suspected that the instructions were optimized for v4's, not v3's. I think that is pretty common.

But I am not 100% sure what you are saying.

Code:

"lv.q    C000, %1\n"
"lv.q    C010, %2\n"

loading va and vb into registers

Code:

"vadd.t  C020, C000, C010\n"

add those two registers. You said vadd.t adds 3D vectors? So what is happening to 'w' here?

Code:

"sv.q    C020, %0\n"

save result to vout.

Code:

: "+m"(*vout): "m"(*va), "m"(*vb)

what is this doing?

I assume that if I used this exact code with v3's, memory stomping would occur where it expected the last 4 bytes of the v4.

Sorry to be daft, I have just never worked with assembler much, other than some trivial stuff years ago.

~S

hlide · Post by **hlide** » Sun Apr 06, 2008 2:16 am

snowsquirrel wrote:Yes, I have looked at the code, but it really means nothing to me.

I suspected that the instructions were optimized for v4's, not v3's. I think that is pretty common.

But I am not 100% sure what you are saying.
Code:
"lv.q    C000, %1\n"
"lv.q    C010, %2\n"
loading va and vb into registers
Code:
"vadd.t  C020, C000, C010\n"
add those two registers. You said vadd.t adds 3D vectors? So what is happening to 'w' here?
Code:
"sv.q    C020, %0\n"
save result to vout.
Code:
: "+m"(*vout): "m"(*va), "m"(*vb)
what is this doing?

I assume that if I used this exact code with v3's, memory stomping would occur where it expected the last 4 bytes of the v4.

Sorry to be daft, I have just never worked with assembler much, other than some trivial stuff years ago.

~S

vfpu instructions use suffixes like .s (single/scalar), .p (pair, 2D), .t (triple, 3D), .q (quad, 4D)

with .q, x, y, z and w are touched
with .t, only x, y and z are touched, w is left untouched
with .p, only x and y are touched, z and w are left untouched
with .s, you can access any element in a matrix

so any instructions with .t suffix would ignore your w component.
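A toy C model of that rule may help picture it. It only models which lanes get written, based on the description above; it says nothing about how the hardware works, and Quad, model_vadd_t and model_vadd_q are hypothetical names.

```c
/* Toy model of a 4-lane vfpu column register: x, y, z, w. */
typedef struct { float lane[4]; } Quad;

/* Model of "vadd.t dst, a, b": only x, y, z are written; w keeps its old value. */
static void model_vadd_t(Quad *dst, const Quad *a, const Quad *b)
{
    int i;
    for (i = 0; i < 3; i++)
        dst->lane[i] = a->lane[i] + b->lane[i];
}

/* Model of "vadd.q dst, a, b": all four lanes are written. */
static void model_vadd_q(Quad *dst, const Quad *a, const Quad *b)
{
    int i;
    for (i = 0; i < 4; i++)
        dst->lane[i] = a->lane[i] + b->lane[i];
}
```

If that model holds, a vadd.t followed by sv.q stores whatever w happened to be in the destination register beforehand, so callers of a routine built that way should not rely on the w it writes out.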

load and store operations :

lv = load vector
sv = store vector

lv.s/sv.s = load/store a scalar element in matrix (Smcr = element in matrix m, at column c, at row r).

lv.q/sv.q = load/store a 4D vector (Cmcr = column c vector in matrix m, starting from element at row r; Rmcr = row r vector in matrix m, starting from element at column c).

there is no lv.p/sv.p/lv.t/sv.t, so you need to split into lv.s/sv.s

inline_asm : 'asm' '(' asm_sequence_string [ ':' [ output_registers ] [ ':' [ input_registers ] [ ':' clobbered_registers ] ] ] ')' ';'

output_registers : output_register [ ',' output_registers ] ;

input_registers : input_register [ ',' input_registers ] ;

output_register :

"=m"(var) --> this is a memory place just for output

"+m"(var) --> this is a memory place for input and output

input_register :

"m"(var) --> this is a memory place just for input

%0, %1, ... : if you have, in this order, "=m"(*vout) : "m"(*va), "m"(*vb), gcc substitutes %0 with a machine address to access vout, %1 to access va and %2 to access vb. See them as macro parameters.
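The operand numbering can be seen in isolation with an asm statement whose template is deliberately empty, so that it compiles with GCC on any target instead of only on the PSP. This is only a syntax sketch: it emits no instructions, Vec4 is a hypothetical stand-in for ScePspFVector4, and show_operands is an invented name.

```c
typedef struct { float x, y, z, w; } Vec4;   /* hypothetical stand-in type */

static void show_operands(Vec4 *vout, const Vec4 *va, const Vec4 *vb)
{
    /* Operands are numbered outputs first, then inputs:
         %0 -> *vout  ("+m": memory that is both read and written)
         %1 -> *va    ("m":  memory that is only read)
         %2 -> *vb    ("m":  memory that is only read)
       The template string is empty here; on the PSP it would hold the
       lv.q / vadd.t / sv.q lines that refer to %0, %1 and %2. */
    __asm__ volatile (
        ""
        : "+m"(*vout)
        : "m"(*va), "m"(*vb));
}
```

The constraint lists are not executed at the end of the asm block. They are declarations that tell the compiler what each %n stands for and what the block reads and writes, and they apply to the whole statement.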

snowsquirrel · Post by **snowsquirrel** » Sun Apr 06, 2008 3:14 am

Ok I understand everything but this line:

Code:

: "+m"(*vout): "m"(*va), "m"(*vb)

So this line is mapping the value in registers to value in memory? If so, shouldn't this mapping be done at the beginning? Or does it take effect in the beginning?

So if my C code uses v3's, would I be better to manually add the 3 elements, or create temporary v4's with v3 values, and then use vfpu_vector_add function? Or use the 3x lv.s method?

Thanks,
~S

hlide · Post by **hlide** » Sun Apr 06, 2008 3:55 am

Have a look at http://www.ibiblio.org/gferg/ldp/GCC-In ... HOWTO.html .

hlide · Post by **hlide** » Sun Apr 06, 2008 6:06 am

snowsquirrel wrote:So if my C code uses v3's, would I be better to manually add the 3 elements, or create temporary v4's with v3 values, and then use vfpu_vector_add function? Or use the 3x lv.s method?
~S

3xlv.s method is probably best for this case, so long as your vector only has 3 components.
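For comparison, the two C-level options from the question look roughly like this. Vec3 and Vec4 are hypothetical stand-ins for ScePspFVector3 and ScePspFVector4, vec4_add stands in for a v4 routine such as the vfpu_add_vector shown earlier, and the 16-byte alignment reflects the assumption that lv.q and sv.q need quad-aligned addresses.

```c
typedef struct { float x, y, z; } Vec3;                               /* 12 bytes */
typedef struct { float x, y, z, w; } __attribute__((aligned(16))) Vec4;

/* Stand-in for a v4 routine; plain C so the sketch runs anywhere. */
static void vec4_add(Vec4 *out, const Vec4 *a, const Vec4 *b)
{
    out->x = a->x + b->x;
    out->y = a->y + b->y;
    out->z = a->z + b->z;
    out->w = a->w + b->w;
}

/* Option A: pad the v3's into temporary v4's, call the v4 routine, copy back. */
static void vec3_add_padded(Vec3 *out, const Vec3 *a, const Vec3 *b)
{
    Vec4 ta = { a->x, a->y, a->z, 0.0f };
    Vec4 tb = { b->x, b->y, b->z, 0.0f };
    Vec4 tr;
    vec4_add(&tr, &ta, &tb);
    out->x = tr.x;
    out->y = tr.y;
    out->z = tr.z;
}

/* Option B: touch only the three components, like the 3x lv.s version. */
static void vec3_add_direct(Vec3 *out, const Vec3 *a, const Vec3 *b)
{
    out->x = a->x + b->x;
    out->y = a->y + b->y;
    out->z = a->z + b->z;
}
```

Option A spends six float copies going in and three coming out around a single add, which is why padding on the fly rarely pays off; storing the data as v4's from the start avoids the copies entirely.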

snowsquirrel · Post by **snowsquirrel** » Sun Apr 06, 2008 1:02 pm

hlide wrote:have a look upon http://www.ibiblio.org/gferg/ldp/GCC-In ... HOWTO.html .

Extended Asm. I get it now. Thanks.
~S

Heimdall · Post by **Heimdall** » Thu Apr 10, 2008 4:35 pm

Hi guys,

From v3 to v4 I see that there was a change in the API, and the demo from v3 doesn't build. Does anyone know what changed from v3 to v4? (So I can be lazy and not diff the code ;))

Heimdall · Post by **Heimdall** » Thu Apr 10, 2008 5:44 pm

nevermind it was straightforward!

forums.ps2dev.org
