VFPU math lib

Discuss the development of new homebrew software, tools and libraries.

Moderators: cheriff, TyRaNiD

MrMr[iCE]
Posts: 43
Joined: Mon Oct 03, 2005 4:55 pm

VFPU math lib

Post by MrMr[iCE] »

I've been working on a general purpose math lib to replace most of libm's implementations of major math functions. I'm almost done with it but here's a taste of what to expect, in cycle times:

Code: Select all

let v  = 0.4f

- sinf(v) = 0.389418, cycles: 856
- vfpu_sinf(v) = 0.389418, cycles: 160
 
- cosf(v) = 0.921061, cycles: 990
- vfpu_cosf(v) = 0.921061, cycles: 154
 
- tanf(v) = 0.422793, cycles: 1632
- vfpu_tanf(v) = 0.422793, cycles: 121
 
- asinf(v) = 0.411517, cycles: 1265
- vfpu_asinf(v) = 0.411517, cycles: 154
 
- acosf(v) = 1.159279, cycles: 1433
- vfpu_acosf(v) = 1.159280, cycles: 107
 
- atanf(v) = 0.380506, cycles: 692
- vfpu_atanf(v) = 0.380506, cycles: 126
 
- sinhf(v) = 0.410752, cycles: 1898
- vfpu_sinhf(v) = 0.410752, cycles: 356
 
- coshf(v) = 1.081072, cycles: 1885
- vfpu_coshf(v) = 1.081072, cycles: 246
 
- tanhf(v) = 0.379949, cycles: 1525
- vfpu_tanhf(v) = 0.379949, cycles: 208
 
- expf(v) = 1.491825, cycles: 1351
- vfpu_expf(v) = 1.491824, cycles: 212
 
- logf(v) = -0.916291, cycles: 1409
- vfpu_logf(v) = -0.916292, cycles: 210
 
- fabsf(v) = 0.400000, cycles: 7
- vfpu_fabsf(v) = 0.400000, cycles: 93
 
- sqrtf(v) = 0.632456, cycles: 40
- vfpu_sqrtf(v) = 0.632455, cycles: 240
 
- powf(v, v) = 0.693145, cycles: 3488
- vfpu_powf(v, v) = 0.693145, cycles: 412
The only clear losers in my lib are fabsf and sqrtf, since the floating point ops on the Allegrex already handle those functions. I'll prolly axe those and let the Allegrex deal with them, since it seems to do well in that area.

My lib will have other functions not in libm though: some related to vector operations (like vfpu_normalize_vector), some matrix ops (like vfpu_perspective_matrix; libpspgum_vfpu didn't have a proper asm version at the time), and some quaternion math as well. Stay tuned as I make progress.

As I said, this is a work in progress, so don't expect much help from me using it =) Here's the link, you should know what to do with the files: http://mrmrice.fx-world.org/libpspmath.zip
Tinnus
Posts: 67
Joined: Sat Jul 29, 2006 1:12 am

Post by Tinnus »

I think it would be useful to add functions like sinfd, sinft and sinfq (for all functions) to speed up calculations when the user wants to, and can, calculate 2, 3 or 4 values at the same time. Should be faster than calling the function multiple times :)

I know the main purpose is to just replace libm, but why not enable a little (more) optimization as well? :)
Let's see what the PSP reserves... well, I'd say anything is better than Palm OS.
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

Well, if you were on #pspdev you would probably know that to enable more optimization you need a gcc that is "aware" of the VFPU registers (single or vector ones). So such sinfd/t/q variants would not really be efficient, and would be very cumbersome to use.
MrMr[iCE]
Posts: 43
Joined: Mon Oct 03, 2005 4:55 pm

Post by MrMr[iCE] »

v2 is up for grabs

http://mrmrice.fx-world.org/files/libpspmath_v2.zip

This one adds quite a few quaternion functions, check pspmath.h for the full list. atan2f is available as well.

to install:

Code: Select all

make
make install
make install will copy the lib and header files into the proper SDK folders, so just #include <pspmath.h> and add -lpspmath to the LIBS line in your makefile.
cools
Posts: 46
Joined: Sat Mar 04, 2006 12:57 pm

Post by cools »

This is pretty cool! It can make using the VFPU a lot easier for people who don't know how to program it...
MrMr[iCE]
Posts: 43
Joined: Mon Oct 03, 2005 4:55 pm

Post by MrMr[iCE] »

v3 is up now

http://mrmrice.fx-world.org/files/libpspmath_v3.zip

This adds one new function: vfpu_ease_in_out. This function is mainly for animation control, when you want to smoothly interpolate between 2 points with an acceleration and deceleration curve.
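
For readers unfamiliar with easing curves, here is a minimal sketch of the idea in plain C. It uses the common smoothstep polynomial, which is an assumption chosen for illustration; the post does not say which curve vfpu_ease_in_out actually evaluates, and the names ease_in_out_sketch and ease_between are hypothetical.

```c
/* Hypothetical easing curve (smoothstep): slow start, fast middle, slow end.
   Not necessarily the curve used by vfpu_ease_in_out. */
static float ease_in_out_sketch(float t)
{
    if (t < 0.0f) t = 0.0f;   /* clamp the parameter to [0, 1] */
    if (t > 1.0f) t = 1.0f;
    return t * t * (3.0f - 2.0f * t);
}

/* Interpolates from a to b, with t running from 0 to 1. */
static float ease_between(float a, float b, float t)
{
    return a + (b - a) * ease_in_out_sketch(t);
}
```

Feeding a linearly increasing t (for example, elapsed time divided by the animation length) through ease_between gives motion that accelerates away from a and decelerates into b.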

I've included a sample in the zip that shows how to use the quaternion functions, including how to interpolate between 2 quaternions, how to make a random 3D starfield, and some other features like the VFPU random number generator.

enjoy =)
Bytrix
Posts: 72
Joined: Wed Sep 14, 2005 7:26 pm
Location: England

Post by Bytrix »

Great work MrMr[iCE]. It'll be interesting to see how these functions speed up my games. Especially in collision detection.
pegasus2000
Posts: 160
Joined: Wed Jul 12, 2006 7:09 am

what is the license ? BSD ?

Post by pegasus2000 »

What is the license? BSD?
Xfacter
Posts: 9
Joined: Wed Feb 28, 2007 10:13 am

Post by Xfacter »

Great lib, but there's a problem with vfpu_atan2f. An x and y of zero will cause a crash (divide by zero). Most libraries I've seen return 0 when this is encountered, easy fix.
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

Xfacter wrote:Great lib, but there's a problem with vfpu_atan2f. An x and y of zero will cause a crash (divide by zero). Most libraries I've seen return 0 when this is encountered, easy fix.
Add something like "vcmp.s NS, S000; vcmovt.s S000, S000[0], 0" at the end of the function, and any situation where a NaN would occur would turn the result into 0.0. You may need to change S000 to whichever VFPU register holds the result in this function.
KickinAezz
Posts: 328
Joined: Sun Jun 03, 2007 10:05 pm

Post by KickinAezz »

http://mrmrice.fx-world.org/files/libpspmath_v4.zip -<<< Is also available

Version 4!??
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

KickinAezz wrote:http://mrmrice.fx-world.org/files/libpspmath_v4.zip -<<< Is also available

Version 4!??
I have the libpspmath v4 given by MrMrIce the last time I "saw" him. So yes, the last version is 4.
Art
Posts: 642
Joined: Wed Nov 09, 2005 8:01 am

Post by Art »

This is a noob question, but if you define floats in your app, and perform
math operations on them, does that automatically mean that the FPU is used
to calculate them in the PSP?
Raphael
Posts: 646
Joined: Tue Jan 17, 2006 4:54 pm
Location: Germany
Contact:

Post by Raphael »

Yes. Else it would have to be done in software emulation (as IS done with doubles) and this would be hella slow.
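
One practical consequence, sketched below in plain C: an unsuffixed literal such as 0.5 has type double, so mixing it with a float promotes the whole expression to double. The function names are hypothetical; the promotion rule itself is standard C.

```c
/* Stays in single precision: both operands are float. */
static float half_float(float x)
{
    return x * 0.5f;
}

/* 0.5 is a double literal, so x is promoted and the multiply is done in
   double before the result is converted back to float. */
static float half_promoted(float x)
{
    return (float)(x * 0.5);
}
```

Both functions return the same value here, but only the first keeps the arithmetic in float, which is the case that avoids double emulation.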
<Don't push the river, it flows.>
http://wordpress.fx-world.org - my devblog
http://wiki.fx-world.org - VFPU documentation wiki

Alexander Berl
Art
Posts: 642
Joined: Wed Nov 09, 2005 8:01 am

Post by Art »

Thanks, I suspected so.
Xfacter
Posts: 9
Joined: Wed Feb 28, 2007 10:13 am

Post by Xfacter »

I don't know if you meant to do this or what, but vfpu_logf is actually natural log... Here's log:

Code: Select all

float vfpu_log(float x) {
    float result;
    __asm__ volatile (
        "mtv     %1, S000\n"
        "vcst.s  S001, VFPU_LOG2TEN\n"
        "vrcp.s  S001, S001\n"
        "vlog2.s S000, S000\n"
        "vmul.s  S000, S000, S001\n"
        "mfv     %0, S000\n"
        : "=r"(result) : "r"(x));
    return result;
}
Edit: Oops, don't mind me. Guess I don't know much about libm's naming scheme (generally when I see "log" I assume log10). Sorry!
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

DELETED
Last edited by hlide on Thu Aug 30, 2007 2:18 pm, edited 1 time in total.
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

Xfacter wrote: Edit: Oops, don't mind me. Guess I don't know much about libm's naming scheme (generally when I see "log" I assume log10). Sorry!
log(x) = log10(x) = log2(x)/log2(10);

this is what the code is doing, so there's no problem here.
Xfacter
Posts: 9
Joined: Wed Feb 28, 2007 10:13 am

Post by Xfacter »

hlide wrote:
Xfacter wrote: Edit: Oops, don't mind me. Guess I don't know much about libm's naming scheme (generally when I see "log" I assume log10). Sorry!
log(x) = log10(x) = log2(x)/log2(10);

this is what the code is doing, so there's no problem here.
That's what the revised code I posted does, the one in the library is natural log.
headness13
Posts: 2
Joined: Sun Oct 21, 2007 6:11 pm

libpspmath

Post by headness13 »

very nice library, but there are a few functions missing, like:

rotate_matrixX(M44, angle)
rotate_matrixY(M44, angle)
rotate_matrixZ(M44, angle)
rotate_matrixXYZ(M44, anglex, angley, anglez)
multiply_matrix(M44, M44, M44)
vector_cross_multiply (for FVector3 and FVector4)

Could you help me with these things? I can write them in C, but asm is so much faster, and speed is an issue for what I'm doing.
Thanks if you decide to help me, MrMr[iCE]
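
As a baseline for the last item on that list, a plain C cross product might look like the sketch below. The Vec3Sketch type and cross3 name are hypothetical stand-ins so the example is self-contained; they are not taken from pspmath.h, and no claim is made about how an asm version would be written.

```c
/* Hypothetical 3-component vector, standing in for the SDK's vector type. */
typedef struct { float x, y, z; } Vec3Sketch;

/* Standard right-handed cross product a x b. */
static Vec3Sketch cross3(Vec3Sketch a, Vec3Sketch b)
{
    Vec3Sketch r;
    r.x = a.y * b.z - a.z * b.y;
    r.y = a.z * b.x - a.x * b.z;
    r.z = a.x * b.y - a.y * b.x;
    return r;
}
```

A reference version like this is also handy for checking the output of any faster replacement.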
snowsquirrel
Posts: 51
Joined: Sun Feb 24, 2008 3:36 am

Post by snowsquirrel »

hello,

I am interested in the vector processing functions in the lib, but I notice that they take v4's and not v3's. To use them with v3's, do I have to copy x, y, z from my v3 into a v4? Or do most people just use v4's for everything, ignoring the 'w' when only xyz are needed?


~S
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

well, as an example :

Code: Select all

void vfpu_add_vector(ScePspFVector4 *vout, ScePspFVector4 *va, ScePspFVector4 *vb) {
   __asm__ volatile (
       "lv.q    C000, %1\n"
       "lv.q    C010, %2\n"
       "vadd.t  C020, C000, C010\n"
       "sv.q    C020, %0\n"
       : "+m"(*vout): "m"(*va), "m"(*vb));
}
vadd.t adds two 3D vectors, not 4D vectors

there is a reason why we use ScePspFVector4 instead of ScePspFVector3: "lv.q ..."
the vfpu can read a 4-component vector in one instruction instead of 4 single "lv.s"

but it cannot load or store a 2D or 3D vector with a single instruction.

but if you really want a ScePspFVector3 that badly, you need to convert the function into:

Code: Select all

void vfpu_add_vector3(ScePspFVector3 *vout, ScePspFVector3 *va, ScePspFVector3 *vb) {
   __asm__ volatile (
       // operands: %0-%2 = vout x/y/z, %3-%5 = va x/y/z, %6-%8 = vb x/y/z
       //"lv.q    C000, %1\n"
       "lv.s    S000, %3\n"
       "lv.s    S001, %4\n"
       "lv.s    S002, %5\n"

       //"lv.q    C010, %2\n"
       "lv.s    S010, %6\n"
       "lv.s    S011, %7\n"
       "lv.s    S012, %8\n"

       "vadd.t  C020, C000, C010\n"
       //"sv.q    C020, %0\n"
       "sv.s    S020, %0\n"
       "sv.s    S021, %1\n"
       "sv.s    S022, %2\n"
       : "+m"(vout->x), "+m"(vout->y), "+m"(vout->z)
       : "m"(va->x), "m"(va->y), "m"(va->z),
         "m"(vb->x), "m"(vb->y), "m"(vb->z));
}
snowsquirrel
Posts: 51
Joined: Sun Feb 24, 2008 3:36 am

Post by snowsquirrel »

yes, I have looked at the code, but it really means nothing to me.

I suspected that the instructions were optimized for v4's not v3's. I think that is pretty common.

But I am not 100% sure what you are saying?

Code: Select all

"lv.q    C000, %1\n"
"lv.q    C010, %2\n"
loading va and vb into registers

Code: Select all

"vadd.t  C020, C000, C010\n"
add those two registers. you said vadd.t adds 3d vectors? so what is happening to 'w' here?

Code: Select all

"sv.q    C020, %0\n"
save result to vout.

Code: Select all

: "+m"(*vout): "m"(*va), "m"(*vb)
what is this doing?

I assume if I used this exact code with v3's, memory stomping would occur where it expected the last 4 bytes of the v4.

sorry to be daft, I have just never worked with assembler much, other than some trivial stuff years ago.

~S
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

vfpu instructions use suffixes like .s (single/scalar), .p (pair, 2D), .t(triple, 3D), .q(quad, 4D)

with .q, x, y, z and w are touched
with .t, only x, y and z are touched, w is left untouched
with .p, only x and y are touched, z and w are left untouched
with .s, you can access any element in a matrix

so any instructions with .t suffix would ignore your w component.
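
To picture what "left untouched" means for a .t operation, here is a plain C model. The struct stands for a 4-lane VFPU column register, not for memory, and the names Vec4Sketch and add_triple are invented for this sketch.

```c
/* Hypothetical model of a 4-lane VFPU column register. */
typedef struct { float x, y, z, w; } Vec4Sketch;

/* Models a vadd.t: only the x, y and z lanes of the destination are written. */
static void add_triple(Vec4Sketch *dst, const Vec4Sketch *a, const Vec4Sketch *b)
{
    dst->x = a->x + b->x;
    dst->y = a->y + b->y;
    dst->z = a->z + b->z;
    /* dst->w is deliberately not written: it keeps whatever it held before. */
}
```

Note that a following sv.q still stores all four lanes, so whatever was sitting in the w lane of the destination register ends up in memory.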

load and store operations :

lv = load vector
sv = store vector


lv.s/sv.s = load/store a scalar element in matrix (Smcr = element in matrix m, at column c, at row r).

lv.q/sv.q = load/store a 4D vector (Cmcr = column c vector in matrix m, starting from element at row r; Rmcr = row r vector in matrix m, starting from element at column c).

there is no lv.p/sv.p/lv.t/sv.t, so you need to split into lv.s/sv.s


inline_asm : 'asm' '(' asm_sequence_string [':' [ output_registers ] [ ':' [ input_registers ] [ ':' clobbered_registers ] ] ] ')' ';'

output_registers : output_register [ ',' output_registers ] ;

input_registers : input_register [ ',' input_registers ] ;

output_register :

"=m"(var) --> this is a memory location used only for output

"+m"(var) --> this is a memory location used for both input and output

input_register :

"m"(var) --> this is a memory location used only for input


%0, %1, ... : if you have, in this order, "=m"(*vout) : "m"(*va), "m"(*vb), gcc substitutes %0 with a memory operand that addresses *vout, %1 with one for *va and %2 with one for *vb. See them as macro parameters.
snowsquirrel
Posts: 51
Joined: Sun Feb 24, 2008 3:36 am

Post by snowsquirrel »

Ok I understand everything but this line:

Code: Select all

: "+m"(*vout): "m"(*va), "m"(*vb)
So this line is mapping the value in registers to value in memory? If so, shouldn't this mapping be done at the beginning? Or does it take effect in the beginning?

So if my C code uses v3's, would I be better to manually add the 3 elements, or create temporary v4's with v3 values, and then use vfpu_vector_add function? Or use the 3x lv.s method?

Thanks,
~S
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

snowsquirrel wrote:So if my C code uses v3's, would I be better to manually add the 3 elements, or create temporary v4's with v3 values, and then use vfpu_vector_add function? Or use the 3x lv.s method?
~S
3xlv.s method is probably best for this case, so long as your vector only has 3 components.
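
For comparison, the "temporary v4" alternative asked about would look roughly like this on the C side. The types Float3Sketch and Float4Sketch and the helper names are hypothetical stand-ins for the SDK vector types; the 16-byte alignment attribute is there because lv.q/sv.q need a 16-byte aligned address, and the attribute syntax assumes GCC or a compatible compiler.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the SDK's 3- and 4-component vector types. */
typedef struct { float x, y, z; } Float3Sketch;
typedef struct { float x, y, z, w; } __attribute__((aligned(16))) Float4Sketch;

/* Copies a 3D vector into an aligned 4D temporary, padding w with 0. */
static Float4Sketch widen3(Float3Sketch v)
{
    Float4Sketch q = { v.x, v.y, v.z, 0.0f };
    return q;
}

/* Copies the x, y, z components back out, dropping w. */
static Float3Sketch narrow4(Float4Sketch q)
{
    Float3Sketch v = { q.x, q.y, q.z };
    return v;
}
```

The extra copies in and out are the cost of this route, which is why loading the three components individually can be the simpler choice when the data really is 3D.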
snowsquirrel
Posts: 51
Joined: Sun Feb 24, 2008 3:36 am

Post by snowsquirrel »

Extended Asm. I get it now. Thanks.
~S
Heimdall
Posts: 245
Joined: Thu Nov 10, 2005 1:29 am
Location: Netherlands
Contact:

Post by Heimdall »

Hi guys,

From v3 to v4 I see that there was a change in the API and the demo from v3 doesn't build. Does anyone know what changed from v3 to v4? (so I can be lazy and not diff the code ;))
Heimdall
Posts: 245
Joined: Thu Nov 10, 2005 1:29 am
Location: Netherlands
Contact:

Post by Heimdall »

nevermind it was straightforward!