The hunt for HV's FIFO/Push buffer...

Technical discussion on the newly released and hard to find PS3.

Moderators: cheriff, emoon

ps2devman
Posts: 259
Joined: Mon Oct 09, 2006 3:56 pm

Post by ps2devman »

Grats!
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am
Contact:

my plans.

Post by IronPeter »

I would like to wait a few days. I'll probably describe 3D initialization on the wiki.

After that (and after getting stable svn hosting) I want to port some parts of mesa-nouveau (fragment shader compiler, state setup, textures, buffers). I want to write some kind of small gl library. I do not want to write memory and resource managers; this library will work in exclusive mode. I want the fragment shader compiler and the DXT texture compressor to be standalone utils, not the core of the library.
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am
Contact:

Post by IronPeter »

I edited Wiki, please check grammar.
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am
Contact:

black rendering issue

Post by IronPeter »

I have received reports of black-only rendering on PS3 (nouveau also has the same issue on PPC, so it is probably endianness).

I will try to fix that. It is a showstopper bug, of course.
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am
Contact:

Post by IronPeter »

It is a very funny bug, but I have tamed it a bit.

Using this shader (gray output)

static nv_pshader_t nv30_fp = {
    .num_regs = 2,
    .size = (2*4),
    .data = {
        /* MOV R0, ( 0.5f, 0.5f, 0.5f, 0.5f ) */
        0x01403e81, 0x1c9dc802, 0x0001c800, 0x3fe1c800,
        0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000,
    }
};
and with the endianness of the 3D class set to 0x0, I was able to get non-black (gray, as expected) rendering. Check the svn repo.

The problem is that RGBA shows up as ABGR on the screen (there is probably some workaround for that). Very funny bug.
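An RGBA value showing up as ABGR is what a plain byte-order mismatch looks like: the four bytes of a 32-bit pixel come out reversed. Here is a minimal sketch in plain C, assuming one byte per channel. The helper names are made up for illustration and are not from the repo. The last helper just confirms that the 0x3f000000 words in the shader data are the IEEE-754 encoding of 0.5f.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reverse the byte order of a 32-bit word (hypothetical helper). */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Pack four 8-bit channels with the first argument in the most
 * significant byte and the last one in the least significant byte. */
static uint32_t pack4(uint8_t c0, uint8_t c1, uint8_t c2, uint8_t c3)
{
    return ((uint32_t)c0 << 24) | ((uint32_t)c1 << 16) |
           ((uint32_t)c2 << 8) | (uint32_t)c3;
}

/* Reinterpret the bits of a 32-bit word as a float. */
static float bits_to_float(uint32_t bits)
{
    float f;
    assert(sizeof f == sizeof bits);
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

With R, G, B, A packed by pack4, swap32 gives the same word as packing A, B, G, R, which matches the symptom on screen.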
ralferoo
Posts: 122
Joined: Sat Mar 03, 2007 9:14 am
Contact:

Post by ralferoo »

IronPeter wrote:The problem is that RGBA shows up as ABGR on the screen (there is probably some workaround for that). Very funny bug.
From reading on the web it seems that most nVidia cards can be driven as either little or big endian. A random dump I found by googling for endianness on nVidia cards seems to suggest that endianness can be specified on a per-object basis for any object that has ENGINE=GRAPHICS in the context: http://people.freedesktop.org/~kmeyer/r ... put.txt.gz

Perhaps the context the hypervisor has created is little endian for some reason?
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am
Contact:

Post by IronPeter »

>Perhaps the context the hypervisor has created is little endian for some reason?

Context objects are big endian ( native for PPC ). For example, screen data blitted by the GPU is big endian.

And as usual, edit Wiki after me, please :).
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am
Contact:

endian issue resolved

Post by IronPeter »

Check the svn repo and the Wiki page.
ps2devman
Posts: 259
Joined: Mon Oct 09, 2006 3:56 pm

Post by ps2devman »

Thx for finding the solution. Keep up the good work!
cypherpunks
Posts: 4
Joined: Mon Oct 22, 2007 7:13 am

Post by cypherpunks »

I made some grammar edits to the wiki. There were some pieces that were unclear to me however, and could use some clarification:

IronPeter, did you create a 3d class from scratch? This is what I understand from this thread and the source code. However the way it is worded on the wiki leads me to believe you found an existing 3d object, or modified an existing object to be a 3d object. Which statement is correct?

Also, the wiki says the Hypervisor makes objects in RAMIN. Is it possible to make objects anywhere other than RAMIN, or is this just a convention of the Hypervisor? I admit ignorance here as to how NVIDIA cards work.

Finally, in the FIFO workaround section where it says "So the hack consists in either patching the last operation with a NOP, or changing the FIFO write pointer to stop earlier." -- shouldn't it be changed to "changing the FIFO write pointer to stop later" since you'll (presumably) be adding commands to the end of the FIFO?

Thanks for all the hard work! I hope to contribute soon as well (once I get ps3toolchain to compile under cygwin.. grrr..)
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am
Contact:

Post by IronPeter »

>IronPeter, did you create a 3d class from scratch

Yes, I created the 3d class instance from scratch. There is no 3d class instance registered by the HV in RAMHT. There is probably some HV call that does this, but we did not find it.

> Is it possible to make objects anywhere other than RAMIN, or is this just a convention of the Hypervisor?

RAMIN is the name for "the place where graphics objects are stored". This memory has a strict format. The format seems to be independent of the environment (HV environment on PS3 or video driver on PC).
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am
Contact:

Post by IronPeter »

>I hope to contribute soon as well

I hope I'll set up the OpenRSX project soon, here on ps2dev svn. Everybody is welcome to contribute.

I would like Glaurung to contribute textures and blend modes :).
ralferoo
Posts: 122
Joined: Sat Mar 03, 2007 9:14 am
Contact:

Post by ralferoo »

cypherpunks wrote:Finally, in the FIFO workaround section where it says "So the hack consists in either patching the last operation with a NOP, or changing the FIFO write pointer to stop earlier." -- shouldn't it be changed to "changing the FIFO write pointer to stop later" since you'll (presumably) be adding commands to the end of the FIFO?
No, it is correct as is.

Think of it as:


rptr: opA
      opB
      opC
x:    END //stop processing, wait for hypervisor to restart GPU
wptr:
The GPU processes instructions sequentially from an instruction queue, starting at rptr (think of the PC in a CPU). Unlike a CPU, when rptr==wptr it waits until wptr changes, so more instructions can be added to the end of the list. The benefit is that this decouples the GPU's execution flow from the rate at which the CPU generates instructions. With a conventional CPU-style model, you'd need to create blocks of instructions terminated by an END, and only once an entire block had been created could you let the GPU execute it. That method would require the CPU to keep polling the GPU to figure out what it's up to, whereas this approach lets the CPU create instructions and assume they will get done soon.

So, now that you understand that, the hack is to prevent execution of the END instruction, because restarting the GPU is a privileged hypervisor operation. Either of the two techniques has the same effect (setting wptr to x, or replacing the END with a NOP): the GPU never executes the END instruction and so keeps waiting for wptr to change again before it continues reading instructions. By never ending the list of instructions, the GPU is always waiting for us and so we never need the hypervisor to kick start the process again.

To add our own instructions to the command queue, we formulate a block of instructions, writing them to the next available position in the queue. Then, we update wptr to point to the next instruction after the last one. This means the GPU notices it can continue processing and executes up to the last one.

The next question is what happens when we get to the end of this buffer. After all, it's only 64k long. The answer is that the GPU has a JMP/branch instruction just like a CPU. So, when we're close to running out of space, we jump back to the start of the FIFO buffer and repeat the process. There are GPU pre-fetch bugs that mean you need to target the jump into a block of NOPs, but this isn't a major problem.
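To make the rptr/wptr handshake and the wrap-around jump concrete, here is a small CPU-only simulation in C. Everything in it is an assumption for illustration: the opcodes, the 8-word buffer size and the function names are invented and are not real RSX encodings or register accesses. It also assumes the GPU has caught up (rptr == wptr) before each submit, and it does not model the NOP landing pad needed for the pre-fetch bug.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for this simulation; not real RSX encodings. */
#define OP_DRAW      1u
#define OP_JMP_START 2u   /* jump back to word 0 of the buffer */

#define FIFO_WORDS 8      /* tiny on purpose; the real buffer is 64k */

typedef struct {
    uint32_t cmd[FIFO_WORDS];
    size_t rptr;        /* next word the simulated GPU will read   */
    size_t wptr;        /* first word the GPU must not read yet    */
    unsigned executed;  /* number of OP_DRAW commands processed    */
} fifo_t;

/* CPU side: write n commands, then publish them by moving wptr.
 * One word is always kept free so a jump can be placed at wptr. */
static void fifo_submit(fifo_t *f, const uint32_t *ops, size_t n)
{
    size_t w = f->wptr;

    assert(f->rptr == f->wptr);     /* sketch assumes the GPU is idle */
    if (w + n > FIFO_WORDS - 1) {
        assert(n < f->rptr);        /* block must end before rptr     */
        f->cmd[w] = OP_JMP_START;   /* wrap: jump back to the start   */
        w = 0;
    }
    for (size_t i = 0; i < n; i++)
        f->cmd[w++] = ops[i];
    f->wptr = w;                    /* the GPU may now continue       */
}

/* GPU side: consume commands until rptr catches up with wptr. */
static void gpu_run(fifo_t *f)
{
    while (f->rptr != f->wptr) {
        uint32_t op = f->cmd[f->rptr];
        if (op == OP_JMP_START) {
            f->rptr = 0;
            continue;
        }
        if (op == OP_DRAW)
            f->executed++;
        f->rptr++;
    }
}
```

Submitting 5 commands and then 3 more makes the second batch wrap: a jump is written at word 5, the new commands land at words 0 to 2, and wptr ends up at 3. A real implementation would also have to check how far the GPU has actually read before overwriting old words.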

Hope that helps you make sense of this!
Warren
Posts: 175
Joined: Sat Jan 24, 2004 8:26 am
Location: San Diego, CA

Post by Warren »

Great post ralferoo! Thanks for the overview.
ps2devman
Posts: 259
Joined: Mon Oct 09, 2006 3:56 pm

Post by ps2devman »

There is, technically, a way to branch to sub fifo buffers (of any size).
But that requires knowing what services the interrupt handler associated with the GPU interrupt can provide. Once again, I will describe what happens with nv2A on xbox 1.

In the main fifo sequence you just put a fire interrupt command with a data code that says "please remember this address+n (return address)".
Then you put a jump command after that, to the sub fifo buffer (usually already filled with pre-calculated, insanely long commands to achieve top speed). At the end of the sub fifo buffer (let's consider it a kind of subprogram or procedure), you put a fire interrupt command with a data code that says "return", followed by a jump (whose address will be updated). So the interrupt handler has 2 data codes: one to remember the return address, and one to set the jump address to the remembered return address (in order to effectively do the "return").

Plenty of magnificent tricks (like the "fences" mechanism, a kind of sync between CPU and GPU for many specific purposes) can be done once you know (or better, can edit) the interrupt handler. That is not the case on PS3, of course... But maybe someday someone will be able to just "read" the HV code and tell us what services the current interrupt handler already offers. Needless to say, it would also be very useful to be able to understand the error reports made by the GPU (reported through the same interrupt handler) when you write a wrong sequence of commands somewhere in the fifo buffer (better than a black screen saying nothing).
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am
Contact:

small update

Post by IronPeter »

A textured triangle is in the repo. Just 10 minutes of work.
cypherpunks
Posts: 4
Joined: Mon Oct 22, 2007 7:13 am

Post by cypherpunks »

Thanks for the detailed explanation ralferoo! It makes perfect sense now. If you don't mind, later today I'll expand that section in the wiki to include parts of this discussion, and maybe an example FIFO queue to illustrate it.

PS: IronPeter, that's great news!
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am
Contact:

Post by IronPeter »

cypherpunks, the resolution of the endian issue was the great part (nouveau did not handle big endianness).

Proper texturing is just an expected behavior.
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am
Contact:

another small feature

Post by IronPeter »

Depth buffer setup is done. Just for fun, this buffer is mapped into the visible screen area.
ps2devman
Posts: 259
Joined: Mon Oct 09, 2006 3:56 pm

Post by ps2devman »

Fantastic! You did it! Shaders running on PS3!!! Thx IronPeter!
Time to adapt this for other os demo... (come here free time!)
PS3 owners who love coding will have wonderful Xmas holidays!

(beware of the next firmware upgrade, though...)
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am
Contact:

Post by IronPeter »

ps2devman, thanks a lot.

>Shaders running on PS3!!!

There are also textures and a working depth test :). I've updated the demo; it now shows 3 Z-overlapping triangles.

So at this moment we have

1.) working shaders, both pixel and vertex
2.) working textures
3.) working Z test.

We need

1.) Renderstates like blend, different Z modes, alpha test. - easy
2.) Index and vertex buffers. - a bit harder.
3.) Texture support with different formats, mips, swizzling. - harder still
4.) Some shader compiler ( microcode is very hard to maintain ). - hard.

I want to set up the ps2rsx project soon. Probably this weekend.
ps2devman
Posts: 259
Joined: Mon Oct 09, 2006 3:56 pm

Post by ps2devman »

I will try to give a hand with 4).
The idea is to do something similar to function pcode2mcode in pbkit.
It translates standard shaders written in pseudo code into native code.
That way public compiler Cg.exe could be used and you just include the binary (pseudo code) result in your code, then function pcode2mcode does all the translation work automatically.

Dunno if Cg.exe is good enough for the level of shader models we need.

I also need to look again at all the files of the Nouveau project... They surely ran into this trouble already and may have found the best solution.
ps2devman
Posts: 259
Joined: Mon Oct 09, 2006 3:56 pm

Post by ps2devman »

I would like to talk about performance (because we talk about shaders).

There are persistent rumours that claim the 360 Xenos is faster than RSX. Of course, we are now getting closer to the answer, since we will be able to count the number of vertices per frame we can enqueue (once the vertex buffer mechanism works).

Thanks to tmbinc, we could see that, currently, homebrew on 360 can expect, at least, 3.900.000 v/f at 60 fps with minimal shader (no lighting, just simple texture projection) and 3.100.000 v/f at 60 fps with gouraud lighting (1 source). The same kind of performance loss has been seen with other gpu's even if they are slower (xb1 -330.000-, ps2 -250.000-).

I think the goal would be to have higher performance on PS3, since the machine costs more. If RSX alone is slower, then we have to use the SPUs to get a smart solution. Actually, I think the key is the shaders. The more sophisticated they are, the more performance we lose.
So, since we are thinking about compiling shaders, here is a paradox:
Maybe we shouldn't spend too much time working on sophisticated shaders. What we may try to do is have the SPUs do the preliminary calculation work and feed the data to the shaders, in order to have minimal and fastest shaders running on the RSX... It's certainly an unusual strategy (but it reminds me of vu1 working for GS on ps2).
rapso
Posts: 140
Joined: Mon Mar 28, 2005 6:35 am

Post by rapso »

As long as you don't really know the performance characteristics, it's no good making assumptions about ways to improve performance. What if the basic performance on PS3 is bad, but it loses less performance with more sophisticated shaders?
I think that even with homebrew going on it will take quite a long time to figure out what the performance really is and why.
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am
Contact:

Post by IronPeter »

There is an official geometry processing tool from Sony called Edge. You can google for the Edge specs (it is open info).

This tool does vertex processing on the SPU: skeletal animation, even back face culling. I tried to write some back face culling code. It is pretty fast on a single SPU. One single SPU can provide RSX with geometry. Two SPUs can flood the graphics chip.
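For readers who have not written one, back face culling boils down to a sign test on each triangle. Here is a rough sketch in plain scalar C, not SPU intrinsics and not Edge's actual code. It assumes the vertices are already projected to 2D screen space with y pointing up and that counter-clockwise triangles are front-facing; the names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y; } vec2_t;

/* Twice the signed area of triangle (a, b, c): positive when the
 * vertices are in counter-clockwise order (with y pointing up). */
static float signed_area2(vec2_t a, vec2_t b, vec2_t c)
{
    return (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
}

/* Copy only front-facing triangles from in[] to out[].
 * vertex_count must be a multiple of 3 (a plain triangle list).
 * Returns the number of vertices written to out[]. */
static size_t cull_backfaces(const vec2_t *in, size_t vertex_count,
                             vec2_t *out)
{
    size_t kept = 0;

    assert(vertex_count % 3 == 0);
    for (size_t i = 0; i < vertex_count; i += 3) {
        if (signed_area2(in[i], in[i + 1], in[i + 2]) > 0.0f) {
            out[kept++] = in[i];
            out[kept++] = in[i + 1];
            out[kept++] = in[i + 2];
        }
    }
    return kept;
}
```

On an SPU the same test would be done on several triangles at once with vector instructions, and the surviving vertices would then be handed to the RSX.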

RSX has two memory channels - DDR and XDR. XDR memory contains the push buffer and is good for dynamic SPU-generated geometry. DDR memory is for render targets and textures.

The main performance issue (just performance, not core functionality), as I see it, is the TILE and ZCOMP setup. You can refer to pbkit (thanks to ps2devman) or the nouveau project for details.

I do not know how to set it up through the FIFO interface. pbkit and Nouveau do this setup via mmio regs. I have no idea how we can access the global GPU mmio regs.
ps2devman
Posts: 259
Joined: Mon Oct 09, 2006 3:56 pm

Post by ps2devman »

Don't worry we will find a way, even if it takes months to find it.

If we can have vertex buffers running, that will be already heaven.

I have the feeling the interrupt handler is the key. Since it's used to report gpu errors, I doubt it is missing from the HV, and I'm pretty sure Sony engineers were too lazy to strip unused/dangerous services from it.
We can try to examine existing shaders for any unusual command that could be an mmio access request from the fifo. On nv2A (xb1) such a command is used, for example, to disable/enable the noise flag for compressed textures, right from within the command sequence in the fifo (push buffer).
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am
Contact:

Post by IronPeter »

ps2devman, yes. The TILE and ZCOMP setup is not a critical task for us.

I have working vertex buffers. With some issues.

The problem is the index buffer. I was unable to find any info about index buffers in the nouveau docs. I do not have a PC with an NV40 and Linux installed to make a fifo dump with a glDrawElements call :(.

It would be great if somebody could do that dump.

Edit: Vertex buffers work fine in both XDR and DDR memory; the issues were resolved.
ps2devman
Posts: 259
Joined: Mon Oct 09, 2006 3:56 pm

Post by ps2devman »

I'm speechless... It's heaven. Thanks a lot IronPeter!

For the index buffer, I don't know nv40 well enough yet, but you can see how it is done on nv20 by looking at pbkit Demo 04. There are constants to define in the source in order to render with an index buffer instead of a vertex buffer. Also, by looking at the name of the nv20 constant, you may discover what the nv40 constant that does the same thing is called.

However, for me, it's heaven... I plan to have the same homebrew game sources compile for ps3, 360, xb1 and ps2. Since ps2 doesn't support index buffers at all, I planned to use vertex buffers only anyway.
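Going vertex-buffer-only means any indexed mesh has to be flattened first. A minimal sketch of that step in C follows; the vertex layout and the function name are just assumptions for illustration, and a real vertex would carry more attributes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vertex layout, position only. */
typedef struct { float x, y, z; } vertex_t;

/* Expand an indexed triangle list into a plain vertex list.
 * out[] must have room for idx_count vertices.
 * Returns the number of vertices written. */
static size_t expand_indexed(const vertex_t *verts, size_t vert_count,
                             const uint16_t *idx, size_t idx_count,
                             vertex_t *out)
{
    for (size_t i = 0; i < idx_count; i++) {
        assert(idx[i] < vert_count);   /* reject out-of-range indices */
        out[i] = verts[idx[i]];
    }
    return idx_count;
}
```

The cost is memory and bandwidth: a quad drawn as two triangles goes from 4 shared vertices to 6 separate ones.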

Anyway, I will send an e-mail to the Nouveau project leader, to be sure he knows what point you have reached. He should be able to give us useful info.
(And Nouveau project members often have an nv40 card and a Linux dumper)
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am
Contact:

SVN repo on ps2dev

Post by IronPeter »

I've created the ps3rsx project. Excuse the delay.

For now there is only one project inside this repo - a slightly modified example with 3 triangles: z buffering, textures, vertex and pixel processing, vertex buffers.

I want to have a full-scale 3D library. The src folder is empty for now :).

The project will have an MIT license.

SVN repo:
http://svn.ps2dev.org/listing.php?repna ... rev=0&sc=0

Feel free to commit.
ngharo
Posts: 4
Joined: Fri Oct 26, 2007 3:00 am

Post by ngharo »

IronPeter wrote:The problem is the index buffer. I was unable to find any info about index buffers in the nouveau docs. I do not have a PC with an NV40 and Linux installed to make a fifo dump with a glDrawElements call :(.

It would be great if somebody could do that dump.
I have a friend with a GeForce 6800 that I could borrow to complete a FIFO dump. Only, I don't know how :( Let me know if I can help.