The hunt for HV's FIFO/Push buffer...
The Nouveau project leader answered my request for help. Here is his reply:
Feel free to explore these dumps :
http://people.freedesktop.org/~kmeyer/renouveau_dumps/
Try to find the "test display_list" which uses index buffers
(I don't know these dumps, I can't give you more details)
Ok, these dumps use NV40TCL_VB_ELEMENT_U16 in the begin/end block. Yes, it is a way to send indexed primitives to the GPU. But it is a very bad idea to embed indices into your push buffer. Very bad idea.
Of course, lists work in that odd way.
It is better to make dumps from glDrawElements. Index buffers are first class citizens on NV40 class hardware.
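To make the cost of inline indices concrete, here is a rough sketch of what embedding them in the command stream involves. It assumes that NV40TCL_VB_ELEMENT_U16 carries two 16-bit indices per 32-bit word, low half first. The helper name `pack_u16_indices` is hypothetical and is not part of libps3rsx or nouveau, and the method headers themselves are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: packs 16-bit indices two per 32-bit word, low half
 * first (assumed layout for inline U16 index data). Returns the number of
 * words written to out. Method headers are not emitted here. */
size_t pack_u16_indices(const uint16_t *idx, size_t count, uint32_t *out)
{
    assert(count % 2 == 0);  /* an odd count would need separate handling */
    size_t words = 0;
    for (size_t i = 0; i < count; i += 2)
        out[words++] = ((uint32_t)idx[i + 1] << 16) | idx[i];
    return words;
}
```

Under this assumption every draw copies count/2 words into the push buffer again, which is the overhead a real index buffer avoids.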
Let's dream again, about... TILE...
For those who haven't taken a close look inside the pbkit source (plenty of comments there), here is an explanation of the TILE concept:
When you declare a TILE, you declare a memory area. Its most spectacular use is for the depth stencil buffer. On nv20 you could declare 8 tiles. Each tile has a massive internal GPU cache associated with it.
The depth stencil buffer is read and written very often when many triangles are drawn at the same screen location. Usually, you HAVE TO clear the depth stencil buffer at the beginning of each frame (Z to max, stencil to 0), then draw from the closest distance to the farthest, in order to take full advantage of the automatic compression and data caching that the TILE declaration enables.
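The closest-to-farthest ordering can be sketched as a plain sort on the CPU side. The `draw_item` record and the function names are hypothetical and not taken from pbkit. It is only an illustration of the ordering step, assuming each object has a single representative distance to the camera.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical draw record: only the fields needed for ordering. */
typedef struct {
    int   id;
    float view_depth;   /* distance from the camera, larger = farther */
} draw_item;

static int by_depth_ascending(const void *a, const void *b)
{
    float da = ((const draw_item *)a)->view_depth;
    float db = ((const draw_item *)b)->view_depth;
    return (da > db) - (da < db);
}

/* Orders items nearest first, so closer geometry reaches the depth buffer
 * before farther geometry is tested against it. */
void sort_front_to_back(draw_item *items, size_t count)
{
    assert(items != NULL || count == 0);
    qsort(items, count, sizeof items[0], by_depth_ascending);
}
```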
On xb1, in pbkit Demo 04, one of the controller buttons switches the display to the depth stencil buffer so you can look at it. Triangles at the same depth (more or less the same distance to the camera) have the same color (color=depth). But... if automatic compression is active you will only see maybe 1 pixel in every 4, horizontally and vertically. I.e. you will see groups of 4x4 pixels in which only the top left pixel is lit. That's automatic compression: by using smart coding, you can get a 1:4, 1:8 or 1:16 compression rate. I.e. the GPU doesn't need to read/write more than 4, 2 or 1 dwords for each group of 4x4 pixels (16 dwords).
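The arithmetic behind those rates is simple enough to write down. This is only a worked calculation, assuming one 32-bit dword per pixel and buffer dimensions that are multiples of 4. The function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

enum { BLOCK_PIXELS = 4 * 4 };  /* one 32-bit depth/stencil dword per pixel */

/* Dwords moved per 4x4 block at a 1:ratio compression rate. */
uint32_t dwords_per_block(uint32_t ratio)
{
    assert(ratio != 0 && BLOCK_PIXELS % ratio == 0);
    return BLOCK_PIXELS / ratio;
}

/* Dwords moved for a whole buffer; width and height are assumed to be
 * multiples of 4. A ratio of 1 means no compression. */
uint32_t dwords_per_buffer(uint32_t width, uint32_t height, uint32_t ratio)
{
    assert(width % 4 == 0 && height % 4 == 0);
    return (width / 4) * (height / 4) * dwords_per_block(ratio);
}
```

For a 640x480 buffer this gives 307200 dwords uncompressed and 19200 dwords at 1:16.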
So... if you manage to keep an eye on the content of the depth stencil buffer and try to move it around in memory, maybe, with luck, you will see automatic compression active. That would mean a previous program (a game?) declared a tile but didn't trash it before quitting...
Ok, another naive dream... But since it has been reported that some traces were left in RAMIN after a game launch in game OS... Maybe...
Anyway, that's for a 30% performance gain. Not absolutely necessary.
ps2devman, thanks for your help.
It is better to dig a bit into the hypervisor interfaces for TILE setup.
For example, here is the Nvidia MMIO register database:
http://gitweb.freedesktop.org/?p=mesa/d ... veau_reg.h
Compare with http://wiki.ps2dev.org/ps3:hypervisor:lv1_gpu_attribute :
ret64 = lv1_gpu_attribute(0x100, 0x007, val, 0, 0);
This is the interrupt handler setup. Here 0x100 is definitely an MMIO register index.
It is probably worth experimenting with the parameters of lv1_gpu_memory_allocate:
parameter 0 is just the memory size
parameter 1 is the amount of some resource, up to 0x80000.
parameter 2 is the amount of some resource, up to 0x300000.
parameter 3 is the amount of some resource, up to 0xf //tiles?
parameter 4 is the amount of some resource, up to 0x8
These look like ZCOMP and TILE definitions. It would be great if somebody could test these parameters and note any side effects.
FIFO workaround with firmware 2.0.0
I just wanted to confirm that the FIFO workaround (and Xv acceleration) is still valid with firmware 2.0.0
ps3rsx task list
Ok, I want development to be public. There are many tasks to do. I want to divide the work into small parts that are easy and fun to do.
The first task is DXT texture support. DXT compression can be handled by an open source library like http://www.sjbrown.co.uk/?code=squish
I committed the file textures.h with a simple interface. Anybody is welcome to implement this interface. The implementation (with your copyrights) will be placed in the repository. After that you will be granted write access to the repository.
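As background for this task, the DXT1 block layout is small enough to sketch: 8 bytes hold two RGB565 endpoint colors and sixteen 2-bit selectors for a 4x4 texel group. The decoder below is only an illustration of that layout. The names are hypothetical and are not the textures.h interface, and the exact rounding of the interpolated colors varies between decoders.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint8_t r, g, b, a; } rgba8;

/* Expands a 5:6:5 color to 8 bits per channel by bit replication. */
static rgba8 expand565(uint16_t c)
{
    uint8_t r = (c >> 11) & 0x1f, g = (c >> 5) & 0x3f, b = c & 0x1f;
    rgba8 out = { (uint8_t)((r << 3) | (r >> 2)),
                  (uint8_t)((g << 2) | (g >> 4)),
                  (uint8_t)((b << 3) | (b >> 2)), 255 };
    return out;
}

/* Weighted average of two opaque colors (truncating division). */
static rgba8 mix(rgba8 a, rgba8 b, int wa, int wb)
{
    int d = wa + wb;
    rgba8 out = { (uint8_t)((wa * a.r + wb * b.r) / d),
                  (uint8_t)((wa * a.g + wb * b.g) / d),
                  (uint8_t)((wa * a.b + wb * b.b) / d), 255 };
    return out;
}

/* Decodes one 8-byte DXT1 block into 16 texels in row-major order. */
void dxt1_decode_block(const uint8_t block[8], rgba8 out[16])
{
    assert(block != 0 && out != 0);
    uint16_t c0 = (uint16_t)(block[0] | (block[1] << 8));
    uint16_t c1 = (uint16_t)(block[2] | (block[3] << 8));
    rgba8 pal[4];
    pal[0] = expand565(c0);
    pal[1] = expand565(c1);
    if (c0 > c1) {                      /* four-color mode */
        pal[2] = mix(pal[0], pal[1], 2, 1);
        pal[3] = mix(pal[0], pal[1], 1, 2);
    } else {                            /* three colors + transparent black */
        pal[2] = mix(pal[0], pal[1], 1, 1);
        pal[3] = (rgba8){ 0, 0, 0, 0 };
    }
    for (int i = 0; i < 16; i++) {
        /* one selector byte per row, texel 0 in the lowest two bits */
        int sel = (block[4 + i / 4] >> (2 * (i % 4))) & 3;
        out[i] = pal[sel];
    }
}
```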
If you want to contribute - email me. Feel free.
Peter.
The first milestone is a working low-level API. This API will work in exclusive fullscreen mode, but it will be full-featured and will run in user mode.
With textures, buffers and sync with RSX it will take ~1 month of development.
The shader compiler will also take ~1 month.
It is possible to build gl-like interfaces on top of this low-level console-style library.
Porting MesaGL is more complicated. The main problem is resource management. Many months to develop and debug... Also many months to support the old-style T&L pipeline. We can discuss Mesa porting only after the first milestone.
@IronPeter
About Mesa, you might want to contact Ian Romanick. He made an announcement a few months back about porting Mesa to Cell, although I don't know its status at the moment.
http://www.nabble.com/Mesa-on-Cell-plan-t4202805.html
shader tokens
This repo contains a basic shader compiler.
http://gitweb.freedesktop.org/?p=mesa/m ... ri/nouveau
The Nouveau project has many branches. Probably, other branches are more adequate. Refer to the "user section" at http://gitweb.freedesktop.org/
Development is relatively easy because the binary layer is very close to the assembler:
http://www.opengl.org/registry/specs/NV ... rogram.txt
http://www.opengl.org/registry/specs/NV ... rogram.txt
Guys, why does everybody want to write a shader compiler :)? Write some basic stuff like DXT texture support as your first task.
If you want to write a shader compiler, write a small working demo with basic shader assembling.
index buffers
With the new nouveau dumps (thanks to marcheu) I was able to use index buffers on RSX.
Check SVN, the triangle demo.
Re: shader tokens
IronPeter wrote: If you want to write shader compiler - write a small working demo with basic shader assembling.
You can even write a complex shader and output it as an ARB program.
Though I never really used it, the Cg toolkit in the nvidia sdk contains some support for various input and output formats. I don't know how to get from the ARB code to the machine code. That is far beyond my knowledge.
Code:
usage: cgc [-quiet] [-nocode] [-nostdlib] [-[no]fx] [-longprogs] [-v] [-strict] [-oglsl]
[-glslWerror] [-Dmacro[=value]] [-Iinclude_dir] [-profile id]
[-entry id | -noentry] [-profileopts opt1,opt2,...] [-o ofile] [-l lfile]
[-[no]fastmath] [-[no]fastprecision] [-bestprecision]
[-unroll (all|none|count=N)] [-ifcvt (all|none|count=N)]
[-inline (all|none|count=N)]
[-type <type definition>} [-typefile <file>} [-M<...>]
{file.cg}
supported profiles and their supported profileopts:
glslf profileopts:
glslv profileopts:
ps_1_3 profileopts:
MaxPixelShaderValue=<val>
ps_1_2 profileopts:
MaxPixelShaderValue=<val>
ps_1_1 profileopts:
MaxPixelShaderValue=<val>
dx8ps profileopts:
MaxPixelShaderValue=<val>
fp20 profileopts:
generic profileopts:
ps_3_0 profileopts:
fp40unlimited profileopts:
fp40 profileopts:
NumTemps=<val>
NumInstructionSlots=<val>
OutColorPrec=<val>
MaxLocalParams=<val>
vs_3_0 profileopts:
MaxLocalParams=<n>
MaxInstructions=<n>
vp40 profileopts:
NumTemps=<val>
NumInstructionSlots=<val>
MaxLocalParams=<val>
arbfp1 profileopts:
NumTemps=<val>
NumInstructionSlots=<val>
NoDependentReadLimit=<val>
NumTexInstructionSlots=<val>
NumMathInstructionSlots=<val>
MaxTexIndirections=<val>
MaxDrawBuffers=<val>
MaxLocalParams=<val>
ps_2_x profileopts:
NumTemps=<val>
NumInstructionSlots=<val>
Predication=<val>
ArbitrarySwizzle=<val>
GradientInstructions=<val>
NoDependentReadLimit=<val>
NoTexInstructionLimit=<val>
ps_2_0 profileopts:
dx9ps2 profileopts:
fp30unlimited profileopts:
fp30 profileopts:
NumInstructionSlots=<val>
NumTemps=<val>
vs_2_x profileopts:
DynamicFlowControlDepth=<0 or 24>
NumTemps=<12 to 32>
MaxLocalParams=<n>
vs_2_0 profileopts:
MaxLocalParams=<n>
dxvs2 profileopts:
MaxLocalParams=<n>
arbvp1 profileopts:
NumTemps=<12 to 32>
MaxInstructions=<n>
MaxLocalParams=<n>
vs_1_1 profileopts:
dcls
MaxLocalParams=<n>
dx8vs profileopts:
dcls
MaxLocalParams=<n>
vp20 profileopts:
vp30 profileopts:
Code:
attribute vec4 testattrib;

void
main ( void ) {
    gl_Position = ftransform ( );
    gl_FrontColor = testattrib;
    return;
}
Code:
cgc -oglsl filename
Code:
vattrib.vert
18 lines, 0 errors.
vs_1_1
// cgc version 1.5.0014, build date Sep 18 2006 21:56:59
// command line args: -oglsl
// source file: vattrib.vert
//vendor NVIDIA Corporation
//version 1.5.0.14
//profile vs_1_1
//program main
//semantic gl_ModelViewProjectionMatrixTranspose : STATE.MATRIX.MVP
//var float4 gl_Position : $vout.POSITION : HPOS : -1 : 1
//var float4 gl_Vertex : $vin.POSITION : ATTR0 : -1 : 1
//var float4 gl_FrontColor : $vout.COLOR0 : COL0 : -1 : 1
//var float4x4 gl_ModelViewProjectionMatrixTranspose : STATE.MATRIX.MVP : c[0], 4 : -1 : 1
//var float4 testattrib : $vin.ATTR1 : ATTR1 : -1 : 1
mov oD0, v1
dp4 oPos.w, v0, c3
dp4 oPos.z, v0, c2
dp4 oPos.y, v0, c1
dp4 oPos.x, v0, c0
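What the five instructions compute can be mirrored in C: `mov` copies the attribute to the color output, and each `dp4` is a four-component dot product of the position with one constant register. This is only an illustrative CPU-side emulation. The `vec4` type and `run_vattrib` name are made up for the example.

```c
#include <assert.h>

typedef struct { float x, y, z, w; } vec4;

/* dp4: four-component dot product. */
static float dp4(vec4 a, vec4 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

/* Mirrors the listing: v0 is the position, v1 is testattrib, and c[0..3]
 * are the constant registers holding the matrix. */
void run_vattrib(vec4 v0, vec4 v1, const vec4 c[4], vec4 *oPos, vec4 *oD0)
{
    assert(c != 0 && oPos != 0 && oD0 != 0);
    *oD0 = v1;                  /* mov oD0, v1        */
    oPos->w = dp4(v0, c[3]);    /* dp4 oPos.w, v0, c3 */
    oPos->z = dp4(v0, c[2]);    /* dp4 oPos.z, v0, c2 */
    oPos->y = dp4(v0, c[1]);    /* dp4 oPos.y, v0, c1 */
    oPos->x = dp4(v0, c[0]);    /* dp4 oPos.x, v0, c0 */
}
```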
@+
dom
Grats on the index buffer breakthrough!
Can't help now because I'm lacking free time, but I will just describe how native shader assembly could be done for nv2A, on xbox1:
- First the shader model must be identified. For nv2A it's SM 1.1
- Cgc.exe (from NVidia SDK 9.5) translates the high level text language into a low level assembly text language
- vsa.exe and psa.exe (from an earlier Nvidia SDK or the DirectX SDK) translate the low level assembly text into binary pseudo code (standard DirectX8 pseudo code)
- in pbkit.c, function pcode2mcode translates pseudo code into native code
(done by studying a lot of binary samples and matching binary native code against the pseudo code)
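The first step, identifying the shader model, can be sketched from the binary pseudo code itself. This assumes the standard Direct3D token stream, where the first dword is a version token (0xFFFE in the high half for vertex shaders, 0xFFFF for pixel shaders, then major and minor version bytes). The type and function names are hypothetical, and this does not reproduce anything from pcode2mcode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool     is_vertex;   /* true for a vertex shader, false for a pixel shader */
    unsigned major;
    unsigned minor;
} shader_version;

/* Reads the version token that starts a Direct3D shader token stream.
 * Returns false if the token is neither a vertex nor a pixel shader. */
bool read_shader_version(uint32_t token, shader_version *out)
{
    assert(out != 0);
    uint32_t kind = token & 0xffff0000u;
    if (kind != 0xfffe0000u && kind != 0xffff0000u)
        return false;
    out->is_vertex = (kind == 0xfffe0000u);
    out->major = (token >> 8) & 0xff;
    out->minor = token & 0xff;
    return true;
}
```

A vs_1_1 stream, for example, would start with 0xFFFE0101 under this layout.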
Studying the Nouveau stuff is probably a good way to start. Can't help more for now. If something public, similar to vsa.exe and psa.exe, could be found, it might do all the register optimizations for us. Then the pseudo-to-native translation should be simple, assuming the native code encoding is understood. I'm not experienced with DirectX9 yet, but there might be tools already available (maybe more recent versions of vsa.exe and psa.exe, I don't know). Do we assume we are targeting SM 3.0?
ps2devman, our target is NV_fragment_program, not SM3.0. NV_fragment_program is very close to the hardware and has many unique features (like pack-unpack).
We must avoid some PS3.0 core features, such as dynamic branches. This branching is very slow on NV40 class hardware (I have a lot of PC experience with that).
xorg driver with blending
Hi,
For those interested, I've managed to get some time to update my experimental xorg based on IronPeter's and the nouveau team work. It now supports a lot more Composite operations, including alpha blending, through the 3D engine. That means accelerated translucent windows, and it works with Xv too (so you can have accelerated translucent video over your desktop, with windows dropping shadows, etc...).
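For readers unfamiliar with Composite operations, the most common one, "over", can be sketched per channel on the CPU. This assumes premultiplied alpha and 8-bit channels. It is only an illustration of the arithmetic and not code from the driver, which does this on the 3D engine. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Multiplies two 8-bit values as fractions of 255, rounding to nearest. */
static uint8_t mul8(uint8_t a, uint8_t b)
{
    unsigned t = (unsigned)a * b + 128;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* "Over" for one premultiplied channel:
 * result = src + dst * (1 - src_alpha). */
uint8_t over_channel(uint8_t src, uint8_t src_alpha, uint8_t dst)
{
    assert(src <= src_alpha);  /* premultiplied: channel never exceeds alpha */
    return (uint8_t)(src + mul8(dst, (uint8_t)(255 - src_alpha)));
}
```

An opaque source replaces the destination, a fully transparent one leaves it unchanged, and anything in between is a weighted sum.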
Still, there are some nasty artifacts on standard rendering (e.g. moving a standard window around without xcompmgr running will lead to serious artifacts) and solid fills are not accelerated, so it is hardly usable for everyday use. Moreover, the code is a big patchwork and needs a lot of cleanup. I now plan to accelerate solid fills with the 3D engine too, and get rid of the remaining artifacts. This experimental driver is only a proof of concept, to check we have everything we need for accelerated X on PS3. Once the driver is functional (usable for everyday use), I plan to find a way to merge back with nouveau, probably by writing a drm driver.
Code is available here:
http://mandos.homelinux.org/~glaurung/g ... eo-ps3.git
IronPeter, concerning the 3D side, did you check Gallium?
http://www.tungstengraphics.com/wiki/in ... /Gallium3D
I think writing a driver for it shares some common goals with libps3rsx. In particular, it assumes availability of pixel and vertex shaders, and is supposed to be independent from OpenGL.
Final note: I'm using firmware 2.0 now.
Glaurung, I'll check Gallium, thanks. At first look it is an ugly abstraction layer.
Do you have any ideas about this topic:
http://forums.ps2dev.org/viewtopic.php?t=9317 ?
Just to add a probable meaning for the l33t b33f l33t cod3, I found this reference by googling:
http://www.artima.com/insidejvm/whyCAFEBABE.html
Feel free to remove this post if you think it's off topic.
ciao
gigi
Glaurung
I finished my spu solid fill (coming in spu medialib) yesterday (last bug gone, I think). It has no alignment restrictions, though it's probably useless for you.
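A fill without alignment restrictions usually splits each row into an unaligned head, an aligned body and a tail. The sketch below shows that split in portable C as a stand-in. It is not the spu medialib code, the names are hypothetical, and a real SPU version would use DMA and quadword stores instead of memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fills n 32-bit pixels starting at p, whatever the alignment of p. */
static void fill_row(uint8_t *p, size_t n, uint32_t color)
{
    uint32_t quad[4] = { color, color, color, color };
    /* head: single pixels until p reaches a 16-byte boundary */
    while (n > 0 && ((uintptr_t)p & 15) != 0) {
        memcpy(p, &color, 4); p += 4; n--;
    }
    /* body: whole 16-byte groups (one quadword store each on an SPU) */
    while (n >= 4) {
        memcpy(p, quad, 16); p += 16; n -= 4;
    }
    /* tail: whatever is left */
    while (n > 0) {
        memcpy(p, &color, 4); p += 4; n--;
    }
}

/* Fills a w x h rectangle at pixel (x, y); pitch is in bytes. */
void solid_fill32(uint8_t *base, size_t pitch, size_t x, size_t y,
                  size_t w, size_t h, uint32_t color)
{
    assert(base != NULL);
    for (size_t row = 0; row < h; row++)
        fill_row(base + (y + row) * pitch + x * 4, w, color);
}
```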
I'm going to work on a copy today, so although it's not "gpu" it will hopefully serve as a more permanent SPU driver for non-GPU Cell/SPU solutions.
After I finish the copy I would like to start an X driver from scratch using these functions. If anyone wants to assist, any help is appreciated.
cheers
Don't do it alone.