The hunt for HV's FIFO/Push buffer...
- Posts: 4
- Joined: Mon Oct 22, 2007 7:13 am
I'm a native English speaker and am willing to edit the wiki. There is already a wiki at <http://wiki.ps2dev.org/> you can use. Just add whatever info you have, and I'll edit it for grammar and such, then integrate it into a cohesive document.
To everyone in this thread: THANK YOU! I'm a long-time lurker interested in writing my own OS for the PS3. The lack of acceleration (2D or 3D) was the only thing which kept me from working on it. However, your hard work has changed that, and I hope to start contributing soon!
RAMIN tweaking.
The topic of this thread is now "the hunt for the hypervisor function that creates an instance of CLASS_3D".
It looks like hunting for a black cat in a dark room.
I tried to create an instance of CLASS_3D via RAMIN writes. I did only three memory writes:
1.) offset in dwords 0x64cb8, value 0xfeed0003 ( RAMHT key )
2.) offset in dwords 0x64cb9, value 0x00105020 ( RAMHT value )
3.) offset in dwords 0x1d020 * 4, value 0x00004097 ( instance data, class 3D )
The newly created object was attached to subchannel 0. 2D blits work after that insertion.
The only problem is the hypervisor. It is very unhappy with my CLASS_3D and cannot do regular screen blits after ioctl( PS3FB_IOCTL_OFF ).
I have not tried any 3D stuff yet, but it looks like we have very tight GPU control.
Edit: grammar, constants
Last edited by IronPeter on Fri Oct 26, 2007 4:31 am, edited 1 time in total.
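For readers who want to see the three writes from the post above spelled out, here is a minimal C sketch. The offsets and values are copied from the post; everything else is an assumption for illustration: the helper names `ramin_write` and `create_class_3d`, the window size `RAMIN_DWORDS`, and the idea that the RAMIN window is already mapped as an array of 32-bit words. How to obtain that mapping, and any byte-order handling, is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets and values copied from the post; all offsets are in 32-bit words. */
#define RAMHT_KEY_OFF    0x64cb8u
#define RAMHT_VALUE_OFF  0x64cb9u
#define INSTANCE_OFF     (0x1d020u * 4u)  /* works out to 0x74080 */

/* Assumed window size in 32-bit words, chosen only so the sketch can bounds-check. */
#define RAMIN_DWORDS     0x80000u

/* Hypothetical helper: store one 32-bit word at a dword offset in the window. */
void ramin_write(volatile uint32_t *ramin, size_t dword_off, uint32_t value)
{
    assert(dword_off < RAMIN_DWORDS);
    ramin[dword_off] = value;
}

/* The three writes from the post, in the same order. */
void create_class_3d(volatile uint32_t *ramin)
{
    ramin_write(ramin, RAMHT_KEY_OFF,   0xfeed0003u); /* RAMHT key */
    ramin_write(ramin, RAMHT_VALUE_OFF, 0x00105020u); /* RAMHT value */
    ramin_write(ramin, INSTANCE_OFF,    0x00004097u); /* instance data, class 3D */
}
```

Passing a plain zeroed array of `RAMIN_DWORDS` words to `create_class_3d` is enough to check the offsets without touching hardware.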
3D is working.
The first operation was TCL_PRIMITIVE_3D_CLEAR_BUFFERS. It works with my CLASS_3D instance.
Next I want to draw a triangle. Do not expect a result ( positive or negative ) in less than 3-4 days. A lot of stuff ( vertex and fragment shaders, shader constants ) needs to be set up first.
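Since this thread is about the FIFO/push buffer, a small sketch of how a command header for an operation like the clear might be packed. This assumes the NV-style FIFO header layout as used by the nouveau driver (method count in bits 18 and up, subchannel in bits 13-15, method offset in the low bits); it is not taken from this thread and should be checked against the nouveau sources. The function name `fifo_header` is hypothetical, and the actual method offsets are not listed here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed NV-style FIFO command header:
 *   bits 18..28  number of data words that follow
 *   bits 13..15  subchannel the object is bound to
 *   bits  2..12  method offset (a multiple of 4)
 */
uint32_t fifo_header(uint32_t subchannel, uint32_t method, uint32_t count)
{
    assert(subchannel < 8);
    assert((method & ~0x1ffcu) == 0);
    assert(count < 2048);
    return (count << 18) | (subchannel << 13) | method;
}
```

The header word would be followed in the push buffer by `count` data words for that method.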
first documentation draft
Hi,
I've started a very first draft of the RSX documentation here:
http://wiki.ps2dev.org/ps3:rsx
It is probably still full of spelling and grammar errors, and may contain incorrect information. I tried to make a self-contained document and therefore it summarizes a lot of information that can be found elsewhere, especially in the nouveau wiki. Everything is on a single page for now. Anyway, native speakers are welcome to edit this page at will. Same goes for the technical side which may lack some information, be imprecise or unclear.
IronPeter, good work on the 3D object!
I'm also interested in the 3D, fragment and vertex shaders, etc., to try the NV40 Xorg Composite code from nouveau (blending). BTW, a friend and I managed to get Xv working relatively well (for a 3-hour-long hacking session) using the blitter. The code is still very ugly, but we will provide an update of the experimental Xorg driver soon.
Here's my workaround using the SPEs :)
Bad joke, it's 5 a.m.
But in fact it kind of is.
I threw an experimental SPU Xv driver on SVN.
It does 1080p in X using 1 SPU, so it looks OK, but there are lots of TODOs with it, like fixing interrupt handlers and timers.
Expect an install guide within days, inside spu-medialib in the Xv thread.
It uses around 25% of a single SPU when upscaling 720p to 1080p. However, I think it's feasible to bring this down to 12.5% or lower.
The scaling method is currently bilinear, in floating-point precision.
But we are working on/looking into other scalers.
Don't do it alone.
just fun.
It seems like some areas of RAMIN are persistent even after a cold reboot...
Looks like a good idea to analyze the RAMIN content after GameOS :).
Hi,
Here is an update of the GPU based Xorg driver, with accelerated Xv support:
git clone http://mandos.homelinux.org/~glaurung/g ... eo-ps3.git
This patch to the PS3 Linux framebuffer driver is needed to test the Xorg driver:
http://mandos.homelinux.org/~glaurung/p ... ps3fb.diff
Xv support is based on the nouveau code, using the blitter. It supports YUYV and UYVY formats, and clipping and boxes work as expected (i.e. you can put a normal window over the video). Video is _not_ synchronized on vsync yet. The Xorg driver only works in the PS3 fullscreen mode for now (see the ps3videomode utility). Also, the EXA Copy operation still has a rendering bug and solid fills are unaccelerated. Performance is OK: when rendering a full HD video, Xorg takes ~10% CPU time (the YV12->YUYV conversion is done by mplayer, so its cost should be added to that figure). Thanks to the nouveau guys for the initial code, and to my friend who did most of the Xv adaptation.
unsolo, I haven't had a chance to try your SPU-based implementation yet, but will soon. It is probably a better short-term alternative to the FB driver, and we should continue working on both drivers in parallel. For technical comparison, the GPU blitter does bilinear interpolation; scaling steps are in 12.20 fixed point and source coordinates are in 16ths of a pixel. There are limits to the width and height, but a single blit can handle full HD. The blit speed is about 16.5GB/s, but the source video has to be copied to the XDR framebuffer region first to be accessible for DMA (this is similar to how AGP works), and must be in YUYV or UYVY format.
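To make the fixed-point formats mentioned above concrete, here is a small C sketch. It assumes the scaling step is the source size divided by the destination size (as in the nouveau Xv code, as far as I recall), stored with 12 integer bits and 20 fractional bits, and that a source coordinate is simply the pixel position times 16. The function names are hypothetical and not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Scaling step in 12.20 fixed point: how far to advance in the source
 * for each destination pixel (assumed to be src_size / dst_size). */
uint32_t scale_step_12_20(uint32_t src_size, uint32_t dst_size)
{
    assert(dst_size != 0);
    uint64_t step = ((uint64_t)src_size << 20) / dst_size;
    assert(step <= UINT32_MAX); /* integer part must fit in 12 bits */
    return (uint32_t)step;
}

/* Source coordinate expressed in 16ths of a pixel. */
uint32_t src_coord_16th(uint32_t pixel)
{
    return pixel * 16u;
}
```

For example, upscaling a 1280-pixel-wide source to 1920 pixels gives a step of about 2/3, while an unscaled blit gives exactly `1 << 20`.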
IronPeter, you mentioned 'multiply blits' earlier, did you manage to get that working?
The problem I have with blending is that if I change the blit operation from SRCCOPY (3) to BLEND (2) it hangs the GPU. I think that's why the nouveau guys implemented the EXA composite operation using a textured quad with shaders doing the blending instead of using the blitter.
Oh, and nice finding about the left-over RAMIN memory :-)
I've been following this thread for a while, great stuff by everyone involved, keep it up!
I've just started a 'little' pet project for an H.264 decoder specifically written for the Cell CPU, from scratch and thus not based on the ffmpeg decoder, which is hard to parallelize optimally. It's still at the very, very, very early stages (only working on CABAC atm, which will probably be PPU based), but if it ever evolves into something usable you can understand I'm very much interested in the accelerated graphics work, even if only 2D.
IronPeter or Glaurung: do you expect Sony can and will block the hypervisor regions that now allow (at least) 2D access in a future fw? And if they don't, do you expect your work to be merged into an official Xv/xorg/directfb driver with HW acceleration eventually?
I'm willing to test some stuff later if that might help in any way, and I'll keep following this thread!
Maybe I could make that into something like this:
CONV_YUV420_TO_YUV(Ypointer,Upointer,Vpointer,Istride0,Istride1,Opointer,Ostride0,TAG)
WAIT_CONV(TAG) //waits for that tag to be marked as complete.
where TAG is a 32-bit int or something.
In theory that should be capable of running at close to 24GB/s on a single SPE. (I do not think I will get there, though, for several reasons.)
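As a reference for what such a call would compute, here is a plain scalar C sketch of a planar YUV 4:2:0 to packed YUYV conversion. It is not the SPU implementation: there is no DMA, no SIMD and no TAG/WAIT_CONV handling, the function name and the extra width/height parameters are hypothetical, and YUYV as the output packing is an assumption based on the formats the GPU driver accepts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical scalar reference: planar YUV 4:2:0 in, packed YUYV out.
 * Each chroma sample is shared by a 2x2 block of luma samples.
 * Strides are in bytes; width must be even.
 */
void conv_yuv420_to_yuyv(const uint8_t *y, const uint8_t *u, const uint8_t *v,
                         size_t y_stride, size_t c_stride,
                         uint8_t *out, size_t out_stride,
                         size_t width, size_t height)
{
    assert(width % 2 == 0);
    for (size_t row = 0; row < height; row++) {
        const uint8_t *y_row = y + row * y_stride;
        const uint8_t *u_row = u + (row / 2) * c_stride;
        const uint8_t *v_row = v + (row / 2) * c_stride;
        uint8_t *o = out + row * out_stride;
        for (size_t x = 0; x < width; x += 2) {
            o[2 * x + 0] = y_row[x];      /* Y0 */
            o[2 * x + 1] = u_row[x / 2];  /* U  */
            o[2 * x + 2] = y_row[x + 1];  /* Y1 */
            o[2 * x + 3] = v_row[x / 2];  /* V  */
        }
    }
}
```

A scalar version like this is mainly useful for checking the output of a faster SPU version on small test images.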
Don't do it alone.
Hi, I tried to draw something in 3D. It does not work :). Probably some states were not set correctly.
Anyway, sources:
http://ps3rsx.googlecode.com/svn/trunk/
This demo does not draw anything ( though it tries to render a quad ); it just clears the blue channel of a subrectangle of the screen. The demo needs glaurung's kernel patch to work.
Known potential issues:
1.) probably incorrect mask for vp_in - vp_out.
2.) probably incorrect endianness for fp.
3D is working.
Just small bugs.
It works now. Do an svn up.
svn hosting
Can the ps3rsx project ( or better, OpenRSX ) be hosted on the ps2dev site?
I do not want to use google.code. I do not have any experience in svn administration and want to use a simple service with trivial user rights setup, etc.