The hunt for HV's FIFO/Push buffer...

Technical discussion on the newly released and hard to find PS3.

ps2devman
Posts: 259
Joined: Mon Oct 09, 2006 3:56 pm

Post by ps2devman »

Erratum... (the references below come from a recent version of nv_objects.h)
My memory didn't serve me well. It's not as simple as a command that lets you write anywhere. It's an interrupt, with interesting settings made just before firing it. And this mechanism assumes the Hypervisor has installed an interrupt handler for its own needs (but we assume Sony did not expect anyone to alter the push buffer, so no heavy protection is in place, yet, to prevent us from using this interrupt handler for our own nasty goals...)

On the Xbox 1 (NV2A), you could write to any register by taking advantage of a specific interrupt that you could fire with the command

#define NV20_TCL_PRIMITIVE_3D_FIRE_INTERRUPT 0x00000100
(0x100 is considered a "trapped address" by the NVidia miniport driver)

with data=0x320 (which tells the interrupt handler what treatment should be done)


Extract from pbKit.c (interrupt handler):

Code:

static void pb_subprog(DWORD subprogID, DWORD paramA, DWORD paramB)
{
	//inner registers 0x1D8C & 0x1D90 match 2 outer registers :
	//[0x1D8C]=[NV20_TCL_PRIMITIVE_3D_PARAMETER_A]=VIDEOREG(NV_PGRAPH_PARAMETER_A)=[0xFD401A88]
	//[0x1D90]=[NV20_TCL_PRIMITIVE_3D_PARAMETER_B]=VIDEOREG(NV_PGRAPH_PARAMETER_B)=[0xFD40186C]
	//so they can be used by a push buffer sequence to set parameters
	//before triggering a subprogram by the command 0x0100 which will
	//throw an interrupt and have CPU execute its code right here.
	
	//Here just test the subprogID value and execute your own subprogram
	//associated code (avoid using subprogID=0, it seems to be reserved)

	int			next;

	switch(subprogID)
	{
		case PB_SETOUTER: //sets an outer register
			VIDEOREG(paramA)=paramB;
			break;

		//(other subprogram cases omitted from this extract)

		default:
			break;
	}
}

(Note that PB_SETOUTER=0xB2A in pbKit: I changed the value, just for fun...)

The exact same code exists in the Xbox 1 miniport driver...
(triggered by the value 0x320)

Before triggering the copy (register #paramA <= value paramB), you need, of course, to set values for paramA and paramB. And here comes a tasty secret: two inner registers (which you can only set with push buffer commands) are physically linked to two outer registers (MMIO registers).
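To make that data flow concrete, here is a toy model. It is a hypothetical simulation, not pbKit or miniport code: the "MMIO space" is a small word-indexed array, the method offsets and PB_SETOUTER are the values quoted in this post, and the handler mirrors the PB_SETOUTER case of pb_subprog().

```c
#include <stdint.h>

/* Method offsets and subprogID quoted in this thread (NV20 3D class, pbKit). */
#define NV20_TCL_PRIMITIVE_3D_FIRE_INTERRUPT 0x0100u
#define NV20_TCL_PRIMITIVE_3D_PARAMETER_A    0x1D8Cu
#define NV20_TCL_PRIMITIVE_3D_PARAMETER_B    0x1D90u
#define PB_SETOUTER                          0x0B2Au /* 0x320 in the miniport */

#define TOY_MMIO_WORDS 64u /* hypothetical: tiny register file for the toy */

/* Toy GPU state (hypothetical). param_a/param_b play the role of the outer
   registers NV_PGRAPH_PARAMETER_A/B that mirror the inner 0x1D8C/0x1D90. */
typedef struct {
    uint32_t mmio[TOY_MMIO_WORDS]; /* word-indexed stand-in for VIDEOREG() */
    uint32_t param_a;
    uint32_t param_b;
} toy_gpu;

/* CPU-side trap handler: same dispatch idea as pb_subprog(). */
static void toy_trap_handler(toy_gpu *g, uint32_t subprog_id)
{
    switch (subprog_id) {
    case PB_SETOUTER:
        if (g->param_a < TOY_MMIO_WORDS) /* bounds check exists only in the toy */
            g->mmio[g->param_a] = g->param_b;
        break;
    default: /* unknown subprogIDs are ignored */
        break;
    }
}

/* GPU side: execute one (method, data) pair taken from the push buffer. */
static void toy_execute_method(toy_gpu *g, uint32_t method, uint32_t data)
{
    switch (method) {
    case NV20_TCL_PRIMITIVE_3D_PARAMETER_A:
        g->param_a = data; /* inner write lands in the outer register */
        break;
    case NV20_TCL_PRIMITIVE_3D_PARAMETER_B:
        g->param_b = data;
        break;
    case NV20_TCL_PRIMITIVE_3D_FIRE_INTERRUPT:
        toy_trap_handler(g, data); /* data selects the subprogram */
        break;
    default:
        break;
    }
}
```

Feeding it PARAMETER_A=5, PARAMETER_B=0xCAFE and then FIRE_INTERRUPT with PB_SETOUTER leaves 0xCAFE in toy register 5; any other subprogID is ignored.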

So, in order to set any value anywhere, you would enqueue this:

Code:

pb_push1(p,NV20_TCL_PRIMITIVE_3D_PARAMETER_A,dest_reg); p+=2;
pb_push1(p,NV20_TCL_PRIMITIVE_3D_PARAMETER_B,value); p+=2;
pb_push1(p,NV20_TCL_PRIMITIVE_3D_FIRE_INTERRUPT,PB_SETOUTER); p+=2; //subprogID PB_SETOUTER: set a value in dest_reg
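For reference, here is roughly what those three calls put in the push buffer. This is a sketch under assumptions: it uses the usual NV04 to NV40 style method header (count in bits 18 and up, subchannel in bits 13-15, method offset in the low bits), it assumes the 3D object is bound on subchannel 0, and push1() is a hypothetical stand-in for pbKit's pb_push1(), not its actual source.

```c
#include <stdint.h>

#define NV20_TCL_PRIMITIVE_3D_FIRE_INTERRUPT 0x0100u
#define NV20_TCL_PRIMITIVE_3D_PARAMETER_A    0x1D8Cu
#define NV20_TCL_PRIMITIVE_3D_PARAMETER_B    0x1D90u
#define PB_SETOUTER                          0x0B2Au
#define SUBCH_3D                             0u /* assumption: 3D object on subchannel 0 */

/* NV-style method header: count << 18 | subchannel << 13 | method offset. */
static uint32_t encode_method(uint32_t subch, uint32_t method, uint32_t count)
{
    return (count << 18) | (subch << 13) | method;
}

/* Hypothetical pb_push1() stand-in: one header dword plus one data dword. */
static uint32_t *push1(uint32_t *p, uint32_t method, uint32_t data)
{
    p[0] = encode_method(SUBCH_3D, method, 1u);
    p[1] = data;
    return p + 2;
}

/* Enqueue the "set outer register" sequence; returns the new write pointer. */
static uint32_t *push_set_outer(uint32_t *p, uint32_t dest_reg, uint32_t value)
{
    p = push1(p, NV20_TCL_PRIMITIVE_3D_PARAMETER_A, dest_reg);
    p = push1(p, NV20_TCL_PRIMITIVE_3D_PARAMETER_B, value);
    p = push1(p, NV20_TCL_PRIMITIVE_3D_FIRE_INTERRUPT, PB_SETOUTER);
    return p;
}
```

The whole sequence is six dwords, which is why each pb_push1() call above is followed by p+=2.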

Now, let's dream a bit...
Let's assume the hypervisor handles such interrupts and has a set of prepared treatments depending on the data following the 0x100 command...
(I don't know the chipsets after NV20 well, so I'm just using NV20 values as an example.)
One of these treatments may be exactly that: write something anywhere
(with the HV's privileges!)

OK, it's a naive dream... But accessing the HV's push buffer was one too...
ps2devman
Posts: 259
Joined: Mon Oct 09, 2006 3:56 pm

Post by ps2devman »

Another way to investigate:
Let's try to copy the 360 KK exploit.
To do that, we can upload shader code through push buffer commands.
Now, the question is: is there a shader instruction that allows massive data transfer from [to] anywhere in memory to [from] the shader constants array?
Also, do we have enough access to start shader execution?
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am

Post by IronPeter »

As far as I know, memory areas must be iomapped to the RSX for any I/O. So you have to create a DMA object in the GPU RAMIN area and set up the system bus (for any type of GPU I/O).

lv1_gpu_context_iomap does that setup. I think the privileges required for direct CPU DMA and for lv1_gpu_context_iomap are the same.
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am

My direction.

Post by IronPeter »

I do not want to "hack" the hypervisor.

I want to get "legal" 3D. I am thinking about a backdoor for context object creation.

It would be great if somebody were able to test all the entries of http://wiki.ps2dev.org/ps3:hypervisor:l ... _attribute (with a push buffer dump and a RAMIN trace).
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am

Post by IronPeter »

Glaurung, I've downloaded your modified ps3fb. With your sources I am unable to reproduce the RAMIN blit (still using my old firmware).

Probably the problem is in the DDR memory alloc: I am using 0 bytes, you try to alloc 252 megs.

Please retest (I have no time and want to sleeeep).
Glaurung
Posts: 49
Joined: Thu Oct 11, 2007 4:54 am

confirmed RAMIN blit

Post by Glaurung »

Hi,

Thanks IronPeter for noticing this. I can now blit from above 252MB, so it is confirmed that the firmware version is _not_ the problem. So the first parameter of lv1_gpu_memory_allocate() just sets a limit on VRAM access, and leaving it at zero disables the security entirely!! Anyway, I wanted to play nice with the HV and this is how I got rewarded... Just like you, I would prefer accessing the GPU the proper way, i.e. through the HV interface. But since we don't have much documentation on it yet, these workarounds will be quite helpful to try and figure out what the non-crashing HV calls do, until the holes get patched. (But I guess the HV calls will stay: is there any good reason to believe the HV is not the same for GameOS and the games SDK? I suspect the effort was not made only to support Linux, in which case changing the HV must not break backward compatibility.)

I'll provide an update of the kernel patch and the ps3gpu user application soon.

On the Xorg front, I now have EXA UploadToScreen and DownloadFromScreen accelerated through the NvMemFormat objects. I also have Composite, but blending does not work, so it is not very interesting. Blending is not as simple as it used to be on previous NV hardware, so I can't simply change the operation from 3 (srccopy) to 2 (blend) :( ... It seems it can't be done with the NvScaledImage object and, looking at what the nouveau guys did, I'll actually need 3D for that, so maybe it's time I got interested in that too ;-) Anyway, I'll first check what can be done on the Xv side to accelerate video rendering.
unsolo
Posts: 155
Joined: Mon Apr 16, 2007 2:39 am
Location: OSLO Norway

Post by unsolo »

Hi
I have started some work on an SPU Xv. It will not be limited to the PS3: it will also work on any future FB Cells, and will likely work on any PPC + SpursEngine. It will probably need modifications to work on x86 + SpursEngine, but since the data is chars, we only need to consider endianness for the memory pointers and settings, unless that is covered by the SpursEngine DMA interface.
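A minimal illustration of that endianness point, assuming the SPU side expects big-endian values (SPUs are big-endian) and the host may be little-endian like x86: only multi-byte fields such as addresses and settings need swapping, never the 8-bit pixel data. The helper names are hypothetical, not spu-medialib API.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
static uint32_t bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Nonzero when the host stores the least significant byte first (e.g. x86). */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const uint8_t *)&probe == 1;
}

/* Convert a host 32-bit field (pointer or setting) to the big-endian
   layout an SPU reads; byte-sized pixel data is left alone entirely. */
static uint32_t to_spu_be32(uint32_t host_value)
{
    return host_is_little_endian() ? bswap32(host_value) : host_value;
}
```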

I have not gotten much done, but I believe I have the base code in spu-medialib, so it's just a matter of me understanding Xv..
Don't do it alone.
Glaurung
Posts: 49
Joined: Thu Oct 11, 2007 4:54 am

Post by Glaurung »

unsolo, it could be interesting to have EXA acceleration provided by the SPU too, if you want to add that to the medialib (alpha blending, format conversion, offloading of large memory transfers and rectangular fills to the SPU, etc.). This is a bit off-topic though..
unsolo
Posts: 155
Joined: Mon Apr 16, 2007 2:39 am
Location: OSLO Norway

Post by unsolo »

Sounds like a plan. However, I think accelerating MPEG and H.264 should be higher on the priority list.

.. DirectFB might be worth looking at as well.
Don't do it alone.
Glaurung
Posts: 49
Joined: Thu Oct 11, 2007 4:54 am

update to demo code

Post by Glaurung »

Hi,

As promised, here is an update of the kernel patch and user demo code:
http://manwe.homelinux.org/~glaurung/ps3

2MB of upper video memory can be dumped. I also added a function for 32-bit reads and writes at 254MB+offset, using a single-pixel blit. Read works fine; write stalls the next operation.

IronPeter, did you try writing to RAMIN?
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am

Post by IronPeter »

Glaurung, I think I was able to modify RAMIN. Not sure, I need to retest.
sigbus
Posts: 3
Joined: Tue Oct 16, 2007 7:38 pm

Post by sigbus »

While we're at it, I wanted to point out that it's not actually a security hole :-) The Cell chip has an IOMMU that protects the hypervisor; whatever DMA you can do will only land in Linux partition space, AFAIK. I think when Linux registers its memory with the GPU (some HV call at some point, no source at hand right now) it basically gets its logical memory (pseudo-physical partition memory) mapped into the IOMMU for access by the video chip.

I've not verified this, and it's possible that they disabled the IOMMU for performance reasons (that would definitely be a security hole), but it sounds logical that way.
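A toy model of that argument: a device address only reaches memory if the IOMMU holds a translation for its page, so a GPU DMA can only land where the HV mapped pages (in this scenario, Linux partition memory). The page size, table size and names are assumptions for illustration; this is not the Cell IOMMU's real page-table format.

```c
#include <stdbool.h>
#include <stdint.h>

#define IO_PAGE_SHIFT 12u /* assumption: 4 kB I/O pages */
#define IO_PAGES      16u /* assumption: tiny toy I/O address space */

/* Toy I/O page table: a valid bit and a physical page number per I/O page. */
typedef struct {
    uint64_t ppn[IO_PAGES];
    bool     valid[IO_PAGES];
} toy_iommu;

/* The HV's job in this model: map an I/O page onto a partition page. */
static bool iommu_map(toy_iommu *m, uint64_t io_page, uint64_t phys_page)
{
    if (io_page >= IO_PAGES)
        return false;
    m->ppn[io_page] = phys_page;
    m->valid[io_page] = true;
    return true;
}

/* Device DMA: translate an I/O address, or fail if the page is unmapped,
   in which case the access never reaches memory. */
static bool iommu_translate(const toy_iommu *m, uint64_t io_addr, uint64_t *phys)
{
    uint64_t page = io_addr >> IO_PAGE_SHIFT;
    if (page >= IO_PAGES || !m->valid[page])
        return false;
    *phys = (m->ppn[page] << IO_PAGE_SHIFT) |
            (io_addr & ((UINT64_C(1) << IO_PAGE_SHIFT) - 1));
    return true;
}
```

If the IOMMU were bypassed for performance, as speculated above, translation would effectively become the identity and this check would disappear.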

In this case, leaving access to the chip is not a hole to be plugged. In fact, it's something that Sony might use themselves if they ever release an accelerated Linux driver, and calling it a hole might just damage us by giving the wrong message to whoever from Sony is lurking on this forum...

The fact that you are supposed to use the HV to access RAMIN sounds more like an architecture decision about how Sony's own driver works here... Now we need to find the right HV API calls to manipulate the objects.
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am

Post by IronPeter »

sigbus, you are of course 100% right.

We cannot damage inner hypervisor structures with RSX RAMIN access (if RSX DMAs are properly protected).

RAMIN access is just a way to access RSX inner data.

Please excuse my English.
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am

Post by IronPeter »

It seems RAMIN contains holes that are inaccessible to writes. My previous random tests worked on accessible areas.

vramin_write32 works for some offsets inside RAMIN. I tested offsets 0 and 0x12000; both worked. You may try XDR -> DDR DMA. You may try iomapped CPU writes.

Anyway, I am not interested in changing RAMIN directly.

Sorry, right now I only have a few hours per week to play with my PS3, so do not expect fast progress from me.

PS: Glaurung, you have a very nice coding style. And please, try blending for blits. It's the only thing you need for a 2D driver, just do it!
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am

I am stupid stupid stupid

Post by IronPeter »

The failure of the blit at offset 254 * 1024 * 1024 + 0x20 is just an alignment issue. The blit at offset 0x20 fails too...
ralferoo
Posts: 122
Joined: Sat Mar 03, 2007 9:14 am

Post by ralferoo »

unsolo wrote: I have started some work on the spu Xv as this will not be just limited to PS3 but will also work on any future FB CELL's and will likely work for any ppc + spursengine.
Glaurung wrote: unsolo, it could be interesting to have EXA acceleration provided by the SPU too, if you want to add that to the medialib (alpha blending, format conversion, offloading of large memory transfers and rectangular fills to the SPU, etc.). This is a bit off-topic though..
I have already written an SPU alpha blend function as part of my python-ps3 project. The SPU code is here: http://python-ps3.svn.sourceforge.net/v ... iew=markup
and the project is here: http://python-ps3.sourceforge.net/

There's still a bit more optimisation that can be done, like vectorising the load and store at the beginning and end (it was done this way for pixel alignment reasons), and unrolling could remove a few more stall cycles, but it's probably already close to optimal for simple alpha blitting.

I'm due to rewrite a large chunk of my library soon to avoid all the naive reading from and writing to the screen, so that screen operations become write-only and the entire rendering task is delegated to the SPU instead of the PPC managing the process.

I also spent some time thinking about how to do scaling and rotation, bearing in mind the limited memory of an SPU. I have some plans for that, but no time to start on them yet...

I'm also really looking forward to using some of this RSX knowledge to add more features to my library, as I'll then be able to reclaim the SPUs to do cool stuff with!

[edit] Just a thought. You probably won't want the opacity stuff (applying an overall alpha value to the entire blit), so the previous version is probably more useful: http://python-ps3.svn.sourceforge.net/v ... iew=markup
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am

EXA accelerated driver

Post by IronPeter »

The problem with nouveau and the Composite driver is very simple.

2D blits only support the ONE_MINUS_SRC_ALPHA, SRC_ALPHA blend mode.
The EXA driver uses premultiplied alpha and needs the ONE_MINUS_SRC_ALPHA, ONE blend mode.
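A per-channel sketch of why this matters, using 8-bit integer math. In both modes listed above, ONE_MINUS_SRC_ALPHA is the destination factor and the second name is the source factor; with premultiplied input, a SRC_ALPHA source factor applies alpha a second time. The function names are illustrative only.

```c
#include <stdint.h>

/* round(a * b / 255) for 8-bit channel values, without a division. */
static unsigned mul255(unsigned a, unsigned b)
{
    unsigned t = a * b + 128u;
    return (t + (t >> 8)) >> 8;
}

static uint8_t sat8(unsigned v)
{
    return (uint8_t)(v > 255u ? 255u : v);
}

/* Mode the 2D blits offer: dst * ONE_MINUS_SRC_ALPHA + src * SRC_ALPHA.
   Correct for straight (non-premultiplied) source colors. */
static uint8_t blend_src_alpha(uint8_t src, uint8_t src_alpha, uint8_t dst)
{
    return sat8(mul255(src, src_alpha) + mul255(dst, 255u - src_alpha));
}

/* Mode EXA needs: dst * ONE_MINUS_SRC_ALPHA + src * ONE.
   Correct for premultiplied source colors, as used by Render/EXA. */
static uint8_t blend_premultiplied(uint8_t src, uint8_t src_alpha, uint8_t dst)
{
    return sat8(src + mul255(dst, 255u - src_alpha));
}
```

For a 50% white source, premultiplied to color 128 with alpha 128, drawn over black, the premultiplied mode gives 128 while the SRC_ALPHA mode gives 64: the source comes out darkened by the extra alpha multiply.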
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am

lv1_gpu_context_attribute research

Post by IronPeter »

I worked with the lv1_gpu_context_attribute functions. It looks like the subfunctions of this function do not insert any objects into RAMHT. Just some random 2D stuff. Need to retest.

Probably it is possible to use "RAMIN surgery", i.e. to tweak the headers of already created objects. But I do not want to hack RAMIN in any way.

We need fresh ideas.

I want to play with the second parameter of lv1_gpu_context_allocate (zero by default); all objects are created in this HV call.
Glaurung
Posts: 49
Joined: Thu Oct 11, 2007 4:54 am

searching too...

Post by Glaurung »

Hi,

I've been trying a few random gpu_context_attribute() calls too, with no real results so far either. The attributes 0x106 and 0x107 seem to be doing some stuff with the RAMFC; apart from that, nothing interesting. I'm going through the potential subprogIDs that ps2devman mentioned (if that FIFO trap command still exists...), to see if there is a way to write the MMIO registers using FIFO commands, but the chances are small.
Also, I just came across this page:
http://wiki.ps2dev.org/ps3:hypervisor:l ... device_map
and I think it would be interesting to map all the 'gpu devices' regions and see if they are affected by the gpu_context_attribute() calls. I don't think any of the dumps look like NV MMIO, but it is worth checking.
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am

Post by IronPeter »

Glaurung, are you able to test 0x001, 0x002, 0x003 and 0x201, 0x202, 0x203 entries?

These functions work well on the CPU side, but the RSX hangs. I cannot use the RAMIN-blit trick to watch RAMIN changes for these subfunctions.
tmaster
Posts: 11
Joined: Fri Oct 21, 2005 5:32 am
Location: Ireland

Post by tmaster »

I found this over on ps3news.

It is something to do with RSX data dumps; it has been out a while now. Maybe you can do something with it, maybe not.

Here's what it says:

Code:

I have released a dump of the PlayStation 3 Graphic Libraries for the RSX Hardware in the PS3, available in iRC EFNet #PS3News.

These are not machine readable, they are human readable.

They are being released because there is a wealth of information in there, and hacking NVidia GPU's is a specialist field. Hopefully the appropriate experts can take these files and provide necessary information to allow full RSX access from under linux. I am quite confident this is possible from my cursory scan of this data and the functions that exist in the PS3 Hypervisor. (Especially the undocumented RSX Register access functions).

In any event I will continue to work on this area.

Note, these files are from code that runs UNDER GameOS. Linux does not run under GameOS, it runs directly under the Hypervisor, so there are no Hypervisor calls in this data.

The data certainly reveals the structure of critical memory structures, register layout, etc.

Enjoy.

PS, If you don't understand what these files are or show, then you are not the target audience (sorry).

PS: Keep up the good work, PS3 devs.
Last edited by tmaster on Sun Oct 21, 2007 6:49 am, edited 1 time in total.
ooPo
Site Admin
Posts: 2023
Joined: Sat Jan 17, 2004 9:56 am
Location: Canada

Post by ooPo »

tmaster,

We like to promote legal and open development here. That means no peeking at licensed code or taking any information from any source that requires an NDA to do so.

Please don't post links to where people can find this information.
Glaurung
Posts: 49
Joined: Thu Oct 11, 2007 4:54 am

Post by Glaurung »

IronPeter, the 0x201 entry works when called from the ps3fb context, but does nothing interesting. However, when it is called from a different context (e.g. after calling lv1_gpu_memory_allocate() and lv1_gpu_context_allocate() again from another kernel module), it does some initialization work. Indeed, new DMA and graphics objects are created in RAMIN (the same set we are used to, but for engine '9' instead of '1'..). I have not yet looked into the details. Call 0x001 also works when called from a second context. The 0x001, 0x201, and 0x400 calls seem to do the same kind of initialization (maybe they are operations that require a GPU context switch, causing the HV to do the context initialization for us; I don't know).
Also, did you notice that the driver_info structure contains the object handles? It actually looks like this to me:

Code:

struct gpu_driver_info {
	u32 version_driver;
	u32 version_gpu;
	u32 memory_size;
	u32 hardware_channel;
	u32 nvcore_frequency;
	u32 memory_frequency;
	u32 reserved1[16];
	u32 dma_obj[256];
	u32 gfx_obj[256];
	u32 notify_obj[256];
	u32 unk_obj[256];
	u32 reserved2[23];
	struct display_head display_head[8];
	struct gpu_irq irq;
};

A second structure is filled in at offset 0x4000 when any of the lv1_gpu_context_attribute calls mentioned above is made with the second context handle. Reading more than 32 kB of driver info crashed (2 contexts), but the definition in Linux is 128 kB, which would make 8 contexts.
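A quick layout check of the guessed structure, as a sketch: the fields are copied from the guess above (uint32_t for u32), display_head and irq are left out because their layouts are not given here, and the 0x4000 stride and 128 kB region are the figures from this post. Note that reserved1, the four object tables and reserved2 add up to 16 + 4 x 256 + 23 = 1063 dwords.

```c
#include <stddef.h>
#include <stdint.h>

/* Known leading part of the guessed gpu_driver_info (display_head[8] and
   irq omitted: their layouts are not given in this post). */
struct gpu_driver_info_head {
    uint32_t version_driver;
    uint32_t version_gpu;
    uint32_t memory_size;
    uint32_t hardware_channel;
    uint32_t nvcore_frequency;
    uint32_t memory_frequency;
    uint32_t reserved1[16];
    uint32_t dma_obj[256];
    uint32_t gfx_obj[256];
    uint32_t notify_obj[256];
    uint32_t unk_obj[256];
    uint32_t reserved2[23];
};

#define DRIVER_INFO_STRIDE 0x4000u  /* second context's copy observed at 0x4000 */
#define DRIVER_INFO_REGION 0x20000u /* 128 kB, per the Linux definition */

/* Byte offset of the driver_info copy for context number ctx (0-based). */
static size_t driver_info_offset(unsigned ctx)
{
    return (size_t)ctx * DRIVER_INFO_STRIDE;
}

/* How many per-context copies fit in a region of region_bytes. */
static unsigned contexts_in(size_t region_bytes)
{
    return (unsigned)(region_bytes / DRIVER_INFO_STRIDE);
}
```

The known part ends at 0x10B4, well inside the 0x4000 stride; 0x20000 / 0x4000 gives 8 contexts, and the 32 kB (0x8000) that could be read covers 2.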
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am

Post by IronPeter »

Glaurung, very strange...

I of course also tried all entries with the newly created context. I was unable to see any changes in the GPU control structures in RAMIN. I wanted to find something like

lv1_gpu_context_attribute( u64 context_handle,
LV1_CREATE_GPU_OBJECT, u64 object_class, u64 object_handle );

Either nothing happens in RAMIN or the RSX hangs.

Good note about gpu_driver_info. It looks like a driver-side copy of the GPU object information (the nouveau DRM keeps the same kind of copy).
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am

context data

Post by IronPeter »

Hi, I compared the RAMIN dump from dword offset 0x78000 with

http://gitweb.freedesktop.org/?p=mesa/d ... 40_graph.c

Compare

Code:

/*TODO: deciper what each offset in the context represents. The below
 *      contexts are taken from dumps just after the 3D object is
 *      created.
 */
...
        /* 0x680-0x6BC - NV30_TCL_PRIMITIVE_3D_TX_ADDRESS_UNIT(0-15) */
        /* 0x6C0-0x6FC - NV30_TCL_PRIMITIVE_3D_TX_FORMAT_UNIT(0-15) */
        for (i=0x006C0; i<=0x006fc; i+=4)
                INSTANCE_WR(ctx, i/4, 0x00018488);
        /* 0x700-0x73C - NV30_TCL_PRIMITIVE_3D_TX_WRAP_UNIT(0-15) */
        for (i=0x00700; i<=0x0073c; i+=4)
                INSTANCE_WR(ctx, i/4, 0x00028202);

and this fragment:
offset (dword) : value
781ac 18488
781ad 18488
781ae 18488
781af 18488
781b0 18488
781b1 18488
781b2 18488
781b3 18488
781b4 18488
781b5 18488
781b6 18488
781b7 18488
781b8 18488
781b9 18488
781ba 18488
781bb 18488
781bc 28202
781bd 28202
781be 28202
781bf 28202
781c0 28202
781c1 28202
781c2 28202
781c3 28202
781c4 28202
781c5 28202
781c6 28202
781c7 28202
781c8 28202
781c9 28202
781ca 28202
781cb 28202
781cc 0

Pretty nice.
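The match can also be found mechanically. A minimal sketch with a hypothetical helper name: scan a dword dump for 16 copies of 0x00018488 immediately followed by 16 copies of 0x00028202, the two loops in the nouveau excerpt. In the fragment above the run starts at dword 0x781ac; taking 0x78000 as the context base (an assumption), that is dword 0x1ac, while the nouveau table puts TX_FORMAT_UNIT(0) at byte 0x6C0, i.e. dword 0x1b0, a difference of 4 dwords.

```c
#include <stddef.h>
#include <stdint.h>

#define TX_FORMAT_DEFAULT 0x00018488u /* nouveau value for TX_FORMAT_UNIT(0-15) */
#define TX_WRAP_DEFAULT   0x00028202u /* nouveau value for TX_WRAP_UNIT(0-15) */
#define TX_UNITS          16u

/* Nonzero if dump[i .. i+count-1] all equal value. */
static int run_matches(const uint32_t *dump, size_t i, size_t count, uint32_t value)
{
    for (size_t k = 0; k < count; k++)
        if (dump[i + k] != value)
            return 0;
    return 1;
}

/* Dword index where 16 x TX_FORMAT_DEFAULT is immediately followed by
   16 x TX_WRAP_DEFAULT, or -1 if the pattern is absent. */
static long find_tx_defaults(const uint32_t *dump, size_t n)
{
    for (size_t i = 0; i + 2 * TX_UNITS <= n; i++)
        if (run_matches(dump, i, TX_UNITS, TX_FORMAT_DEFAULT) &&
            run_matches(dump, i + TX_UNITS, TX_UNITS, TX_WRAP_DEFAULT))
            return (long)i;
    return -1;
}
```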
Glaurung
Posts: 49
Joined: Thu Oct 11, 2007 4:54 am

updated ps3gpu test program

Post by Glaurung »

Hi,

I've updated the ps3gpu test program to include RAMIN analysis, GPU device region dumps, and driver_info and reports region dumps. The ps3fb.diff patch is also updated:
http://manwe.homelinux.org/~glaurung/ps3/20071021/

IronPeter, with this code, I observe new objects being created by the 0x201 call (check the log_pre.txt and log_post.txt files, produced before and after inserting the module in ps3gpu.c). Commenting out the insmod and using an ioctl to perform the same call from within ps3fb, using its context, does not produce anything interesting. Note that new objects are only created the first time any of the 0x001, 0x201 or 0x400 calls is made from the second context.
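Comparing such before/after dumps can also be done in memory rather than by diffing text logs. A minimal sketch (hypothetical helper, not taken from the ps3gpu sources) that records which dwords changed between two equally sized RAMIN dumps:

```c
#include <stddef.h>
#include <stdint.h>

/* One changed dword between two RAMIN dumps. */
struct ramin_change {
    size_t   index;  /* dword index within the dump */
    uint32_t before;
    uint32_t after;
};

/* Compare two dumps of n dwords (e.g. taken before and after an HV call)
   and record up to max_out differences. Returns the total number of
   changed dwords, which may exceed max_out. */
static size_t diff_ramin(const uint32_t *pre, const uint32_t *post, size_t n,
                         struct ramin_change *out, size_t max_out)
{
    size_t changes = 0;
    for (size_t i = 0; i < n; i++) {
        if (pre[i] == post[i])
            continue;
        if (changes < max_out) {
            out[changes].index  = i;
            out[changes].before = pre[i];
            out[changes].after  = post[i];
        }
        changes++;
    }
    return changes;
}
```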

Also, there is indeed a graph context at the very end of VRAM (0x0ffe0000). This can be verified because its first dword is its own offset within RAMIN (which starts at 0x0ff80000) shifted right by 4, that is (0x0ffe0000 - 0x0ff80000) >> 4 = 0x6000.
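The same check as a small sketch that could be applied to other candidate objects in a dump. The two addresses are the ones quoted above; the function names are hypothetical.

```c
#include <stdint.h>

#define RAMIN_BASE     0x0ff80000u /* start of RAMIN in VRAM, as quoted above */
#define GRAPH_CTX_ADDR 0x0ffe0000u /* graph context address, as quoted above */

/* Expected first dword of an object that stores its own RAMIN offset >> 4. */
static uint32_t self_ref_dword(uint32_t vram_addr, uint32_t ramin_base)
{
    return (vram_addr - ramin_base) >> 4;
}

/* Nonzero if first_dword looks like the self-reference of an object at vram_addr. */
static int looks_like_self_ref(uint32_t first_dword, uint32_t vram_addr,
                               uint32_t ramin_base)
{
    return vram_addr >= ramin_base &&
           first_dword == self_ref_dword(vram_addr, ramin_base);
}
```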
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am

Post by IronPeter »

Glaurung, solid work.

>>I observe new objects being created by the 0x201 call

Of course, I'll check it. Thanks. What about the object types? 2D related? Just want to know.

PS: the context seems to be set up properly for 3D.
Glaurung
Posts: 49
Joined: Thu Oct 11, 2007 4:54 am

Post by Glaurung »

IronPeter, those are the same DMA and GFX objects we observed already; they are just allocated in RAMIN a second time (after the first set). Also, the handles are the same but hash to a different entry in RAMHT, so I suspect the channel (not subchannel) is different, as would be required for hardware context switching. The objects we know all use channel 0; I'll hash the handles with other channel values and see how to produce the hashes I observe for the second set of objects. Another difference is the engine id part of the RAMIN address in the hash table entry, which is 9 instead of 1.
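A sketch of that experiment, modeled on the pre-NV50 RAMHT hash in the nouveau DRM of this period (treat the details as an assumption to check against the nouveau source): XOR-fold the handle into the hash width, XOR in the channel id shifted into the top bits, and scale by the 8-byte entry size. The 9-bit width (a 4 kB, 512-entry RAMHT) is an assumption, not a measured PS3 value, and collision probing is not modeled.

```c
#include <stdint.h>

#define RAMHT_BITS 9 /* assumption: 512-entry (4 kB) RAMHT */

/* Byte offset of a handle's RAMHT slot (collisions/probing not modeled). */
static uint32_t ramht_hash(uint32_t handle, uint32_t channel)
{
    uint32_t hash = 0;

    /* XOR-fold the 32-bit handle into RAMHT_BITS-wide chunks. */
    for (int i = 32; i > 0; i -= RAMHT_BITS) {
        hash ^= handle & ((1u << RAMHT_BITS) - 1);
        handle >>= RAMHT_BITS;
    }
    /* Mix the channel id into the top bits of the slot index. */
    hash ^= channel << (RAMHT_BITS - 4);
    /* Each RAMHT entry is 8 bytes: handle dword + context dword. */
    return hash << 3;
}
```

With this hash, moving a handle from channel 0 to channel 1 flips a single bit of the slot (0x100 in byte offset), which is the kind of difference described above.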
Glaurung
Posts: 49
Joined: Thu Oct 11, 2007 4:54 am

Post by Glaurung »

Just checked with a few handles: the new object set indeed uses hardware channel 1 instead of 0. No sign of any 3D object so far (we're looking for class 0x97). Also, I was thinking it would be nice to start a wiki page on all this, in a more human-readable form :-), what do you think?
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am

Post by IronPeter »

Aaah, the second copy of the 2D objects, just with different hashes... Yes, I noticed that copy. Creation of these objects is not lazy: they are created in the alloc_context call.

The wiki page is a great idea. The only problem is my English. If only a native English speaker would edit this page...