ps2devman
Joined: 09 Oct 2006 Posts: 271
Posted: Mon Oct 15, 2007 7:31 pm
Erratum... (the references below come from a recent version of nv_objects.h)
My memory didn't serve me well. It's not as simple as a command that lets you write anywhere. It's an interrupt, with some interesting settings made just before firing it. And this mechanism assumes the hypervisor has installed an interrupt handler for its own needs (but we assume Sony did not expect the push buffer to be altered, so no heavy protection is in place, yet, to prevent us from using this interrupt handler for our own nasty goals...)
On the Xbox 1 (NV2A), you could write to any register by taking advantage of a specific interrupt that you could fire with the command
#define NV20_TCL_PRIMITIVE_3D_FIRE_INTERRUPT 0x00000100
(0x100 is considered a "trapped address" by the NVIDIA miniport driver)
with data=0x320 (which tells the interrupt handler which treatment to perform).
Extract from pbKit.c (interrupt handler):
| Code: |
static void pb_subprog(DWORD subprogID, DWORD paramA, DWORD paramB)
{
    //inner registers 0x1D8C & 0x1D90 match 2 outer registers :
    //[0x1D8C]=[NV20_TCL_PRIMITIVE_3D_PARAMETER_A]=VIDEOREG(NV_PGRAPH_PARAMETER_A)=[0xFD401A88]
    //[0x1D90]=[NV20_TCL_PRIMITIVE_3D_PARAMETER_B]=VIDEOREG(NV_PGRAPH_PARAMETER_B)=[0xFD40186C]
    //so they can be used by a push buffer sequence to set parameters
    //before triggering a subprogram by the command 0x0100 which will
    //throw an interrupt and have CPU execute its code right here.
    //Here just test the subprogID value and execute your own subprogram
    //associated code (avoid using subprogID=0, it seems to be reserved)
    int next;
    switch(subprogID)
    {
        case PB_SETOUTER: //sets an outer register: register #paramA <= paramB
            VIDEOREG(paramA)=paramB;
            break;
        //(other subprograms omitted from this extract)
        default:
            break;
    }
}
|
(note that in pbKit, PB_SETOUTER=0xB2A; the value was changed just for fun...)
The exact same code exists in the Xbox 1 miniport driver...
(triggered by the value 0x320)
Before triggering the copy (register #paramA <= value paramB), you of course need to set values for paramA and paramB. And here comes a tasty secret: two inner registers (which you can only set with push buffer commands) are physically linked to two outer registers (MMIO registers).
So, in order to set any value anywhere, you would enqueue this:
| Code: |
pb_push1(p,NV20_TCL_PRIMITIVE_3D_PARAMETER_A,dest_reg); p+=2;
pb_push1(p,NV20_TCL_PRIMITIVE_3D_PARAMETER_B,value); p+=2;
pb_push1(p,NV20_TCL_PRIMITIVE_3D_FIRE_INTERRUPT,PB_SETOUTER); p+=2; //subprogID PB_SETOUTER: set a value in dest_reg
|
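To make the three-command sequence concrete, here is a host-side toy model in C. It is a sketch under several assumptions, not pbKit code: `pb_push1()` below is a hypothetical stand-in that writes the classic NV FIFO method header used by nouveau (count << 18 | subchannel << 13 | method) with subchannel 0, and `toy_gpu`, `toy_run()` and `toy_subprog()` are hypothetical names that mimic the GPU front end trapping method 0x100 and the CPU handler from the extract above. A small array stands in for the MMIO space, so `dest_reg` is an array index here, not a real register offset.

```c
#include <assert.h>
#include <stdint.h>

/* Method offsets quoted in this post (NV20 3D class). */
#define NV20_TCL_PRIMITIVE_3D_FIRE_INTERRUPT 0x00000100
#define NV20_TCL_PRIMITIVE_3D_PARAMETER_A    0x00001D8C
#define NV20_TCL_PRIMITIVE_3D_PARAMETER_B    0x00001D90
#define PB_SETOUTER                          0x00000B2A /* pbKit's value */

/* Hypothetical pb_push1(): one method header + one data word. Assumes the
 * classic NV FIFO header (count << 18 | subchannel << 13 | method) and
 * subchannel 0. */
static void pb_push1(uint32_t *p, uint32_t method, uint32_t data)
{
    assert((method & 3) == 0 && method <= 0x1FFC);
    p[0] = (1u << 18) | (0u << 13) | method;
    p[1] = data;
}

/* Toy GPU: a small array stands in for the MMIO register space, and the
 * two parameter fields model the inner registers mirrored to PGRAPH. */
#define NREGS 64
typedef struct {
    uint32_t mmio[NREGS];
    uint32_t param_a, param_b;
} toy_gpu;

/* What the CPU-side interrupt handler does for each subprogID. */
static void toy_subprog(toy_gpu *g, uint32_t id)
{
    if (id == PB_SETOUTER) {            /* register #paramA <= paramB */
        assert(g->param_a < NREGS);
        g->mmio[g->param_a] = g->param_b;
    }
}

/* Decodes single-word methods and executes them in order. */
static void toy_run(toy_gpu *g, const uint32_t *pb, int ndwords)
{
    for (int i = 0; i + 1 < ndwords; i += 2) {
        uint32_t method = pb[i] & 0x1FFC, data = pb[i + 1];
        assert((pb[i] >> 18) == 1);     /* only count == 1 handled here */
        if (method == NV20_TCL_PRIMITIVE_3D_PARAMETER_A)
            g->param_a = data;
        else if (method == NV20_TCL_PRIMITIVE_3D_PARAMETER_B)
            g->param_b = data;
        else if (method == NV20_TCL_PRIMITIVE_3D_FIRE_INTERRUPT)
            toy_subprog(g, data);       /* the "trapped address" */
    }
}

/* Enqueues the 3-command sequence from the post; returns dwords written. */
static int enqueue_setouter(uint32_t *p, uint32_t dest_reg, uint32_t value)
{
    uint32_t *start = p;
    pb_push1(p, NV20_TCL_PRIMITIVE_3D_PARAMETER_A, dest_reg); p += 2;
    pb_push1(p, NV20_TCL_PRIMITIVE_3D_PARAMETER_B, value); p += 2;
    pb_push1(p, NV20_TCL_PRIMITIVE_3D_FIRE_INTERRUPT, PB_SETOUTER); p += 2;
    return (int)(p - start);
}
```

After `enqueue_setouter(pb, 5, 0xCAFE)`, running `toy_run()` over the first four words only latches the parameters; `mmio[5]` changes only once the FIRE_INTERRUPT command is executed, which is the whole point of the trick.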
Now, let's dream a bit...
Let's assume the hypervisor handles such interrupts and has a set of prepared treatments depending on the data following the 0x100 command...
(I don't know chipsets beyond NV20 well, so I'm just using NV20 values as an example.)
One of these treatments may be this one: write something anywhere
(with the HV's privileges!)
OK, it's a naive dream... But accessing the HV's push buffer was one too...

ps2devman
Joined: 09 Oct 2006 Posts: 271
Posted: Mon Oct 15, 2007 7:40 pm
Another avenue to investigate:
Let's try to replicate the Xbox 360 KK exploit.
To do that, we can upload shader code through push buffer commands.
Now, the question is: is there a shader instruction that allows massive data transfers from (or to) anywhere in memory to (or from) the shader constants array?
Also, do we have enough access to start shader execution?

IronPeter
Joined: 06 Aug 2007 Posts: 207
Posted: Tue Oct 16, 2007 12:58 am
As far as I know, memory areas must be iomapped to the RSX for any I/O. So you have to create a DMA object in the GPU RAMIN area and set up the system bus (for any type of GPU I/O).
lv1_gpu_context_iomap does that setup. I think the privileges required for direct CPU DMA and for lv1_gpu_context_iomap are the same.

IronPeter
Joined: 06 Aug 2007 Posts: 207
Posted: Tue Oct 16, 2007 1:05 am Post subject: My direction.
I do not want to "hack" the hypervisor.
I want to get "legal" 3D. I am thinking about a backdoor for context object creation.
It would be great if somebody could test all the entries of http://wiki.ps2dev.org/ps3:hypervisor:lv1_gpu_context_attribute (with a push buffer dump and a RAMIN trace).

IronPeter
Joined: 06 Aug 2007 Posts: 207
Posted: Tue Oct 16, 2007 5:31 am
Glaurung, I've downloaded your modified ps3fb. With your sources I am unable to reproduce the RAMIN blit (still using my old firmware).
The problem is probably the DDR memory allocation: I am allocating 0 bytes, while you try to allocate 252 MB.
Please retest (I have no time and want to sleeeep).

Glaurung
Joined: 11 Oct 2007 Posts: 49
Posted: Tue Oct 16, 2007 8:25 am Post subject: confirmed RAMIN blit
Hi,
Thanks, IronPeter, for noticing this. I can now blit from above 252MB, so it is confirmed that the firmware version is _not_ the problem. So the first parameter of lv1_gpu_memory_allocate() just sets a limit on VRAM access, but leaving it at zero disables the security entirely!! Anyway, I wanted to play nice with the HV and this is how I got rewarded... Like you, I would prefer accessing the GPU the proper way, i.e. through the HV interface. But since we don't have much documentation on it yet, these workarounds will be quite helpful for figuring out what the non-crashing HV calls do, until the holes get patched (but I guess the HV calls will stay: is there any good reason to believe the HV is not the same for GameOS and the games SDK? I suspect the effort was not made only to support Linux, in which case changing the HV must not break backward compatibility).
I'll provide an update of the kernel patch and the ps3gpu user application soon.
On the Xorg front, I now have EXA UploadToScreen and DownloadFromScreen accelerated through the use of the NvMemFormat objects. I also have Composite, but blending does not work, so it is not very interesting. Blending is not as simple as it used to be on previous NV hardware, so I can't simply change the operation from 3 (srccopy) to 2 (blend) :( ... It seems it can't be done with the NvScaledImage object and, looking at what the nouveau guys did, I'll actually need 3D for that, so maybe it's time I got interested in that too ;-) Anyway, I'll first check what can be done on the Xv side to accelerate video rendering.

unsolo
Joined: 16 Apr 2007 Posts: 155 Location: OSLO Norway
Posted: Tue Oct 16, 2007 8:37 am
Hi
I have started some work on an SPU-based Xv, as this will not be limited to the PS3: it will also work on any future Cell systems with a framebuffer, and will likely work on any PPC + SpursEngine system. However, it will probably need modifications to work on x86 + SpursEngine, but since the pixel data is made of chars (bytes), we only need to consider endianness for the memory pointers and settings, unless that is already handled by the SpursEngine DMA interface.
I have not gotten much done yet, but I believe I have the base code in spu-medialib, so it is just a matter of me understanding Xv...
_________________
Don't do it alone.
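A small sketch of what "only the pointers and settings need care" can mean in practice, assuming the SPU side expects big-endian values (the SPEs are big-endian): shift-based stores produce the same byte layout whether the code runs on the big-endian PPU or on a little-endian x86 host, so no `#ifdef` is needed. The `xv_params` layout and the helper names are purely hypothetical, not spu-medialib's actual interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-endian-independent big-endian stores: the same code works on a
 * big-endian PPU and on a little-endian x86 host. */
static void store_be32(uint8_t *dst, uint32_t v)
{
    for (int i = 3; i >= 0; i--) { dst[i] = (uint8_t)v; v >>= 8; }
}

static void store_be64(uint8_t *dst, uint64_t v)
{
    for (int i = 7; i >= 0; i--) { dst[i] = (uint8_t)v; v >>= 8; }
}

/* Hypothetical parameter block handed to an SPU Xv kernel: only the
 * pointers and the settings need converting; the pixel bytes do not. */
typedef struct {
    uint8_t src_ea[8];    /* effective address of the source frame */
    uint8_t dst_ea[8];    /* effective address of the destination */
    uint8_t width[4];
    uint8_t height[4];
} xv_params;

static void xv_params_fill(xv_params *p, uint64_t src, uint64_t dst,
                           uint32_t w, uint32_t h)
{
    assert(p != NULL);
    store_be64(p->src_ea, src);
    store_be64(p->dst_ea, dst);
    store_be32(p->width, w);
    store_be32(p->height, h);
}
```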

Glaurung
Joined: 11 Oct 2007 Posts: 49
Posted: Tue Oct 16, 2007 9:09 am
unsolo, it could be interesting to have EXA acceleration provided by the SPU too, if you want to add that to the medialib (alpha blending, format conversion, offloading of large memory transfers and rectangular fills to the SPU, etc.). This is a bit off-topic though...

unsolo
Joined: 16 Apr 2007 Posts: 155 Location: OSLO Norway
Posted: Tue Oct 16, 2007 9:21 am
Sounds like a plan. However, I think accelerating MPEG and H.264 should be higher on the priority list.
DirectFB might be worth looking at as well.

Glaurung
Joined: 11 Oct 2007 Posts: 49
Posted: Tue Oct 16, 2007 10:46 am Post subject: update to demo code
Hi,
As promised here is an update of the kernel patch and user demo code:
http://manwe.homelinux.org/~glaurung/ps3
2MB of upper video memory can be dumped. I also added functions for 32-bit reads and writes at 254MB+offset, using a single-pixel blit. Reads work fine; writes stall the next operation.
IronPeter, did you try writing to RAMIN?

IronPeter
Joined: 06 Aug 2007 Posts: 207
Posted: Tue Oct 16, 2007 4:12 pm
Glaurung, I think I was able to modify RAMIN. Not sure; I need to retest.

sigbus
Joined: 16 Oct 2007 Posts: 3
Posted: Tue Oct 16, 2007 8:19 pm
While we're at it, I wanted to point out that it's not actually a security hole :-) The Cell chip has an IOMMU that protects the hypervisor; whatever DMA you can do will only land in the Linux partition's space, AFAIK. I think when Linux registers its memory with the GPU (some HV call at some point, no source at hand right now), it basically gets the logical memory (pseudo-physical partition memory) mapped into the IOMMU for access by the video chip.
I've not verified this, and it's possible that they disabled the IOMMU for performance reasons (that would definitely be a security hole), but it sounds logical that way.
In this case, leaving access to the chip is not a hole to be plugged; in fact, it's something that Sony might use themselves if they ever release an accelerated Linux driver, and calling it a hole might just damage us by sending the wrong message to whoever from Sony is lurking on this forum...
The fact that you are supposed to use the HV to access RAMIN sounds more like an architectural decision about how Sony's own driver works... Now we need to find the right HV API calls to manipulate the objects.

IronPeter
Joined: 06 Aug 2007 Posts: 207
Posted: Tue Oct 16, 2007 9:27 pm
sigbus, you are of course 100% right.
We cannot damage internal hypervisor structures through RSX RAMIN access (if RSX DMAs are properly protected).
RAMIN access is just a way to access RSX internal data.
Please excuse my English.

IronPeter
Joined: 06 Aug 2007 Posts: 207
Posted: Wed Oct 17, 2007 4:01 am
It seems that RAMIN contains holes that are inaccessible to writes. My previous random tests happened to hit accessible areas.
vramin_write32 works for some offsets inside RAMIN. I tested offsets 0 and 0x12000; both worked. You may try XDR -> DDR DMA. You may try iomapped CPU writes.
Anyway, I am not interested in changing RAMIN directly.
Sorry, I now have only a few hours a week to play with my PS3, so do not expect fast progress from me.
PS: Glaurung, you have a very nice coding style. And please, try blending for blits. That is the only thing you need for a 2D driver, just do it!

IronPeter
Joined: 06 Aug 2007 Posts: 207
Posted: Wed Oct 17, 2007 4:06 am Post subject: I am stupid stupid stupid
The failure of the blit at offset 254 * 1024 * 1024 + 0x20 is just an alignment issue. The blit at offset 0x20 fails too...
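If the blit source really must be aligned, a 32-bit access at an arbitrary VRAM offset can still be done by blitting from an aligned base and picking the wanted dword out of the copied line. The sketch below only does that address arithmetic. The required alignment is unknown (the posts only show that +0x20 fails), so `align` is a parameter, and the 64-byte value used in the check is a guess; `split_offset()` and `aligned_src` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define VRAM_254MB (254u * 1024u * 1024u)   /* 0x0FE00000 */

/* An aligned blit source plus the position of the wanted data inside the
 * blitted line. */
typedef struct {
    uint32_t base;   /* aligned offset to program as the blit source */
    uint32_t skip;   /* bytes to skip in the destination copy */
} aligned_src;

/* Splits an arbitrary VRAM byte offset according to a hypothetical
 * power-of-two alignment requirement of the blit engine. */
static aligned_src split_offset(uint32_t off, uint32_t align)
{
    aligned_src r;
    assert(align != 0 && (align & (align - 1)) == 0);
    r.base = off & ~(align - 1);
    r.skip = off - r.base;
    return r;
}
```

To read the dword at 254MB + 0x20, one would blit at least `skip + 4` bytes starting at `base` and read the dword at byte `skip` of the destination.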

ralferoo
Joined: 03 Mar 2007 Posts: 122
Posted: Wed Oct 17, 2007 4:37 pm
unsolo wrote:
"I have started some work on the spu Xv as this will not be just limited to PS3 but will also work on any future FB CELL's and will likely work for any ppc + spursengine."
Glaurung wrote:
"unsolo, it could be interesting to have EXA acceleration provided by the SPU too, if you want to add that to the medialib (alpha blending, format convertion, offloading of large memory transfer and rectangular fills to the SPU, etc..). This is a bit offtopic though.."
I have already written an SPU alpha blend function as part of my python-ps3 project. The SPU code is here: http://python-ps3.svn.sourceforge.net/viewvc/python-ps3/trunk/library/spu/blend.c?revision=466&view=markup
and the project is here: http://python-ps3.sourceforge.net/
There's still a bit more optimisation that can be done, like vectorising the load and store at the beginning and end (it was done this way for pixel alignment reasons) and unrolling could remove a few more stall cycles, but it's probably close to optimal already for simple alpha blitting.
I'm due to rewrite a large chunk of my library soon to avoid all the naive reading and writing from the screen, so that screen operations become write-only and the entire task of rendering is delegated to the SPU instead of the PPC managing the process.
I also spent some time thinking about how to do scaling and rotation, bearing in mind the limited memory of an SPU. I have some plans for that, but no time to start on them yet...
I'm also really looking forward to using some of this RSX knowledge to add more features to my library, as I'll then be able to reclaim the SPUs to do cool stuff with!
[edit] Just a thought. You probably won't want the opacity stuff (applying an overall alpha value to the entire blit), so the previous version is probably more useful: http://python-ps3.svn.sourceforge.net/viewvc/python-ps3/trunk/library/spu/blend.c?revision=43&view=markup
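For comparison with the SPU version, here is a plain scalar C sketch of straight-alpha blending with an overall opacity factor, as described in the [edit] note. It assumes ARGB8888 pixels and keeps the destination alpha byte unchanged; it is not ralferoo's code. The SPU version processes many pixels per vector instruction, while this one only shows the per-pixel arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Blend one 8-bit channel: (s*a + d*(255-a)) / 255, rounded. */
static uint8_t blend_channel(uint8_t s, uint8_t d, uint32_t a)
{
    assert(a <= 255);
    return (uint8_t)((s * a + d * (255u - a) + 127u) / 255u);
}

/* Straight-alpha "over" for ARGB8888 pixels. 'opacity' scales the source
 * alpha for the whole blit (255 = plain per-pixel alpha blending). */
static uint32_t blend_pixel(uint32_t src, uint32_t dst, uint8_t opacity)
{
    uint32_t a = ((src >> 24) * opacity + 127u) / 255u;
    uint32_t out = dst & 0xFF000000u;         /* keep destination alpha */
    for (int shift = 0; shift < 24; shift += 8) {
        uint8_t s = (uint8_t)(src >> shift);
        uint8_t d = (uint8_t)(dst >> shift);
        out |= (uint32_t)blend_channel(s, d, a) << shift;
    }
    return out;
}
```

With `opacity` fixed at 255 the extra multiply disappears, which corresponds to the simpler, earlier version linked above.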

IronPeter
Joined: 06 Aug 2007 Posts: 207
Posted: Wed Oct 17, 2007 9:51 pm Post subject: EXA accelerated driver
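The problem with nouveau and the Composite driver is very simple.
2D blits only support the ONE_MINUS_SRC_ALPHA, SRC_ALPHA blend mode (destination factor ONE_MINUS_SRC_ALPHA, source factor SRC_ALPHA: classic non-premultiplied blending).
The EXA driver uses premultiplied alpha and needs the ONE_MINUS_SRC_ALPHA, ONE blend mode (source factor ONE).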
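The difference between the two modes can be checked with a few integers. In this sketch (8-bit channels, out = src*SF + dst*DF, normalized by 255 with rounding), a premultiplied source blended with SF=ONE, DF=ONE_MINUS_SRC_ALPHA gives the same result as a straight source with SF=SRC_ALPHA, DF=ONE_MINUS_SRC_ALPHA, while feeding the premultiplied source to the SRC_ALPHA mode multiplies by alpha twice and darkens the result. The helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Blend factors, OpenGL-style: out = src * SF + dst * DF. */
typedef enum { F_ONE, F_SRC_ALPHA, F_ONE_MINUS_SRC_ALPHA } blend_factor;

/* Multiplies an 8-bit channel by a factor (in 1/255 units) derived from
 * the 8-bit source alpha; the result is in 1/255 units too. */
static uint32_t factor_scale(blend_factor f, uint32_t c, uint32_t sa)
{
    switch (f) {
    case F_ONE:                 return c * 255u;
    case F_SRC_ALPHA:           return c * sa;
    case F_ONE_MINUS_SRC_ALPHA: return c * (255u - sa);
    }
    assert(0 && "unknown blend factor");
    return 0;
}

/* One channel: out = src * SF + dst * DF, clamped to 255 and rounded. */
static uint8_t blend_mode_channel(blend_factor sf, blend_factor df,
                                  uint8_t src, uint8_t dst, uint8_t sa)
{
    uint32_t sum = factor_scale(sf, src, sa) + factor_scale(df, dst, sa);
    if (sum > 255u * 255u)
        sum = 255u * 255u;
    return (uint8_t)((sum + 127u) / 255u);
}

/* Converts a straight-alpha channel to its premultiplied value. */
static uint8_t premultiply(uint8_t c, uint8_t a)
{
    return (uint8_t)((c * a + 127u) / 255u);
}
```

For a source channel of 200 with alpha 128 over black, the straight mode and the premultiplied mode (source 100) both give 100, whereas the premultiplied source pushed through the SRC_ALPHA mode gives 50.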

IronPeter
Joined: 06 Aug 2007 Posts: 207
Posted: Sat Oct 20, 2007 6:44 am Post subject: lv1_gpu_context_attribute research
I worked with the lv1_gpu_context_attribute functions. It looks like the subfunctions of this function do not insert any objects into RAMHT. Just some random 2D stuff. Need to retest.
It is probably possible to use "RAMIN surgery", i.e. to tweak the headers of already created objects. But I do not want to hack RAMIN in any way.
We need fresh ideas.
I want to play with the second parameter of lv1_gpu_context_allocate (zero by default); all objects are created in this HV call.

Glaurung
Joined: 11 Oct 2007 Posts: 49
Posted: Sat Oct 20, 2007 8:12 am Post subject: searching too...
Hi,
I've been trying a few random gpu_context_attribute() calls too, with no real results so far either. The attributes 0x106 and 0x107 seem to be doing some stuff with the RAMFC; apart from that, nothing interesting. I'm going through potential subprogID values, as ps2devman mentioned (if that FIFO trap command still exists...), to see if there is a means to write the MMIO registers using FIFO commands, but the chances are small.
Also, I just came across this page:
http://wiki.ps2dev.org/ps3:hypervisor:lv1_gpu_device_map
and I think it would be interesting to map all the 'gpu device' regions and see if they are affected by the gpu_context_attribute() calls. I don't think any of the dumps look like NV MMIO, but it is worth checking.

IronPeter
Joined: 06 Aug 2007 Posts: 207
Posted: Sat Oct 20, 2007 9:02 pm
Glaurung, are you able to test the 0x001, 0x002, 0x003 and 0x201, 0x202, 0x203 entries?
These functions work well on the CPU side, but the RSX hangs. I cannot use the RAMIN blit trick to watch RAMIN changes for these subfunctions.

tmaster
Joined: 21 Oct 2005 Posts: 13 Location: Ireland
Posted: Sun Oct 21, 2007 3:24 am
I found this over on ps3news.
It is something for RSX data dumps; it has been out for a while now. Maybe you can do something with it, maybe not.
Here is what it says:
| Code: | I have released a dump of the PlayStation 3 Graphic Libraries for the RSX Hardware in the PS3, available in iRC EFNet #PS3News.
These are not machine readable, they are human readable.
They are being released because there is a wealth of information in there, and hacking NVidia GPU's is a specialist field. Hopefully the appropriate experts can take these files and provide necessary information to allow full RSX access from under linux. I am quite confident this is possible from my cursory scan of this data and the functions that exist in the PS3 Hypervisor. (Especially the undocumented RSX Register access functions).
In any event I will continue to work on this area.
Note, these files are from code that runs UNDER GameOS. Linux does not run under GameOS, it runs directly under the Hypervisor, so there are no Hypervisor calls in this data.
The data certainly reveals the structure of critical memory structures, register layout, etc.
Enjoy.
PS, If you don't understand what these files are or show, then you are not the target audience (sorry). |
PS: Keep up the good work, PS3 devs.
Last edited by tmaster on Sun Oct 21, 2007 6:49 am; edited 1 time in total

ooPo Site Admin
Joined: 17 Jan 2004 Posts: 2032 Location: Canada
Posted: Sun Oct 21, 2007 4:10 am
tmaster,
We like to promote legal and open development here. That means no peeking at licensed code or taking any information from any source that requires an NDA to do so.
Please don't post links to where people can find this information.

Glaurung
Joined: 11 Oct 2007 Posts: 49
Posted: Sun Oct 21, 2007 6:54 am
IronPeter, the 0x201 entry works when called from the ps3fb context, but does nothing interesting. However, when called from a different context (calling lv1_gpu_memory_allocate() and lv1_gpu_context_allocate() again from another kernel module, for example), it does some initialization work. Indeed, new DMA and graphics objects are created in RAMIN (the same set we are used to, but for engine '9' instead of '1'...). I have not yet looked into the details. Call 0x001 also works when called from a second context. The 0x001, 0x201 and 0x400 calls seem to do the same kind of initialization (maybe they are operations that require a GPU context switch, causing the HV to do the context initialization for us, I don't know).
Also, did you notice that the driver_info structure contains the object handles? It actually looks like this to me:
| Code: |
struct gpu_driver_info {
u32 version_driver;
u32 version_gpu;
u32 memory_size;
u32 hardware_channel;
u32 nvcore_frequency;
u32 memory_frequency;
u32 reserved1[16];
u32 dma_obj[256];
u32 gfx_obj[256];
u32 notify_obj[256];
u32 unk_obj[256];
u32 reserved2[23];
struct display_head display_head[8];
struct gpu_irq irq;
};
|
A second structure is filled at offset 0x4000 when any of the lv1_gpu_context_attribute calls mentioned above is made with the second context handle. Reading more than 32 kB of driver info crashed (32 kB = 2 contexts of 0x4000 bytes), but the definition in Linux is 128 kB, which would make room for 8 contexts.
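Taking the layout above at face value, the field offsets and the per-context stride can be checked mechanically. This sketch copies only the leading fields (display_head and gpu_irq are not defined in this thread), uses a local `u32` typedef, and assumes one 0x4000-byte record per context, as observed; `gpu_driver_info_head` and `driver_info_offset()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32;

/* Leading part of the proposed gpu_driver_info layout. */
struct gpu_driver_info_head {
    u32 version_driver;
    u32 version_gpu;
    u32 memory_size;
    u32 hardware_channel;
    u32 nvcore_frequency;
    u32 memory_frequency;
    u32 reserved1[16];
    u32 dma_obj[256];
    u32 gfx_obj[256];
    u32 notify_obj[256];
    u32 unk_obj[256];
    u32 reserved2[23];
};

/* Assumption from the observations above: one 16 kB record per context
 * inside a 128 kB driver info area. */
#define DRIVER_INFO_STRIDE 0x4000u
#define DRIVER_INFO_AREA   (128u * 1024u)

/* Byte offset of the driver info record of context number 'ctx'. */
static size_t driver_info_offset(unsigned ctx)
{
    assert(ctx < DRIVER_INFO_AREA / DRIVER_INFO_STRIDE);
    return (size_t)ctx * DRIVER_INFO_STRIDE;
}
```

With these fields, dma_obj starts at byte 0x58 and the fixed part is 0x10B4 bytes long, so display_head would start at 0x10B4 if it follows reserved2 with no padding.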

IronPeter
Joined: 06 Aug 2007 Posts: 207
Posted: Sun Oct 21, 2007 3:51 pm
Glaurung, very strange...
Of course I also tried all the entries with the newly created context. I was unable to see any changes in the GPU control structures in RAMIN. I wanted to find something like
lv1_gpu_context_attribute( u64 context_handle,
LV1_CREATE_GPU_OBJECT, u64 object_class, u64 object_handle );
Nothing happens in RAMIN or RSX hangs up.
Good note about gpu_driver_info. It looks like a driver-side copy of the GPU object information (nouveau drm uses the same thing).

IronPeter
Joined: 06 Aug 2007 Posts: 207
Posted: Sun Oct 21, 2007 6:09 pm Post subject: context data
Hi, I compared a RAMIN dump starting at dword offset 0x78000 with
http://gitweb.freedesktop.org/?p=mesa/drm.git;a=blob;h=7ce4273ddc3234e3d55ed75e6e3b4f395fe6bc7e;hb=2c5c18fbd394f419a9cf650720a1187440c643cd;f=shared-core/nv40_graph.c
Compare
| Code: |
/*TODO: deciper what each offset in the context represents. The below
 * contexts are taken from dumps just after the 3D object is
 * created.
 */
...
/* 0x680-0x6BC - NV30_TCL_PRIMITIVE_3D_TX_ADDRESS_UNIT(0-15) */
/* 0x6C0-0x6FC - NV30_TCL_PRIMITIVE_3D_TX_FORMAT_UNIT(0-15) */
for (i=0x006C0; i<=0x006fc; i+=4)
    INSTANCE_WR(ctx, i/4, 0x00018488);
/* 0x700-0x73C - NV30_TCL_PRIMITIVE_3D_TX_WRAP_UNIT(0-15) */
for (i=0x00700; i<=0x0073c; i+=4)
    INSTANCE_WR(ctx, i/4, 0x00028202);
|
and this fragment:
offset(dword) : value
781ac 18488
781ad 18488
781ae 18488
781af 18488
781b0 18488
781b1 18488
781b2 18488
781b3 18488
781b4 18488
781b5 18488
781b6 18488
781b7 18488
781b8 18488
781b9 18488
781ba 18488
781bb 18488
781bc 28202
781bd 28202
781be 28202
781bf 28202
781c0 28202
781c1 28202
781c2 28202
781c3 28202
781c4 28202
781c5 28202
781c6 28202
781c7 28202
781c8 28202
781c9 28202
781ca 28202
781cb 28202
781cc 0
Pretty nice.
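Matching the fragment against nv40_graph.c can be automated by searching a dump for 16 dwords of 0x00018488 immediately followed by 16 dwords of 0x00028202. A caveat from the numbers quoted above: the run starts at dword 0x781AC, while 0x78000 + 0x6C0/4 would be 0x781B0, so if the RSX layout matched nouveau's NV40 layout exactly, the context would start 4 dwords before 0x78000 (or the layouts differ slightly there). That is plain arithmetic on the posted values, not a verified result. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Context defaults written by nouveau's nv40_graph.c (byte offsets). */
#define TX_FORMAT_OFF  0x6C0u        /* 16 dwords of 0x00018488 */
#define TX_WRAP_OFF    0x700u        /* 16 dwords of 0x00028202 */
#define TX_FORMAT_VAL  0x00018488u
#define TX_WRAP_VAL    0x00028202u

/* Returns the dword index where 16 x TX_FORMAT_VAL are immediately
 * followed by 16 x TX_WRAP_VAL, or -1 if the pattern is absent. */
static long find_tx_defaults(const uint32_t *dump, size_t ndwords)
{
    for (size_t i = 0; i + 32 <= ndwords; i++) {
        size_t k = 0;
        while (k < 16 && dump[i + k] == TX_FORMAT_VAL)
            k++;
        while (k >= 16 && k < 32 && dump[i + k] == TX_WRAP_VAL)
            k++;
        if (k == 32)
            return (long)i;
    }
    return -1;
}

/* Dword index (relative to the dump) where the context would start if
 * the RSX stored these defaults at nouveau's NV40 offsets. */
static long ctx_base_from_match(long match)
{
    assert(TX_WRAP_OFF - TX_FORMAT_OFF == 16u * 4u);  /* adjacent runs */
    return match - (long)(TX_FORMAT_OFF / 4);
}
```

Fed with a dump that starts at dword 0x78000 and reproduces the fragment above, the search returns 0x1AC and the implied context base is -4, i.e. 4 dwords before the start of the dump.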

Glaurung
Joined: 11 Oct 2007 Posts: 49
Posted: Sun Oct 21, 2007 11:29 pm Post subject: updated ps3gpu test program
Hi,
I've updated the ps3gpu test program, to include RAMIN analysis, GPU devices region dumps, and driver_info and reports region dumps. The ps3fb.diff patch is also updated:
http://manwe.homelinux.org/~glaurung/ps3/20071021/
IronPeter, with this code I observe new objects being created by the 0x201 call (check the log_pre.txt and log_post.txt files produced before and after inserting the module inside ps3gpu.c). Commenting out the insmod and using an ioctl to perform the same call from within ps3fb, using its context, does not produce anything interesting. Note that new objects are only created the first time any of the 0x001, 0x201 or 0x400 calls is used from the second context.
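Comparing the before/after RAMIN snapshots is the core of this method. A minimal sketch, assuming both snapshots have already been loaded as arrays of dwords (the actual format of log_pre.txt and log_post.txt is not described here); `ramin_diff()` is a hypothetical helper that prints each changed dword as offset, old value and new value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Prints every dword that differs between two RAMIN snapshots and returns
 * the number of differences. 'base' is the dword offset of element 0;
 * pass out == NULL to only count. */
static size_t ramin_diff(const uint32_t *pre, const uint32_t *post,
                         size_t ndwords, uint32_t base, FILE *out)
{
    size_t changes = 0;
    assert(pre != NULL && post != NULL);
    for (size_t i = 0; i < ndwords; i++) {
        if (pre[i] != post[i]) {
            if (out != NULL)
                fprintf(out, "%08x: %08x -> %08x\n",
                        (unsigned)(base + i), (unsigned)pre[i],
                        (unsigned)post[i]);
            changes++;
        }
    }
    return changes;
}
```

Calling it with `stdout` after each HV call under test gives a compact trace of which RAMIN dwords (object headers, RAMHT entries) that call touched.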
Also, there is indeed a graph context at the very end of VRAM (0x0ffe0000). This can be verified because its first dword is its own offset within RAMIN (RAMIN starts at 0x0ff80000, so the offset is 0x60000) shifted right by 4, that is 0x6000.
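The self-reference check is simple arithmetic: (0x0ffe0000 - 0x0ff80000) >> 4 = 0x60000 >> 4 = 0x6000. A small helper makes it reusable for other candidate instances, assuming RAMIN starts at 0x0ff80000 and instances are 16-byte aligned; `instance_self_ref()` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define RAMIN_BASE 0x0FF80000u   /* VRAM offset where RAMIN starts */

/* Value expected in the first dword of an instance that lives at VRAM
 * offset 'vram_off': its offset inside RAMIN in 16-byte units. */
static uint32_t instance_self_ref(uint32_t vram_off)
{
    assert(vram_off >= RAMIN_BASE && (vram_off & 0xFu) == 0);
    return (vram_off - RAMIN_BASE) >> 4;
}
```

Scanning RAMIN for dwords that equal `instance_self_ref()` of their own address could help spot other context-like structures in a dump.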

IronPeter
Joined: 06 Aug 2007 Posts: 207
Posted: Mon Oct 22, 2007 12:03 am
Glaurung, solid work.
>>I observe a new objects being created with the 0x201 call
Of course, I'll check it. Thanks. What about the object types? 2D-related? Just curious.
PS: the context seems to be set up properly for 3D.

Glaurung
Joined: 11 Oct 2007 Posts: 49
Posted: Mon Oct 22, 2007 12:48 am
IronPeter, those are the same DMA and GFX objects we observed already; they are just allocated in RAMIN a second time (after the first set). Also, the handles are the same but hash to a different entry in RAMHT, so I suspect the channel (not the subchannel) is different, as would be required for hardware context switching. The objects we know all use channel 0; I'll hash the handles with other channel values and see which one produces the hashes I observe for the second set of objects. Another difference is the engine ID part of the RAMIN address in the hash table entry, which is 9 instead of 1.
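Hashing handles with different channel values can be sketched with the pre-NV50 RAMHT hash used in nouveau's DRM code: XOR-fold the 32-bit handle into `bits` bits, XOR in `channel << (bits - 4)`, then multiply by the 8-byte entry size. Whether the RSX uses exactly this function is an assumption to be tested against the observed entries, and `bits` depends on the RAMHT size, which is not stated here.

```c
#include <assert.h>
#include <stdint.h>

/* NV04..NV40-style RAMHT hash as in nouveau (assumed to apply to RSX).
 * Returns the byte offset of the hash table entry inside RAMHT. */
static uint32_t ramht_hash(uint32_t handle, unsigned channel, unsigned bits)
{
    uint32_t hash = 0;
    assert(bits > 4 && bits < 32);
    for (int i = 32; i > 0; i -= (int)bits) {
        hash ^= handle & ((1u << bits) - 1);   /* fold the next chunk in */
        handle >>= bits;
    }
    hash ^= (uint32_t)channel << (bits - 4);   /* mix in the channel id */
    return hash << 3;                          /* 8 bytes per entry */
}
```

Because the channel only enters through that single XOR, the same handle on channel 0 and channel 1 lands in different entries, which fits the observation that the second object set hashes elsewhere.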

Glaurung
Joined: 11 Oct 2007 Posts: 49
Posted: Mon Oct 22, 2007 12:59 am
Just checked with a few handles: the new object set indeed uses hardware channel 1 instead of 0. No sign of any 3D object so far (we're looking for class 0x97). Also, I was thinking it would be nice to start a wiki page on all this, in a more human-readable form :-), what do you think?

IronPeter
Joined: 06 Aug 2007 Posts: 207
Posted: Mon Oct 22, 2007 1:08 am
Aaah, the second copy of the 2D objects, just with different hashes... Yes, I noticed that copy. Creation of these objects is not lazy: they are created in the alloc_context call.
The wiki page is a great idea. The only problem is my English. If only somebody with native English could edit this page...