The hunt for HV's FIFO/Push buffer...

Technical discussion on the newly released and hard to find PS3.

Moderators: cheriff, emoon

IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am

Post by IronPeter »

I was able to run a blit push buffer from userland using the FIFO control registers.

There was some kind of protection. Very weak protection.

It is unstable for now, but it does work. It is probably possible to write some kind of 2D support ( stretched blits, color fills, etc. ).

The main question is about 3D support. We need the so-called "context objects" to be properly initialized. The hypervisor probably does this work for us, so all we need are the handles ( and lpar_dma_reports contains something that looks like these handles ). To initialize these objects "by hand" we would need access to a very special set of RSX registers, the so-called RAMIN area.

PS. I am investigating the RSX using only open-source information. I have not signed an NDA with Sony or NVidia.
ps2devman
Posts: 259
Joined: Mon Oct 09, 2006 3:56 pm

Post by ps2devman »

Ok, things are getting serious now... Hehe.
What is your firmware version?

We need to be careful and detect when this new trick becomes unusable in future firmware versions. Someone with an Infectus and the ability to swap firmware would be the best person to detect such an infamous change...

Thanks for your great work, IronPeter!

PS: If you could publish a minimal source, even unstable, that can be used to test this new trick for each firmware version that would be great! Thanks!
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am

Post by IronPeter »

The firmware version is 1.8

The trick is very simple, I can describe it without posting the full sources.

Look at the push buffer dump:

http://www.everfall.com/paste/id.php?uxdlpwlbfpo9

It is an image of the push buffer after a hypervisor blit. The end of the buffer is at 0xb8.

The last packet sends zero to subchannel zero with tag 0x110. Replace the tag with NOP ( 0x100 ) while the buffer kicked by the hypervisor is still being executed by the RSX.

Fill the push buffer with N + 1 copies of the first 0xb8 bytes.

In a loop, for( int i = 0; i < N; ++i ), modify the client screen buffer in some way ( fill it with random numbers ), then kick the push buffer by writing ( (uint32_t *)ioremap( lpar_dma_control, 1024) )[ 0x10 ] = 0xe1f0000 + 0xb8 * ( i + 1 ); and sleep ( 1 ); The client screen in XDR memory will be blitted into video memory.

If nobody is able to repeat these steps, I will post the full sources.
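
The steps above could be sketched in C roughly as follows. This is only an illustration, not the full sources: the helper names are made up, an ordinary memory buffer stands in for the real push buffer, and the position of the tag inside the packet header (bits 2..12) is an assumption. The real kick is still the write to index 0x10 of the mapped control registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLIT_IMAGE_BYTES 0xb8u  /* length of the hypervisor blit sequence */
#define TAG_MASK         0x1ffcu /* assumed position of the tag in a header */
#define TAG_NOP          0x100u  /* NOP tag named above */

/* Replace the tag of one packet header, keeping all other bits. */
static uint32_t retag(uint32_t header, uint32_t tag)
{
    return (header & ~TAG_MASK) | (tag & TAG_MASK);
}

/* Fill the push buffer with n + 1 back-to-back copies of the blit image. */
static void replicate_blit(uint8_t *fifo, const uint8_t *image, unsigned n)
{
    assert(fifo != NULL && image != NULL);
    for (unsigned i = 0; i <= n; ++i)
        memcpy(fifo + (size_t)i * BLIT_IMAGE_BYTES, image, BLIT_IMAGE_BYTES);
}

/* Value to write to control register index 0x10 on iteration i of the loop. */
static uint32_t put_for_iteration(uint32_t fifo_base, unsigned i)
{
    return fifo_base + BLIT_IMAGE_BYTES * (i + 1);
}
```

With the base address from the post, put_for_iteration( 0xe1f0000, 0 ) gives 0xe1f00b8, the end of the first copy.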

Edited: bugfix
Last edited by IronPeter on Sun Oct 07, 2007 10:51 pm, edited 1 time in total.
mc
Posts: 211
Joined: Wed Jan 12, 2005 7:32 am
Location: Linköping

Post by mc »

Very nice work, IronPeter!

I find it intriguing that not only are you allowed to define the FIFO region
in user memory, but that you are also allowed to map the control area.
This suggests to me that Sony actually intended to support HW 3D under
Linux, as they could just as easily have made the control area accessible
only to the hypervisor.
Flying at a high speed
Having the courage
Getting over crisis
I rescue the people
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am

Post by IronPeter »

mc wrote: ... but that you are also allowed to map the control area.
We are allowed to map only part of the MMIO registers: just the context control registers. I do not know of a way to map the global RAMIN area.

A good source on this stuff:
http://nouveau.freedesktop.org/wiki/HonzaHavlicek
mc
Posts: 211
Joined: Wed Jan 12, 2005 7:32 am
Location: Linköping

Post by mc »

Well, yeah, but that only makes it more likely that it is intentional
that this particular part can be mapped. It would make sense
to only expose the parts needed for performance (= FIFO interface)
and handle access to other parts through the HV.

Thanks for the link, BTW.
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am

Post by IronPeter »

Yes, mc, you are right.

The only thing that needs direct FIFO access is real-time 3D acceleration. The 2D part does not need that. So Sony probably wants to expose a 3D driver.
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am

Complete subchannel mapping

Post by IronPeter »

This is a push buffer dump taken after the hypervisor's FB_SETUP. The programmers at Sony decided not to clean up the push buffer after setting up the context objects.

http://www.everfall.com/paste/id.php?ew29498z816w

Enjoy it.
Glaurung
Posts: 49
Joined: Thu Oct 11, 2007 4:54 am

confirmed FIFO hack

Post by Glaurung »

Hi all,

I could reproduce the FIFO hack described by IronPeter, using firmware v1.93. I had to set up a large blit (which the HV decomposes into many 1024x1024 blits) in order to have time to tweak the FIFO area. Instead of patching with a NOP (which works), I chose to set the FIFO end pointer two operations back. I also had to remove the L1GPU_FB_BLIT_WAIT_FOR_COMPLETION flag from the call to lv1_gpu_context_attribute(). To sum it up:

Code:

	/* large blit */
	lv1_gpu_context_attribute(ps3.context_handle,
				  L1GPU_CONTEXT_ATTRIBUTE_FB_BLIT,
				  dst_offset,
				  GPU_IOIF + src_offset,
				  (1ULL << 31) |
				  (1280 << 16) | 1280,
				  1280*4);

	/* go back two operations */
	ps3.fifo_regs[0x10] -= 8;

	/* wait for end of GPU operation */
	while (ps3.fifo_regs[0x11] != ps3.fifo_regs[0x10] &&
	       ps3.fifo_regs[0x15] != ps3.fifo_regs[0x10]);

	/* copy our operation to the fifo */
	memcpy(&ps3.fifo[fifo_idx], blit_program, sizeof(blit_program));

	/* fill the vram in white */
	memset(ps3.vram, 0xff, ps3.vram_size);

	/* kick off the GPU */
	ps3.fifo_regs[0x10] += sizeof(blit_program);

I was also able to send various other blit commands in the FIFO, using documentation from the nouveau project. For example, the following FIFO commands will do a YUYV blit instead of an ARGB blit:

Code:

uint32_t blit_program[] = {
	0x00106300, // SURFACE_FORMAT (size 4, subchannel 3)
	0x0000000a, //  SURFACE_FORMAT_A8R8G8B8
	0x14001400, //  ((pitch{dst} << 16) | pitch{dst})
	0x00000000, //  src_offset
	0x00000000, //  dst_offset

	0x0024c2fc, // NV_IMAGE_BLIT_OPERATION (size 9, subchannel 6)
	0x00000001, //  not dither (0 = dither)
	0x00000005, //  STRETCH_BLIT_FORMAT_YUYV
	0x00000003, //  STRETCH_BLIT_OPERATION_COPY
	0x00000000, //  (dstX << 16 | dstY)
	0x02d00400, //  ((height << 16) | width)
	0x00000000, //  (dstX << 16 | dstY)
	0x02d00400, //  ((height << 16) | width)
	0x00100000, //  step_x in 12.20 fixed point
	0x00100000, //  step_y in 12.20 fixed point

	0x0010c400, // STRETCH_BLIT_SRC_SIZE (size 4, subchannel 6)
	0x02d00400, //  ((height << 16) | width)
	0x00021400, //  pitch_src
		    //    | (STRETCH_BLIT_SRC_FORMAT_ORIGIN_CORNER << 16)
		    //    | (STRETCH_BLIT_SRC_FORMAT_FILTER_POINT_SAMPLE << 24)
	0x0d000000, //  GPU_IOIF + src_offset
	0x00000000, //  srcX | (srcY<<16)
};

Thanks IronPeter for your hard work. I'll now try playing a bit with subchannel bindings, your most recent post looks quite interesting.
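
The header words in the listing above are consistent with a simple layout: the data word count in bits 18 and up, the subchannel in bits 13-15, and the method offset in the low bits. Here is a minimal sketch of packing and unpacking such headers under that assumption. The layout is inferred from the values in this post and the nouveau notes, not taken from any official documentation, and the function names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a FIFO command header: `count` data words follow, sent to
 * subchannel `subch`, starting at method offset `method`.
 * Bit layout is an assumption inferred from the listing. */
static uint32_t fifo_header(unsigned count, unsigned subch, unsigned method)
{
    assert(count < 0x800 && subch < 8 && (method & ~0x1ffcu) == 0);
    return ((uint32_t)count << 18) | ((uint32_t)subch << 13) | method;
}

/* Unpack the three fields again. */
static unsigned header_count(uint32_t h)  { return (h >> 18) & 0x7ff; }
static unsigned header_subch(uint32_t h)  { return (h >> 13) & 0x7; }
static unsigned header_method(uint32_t h) { return h & 0x1ffc; }
```

With these helpers, 0x0024c2fc decodes as 9 words to subchannel 6 at method 0x2fc, which matches the comment in the listing.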
Seather
Posts: 1
Joined: Thu Oct 11, 2007 2:13 pm

Post by Seather »

Would this hypervisor be anything close to the Xen from Sun?
jimparis
Posts: 1145
Joined: Fri Jun 10, 2005 4:21 am
Location: Boston

Post by jimparis »

1) Xen isn't from Sun, it's from XenSource.
2) This thread has nothing to do with that, please keep it on-topic.
3) It is the same in theory but there are no similarities that are useful to us.
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am

few words about methodology

Post by IronPeter »

Glaurung, nice.

I want to say a few words about testing methodology. A segmentation fault in kernel mode is not very pleasant. I did a few things to make life easier:

1.) I extended ps3fb's memory mapping by the last 65536 bytes. open( "/dev/fb0" ), mmap it, and use the push buffer from client mode.

2.) I extended ps3fb's ioctl to read and write the FIFO control registers.

3.) lv1_gpu_context_intr ( interrupt ) is very useful. Extend the ioctl with this function. It is also better to disable ( again via ioctl ) the kernel's regular lv1_gpu_context_intr call in the driver.
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am

Post by IronPeter »

Glaurung, would you like to test a blit from video memory to the system area? I will have no access to a PS3 for a short while.

http://wiki.ps2dev.org/ps3:hypervisor:l ... y_allocate returns only 252 megs; the top of video memory seems to contain the RAMIN area ( the global GPU control area ).

This area must be protected from read/write, but probably...
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am

Post by IronPeter »

Also, blitting from vidmem could be useful for using the DDR memory as a fast swap file ( CPU reads are very slooooow ).
Glaurung
Posts: 49
Joined: Thu Oct 11, 2007 4:54 am

FB_SETUP and XDR<->DDR DMA

Post by Glaurung »

Hi,

First of all, let's start with some news on the information left over by FB_SETUP. The dump from IronPeter shows that there are 6 objects bound to channels 1-6:

- channel 1: instance 0x31337303 of class NvMemFormat. It is used for uploading data from XDR (DMA object 0xfeed0001) to DDR (DMA object 0xfeed0000)
- channel 2: instance 0x3137c0de of class NvMemFormat. It is used for downloading data from DDR to XDR
- channel 3: instance 0x313371c3 of class NvContextSurfaces. This describes the screen surface parameters (ARGB, ...) and is referenced by other objects.
- channel 6: instance 0x3137af00 of class NvScaledImage. This is used for blitting, along with the NvContextSurfaces object above

Those are the ones I am sure of. Then we have:
- channel 5: instance 0x31337808 of unknown class, maybe NvImageFromCpu or NvImageBlit
- channel 4: instance 0x31337a73 of unknown class, no idea what it is.
and 0x66604200 is most probably an instance of a DMA Notify object.

Binding an existing object to another channel using tag 0 works. Using this, I was able to perform DMA from DDR to XDR and vice-versa. For example, put this in the FIFO for download:

Code:

	0x0020430c, // (size = 8, chan 2)
	0x00000000, // 0x30c: src
	0x0d000000, // 0x310: dst (GPU_IOIF)
	0x00004000, // 0x314: src_pitch
	0x00004000, // 0x318: dst_pitch
	0x00004000, // 0x31c: src_line_len
	0x00000400, // 0x320: line_count
	0x00000101, // 0x324: ((dst_inc << 8) | src_inc)
	0x00000000, // 0x328: buf_notify (?)

Setting src to anything above 252MB crashes the GPU, so no luck with the RAMIN area, sorry IronPeter. I did a few other random tests though:
- Extending the ioremapping of the framebuffer, I was able to read 2 more MB of memory past 252MB. It does not look very interesting: 64k of ff ff ff xx, then 64k of 00 00 00 xx, etc. No idea what it might be.
- I tried the same trick on the FIFO registers. Reading returns zero (except for the three FIFO registers, of course) until you reach 64k < addr < 128k, which crashes the PS3 with a nice beep and a blinking red LED (just by reading!!). I guess the HV is not happy... You need to power cycle to clear the condition.
- Same thing on the reports buffer. It starts with:
0x0000: 13 37 c0 d3 13 37 ba be 13 37 be ef 13 37 f0 01, etc..
(looks like some guys at Sony or IBM have a sense of humor...)
0x1400: ff ff ff ff ff ff ff ff 00 00 00 00 ff ff ff ff, etc..
0x9400: 00 04 00 00 20 00 04 10 00 5a 02 22 00 81 14 01
40 00 20 04 00 88 00 24 44 00 01 44 02 02 04 01
c0 00 0a 08 00 20 20 00 00 02 08 81 02 10 00 00
01 41 08 53 21 04 04 00 00 24 00 00 00 05 10 20
There seems to be real data from 0x9400 up, but I don't know what that could be either...
- I played a bit with the values of lv1_gpu_memory_allocate(). The four values set to zero actually refer to resources, probably two memory resources and two other resources. Here are the maximum values I could set before the call returns invalid parameters (-17):
	status = lv1_gpu_memory_allocate(ps3fb.vram_size,
					 512*1024,
					 3075*1024,
					 15,
					 8,
					 &ps3fb.memory_handle,
					 &ps3fb.vram_lpar);

Anyway, we now have everything needed to write a decent Xorg driver, with Xv and Composite support. Also, the mtd driver for swapping RAM to DDR could be improved using DMA (I measured a bandwidth of ~3GB/s in both directions, which is far less than expected but much better than direct access to DDR; blitting is faster, at ~16GB/s).

Have fun.
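
For reference, the bind and the download sequence from this post can be generated rather than typed by hand. This is a rough sketch, assuming the header layout suggested by the values in this thread (count << 18 | subchannel << 13 | method) and using made-up function names. The field order is simply the one from the listing above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Command header, layout assumed from the values seen in this thread. */
static uint32_t hdr(unsigned count, unsigned subch, unsigned method)
{
    return ((uint32_t)count << 18) | ((uint32_t)subch << 13) | method;
}

/* Bind an existing object handle to a subchannel (tag 0). Returns words written. */
static size_t emit_bind(uint32_t *out, unsigned subch, uint32_t handle)
{
    assert(out != NULL && subch < 8);
    out[0] = hdr(1, subch, 0x000);
    out[1] = handle;
    return 2;
}

/* Emit one memory-to-memory copy starting at method 0x30c. Returns words written. */
static size_t emit_copy(uint32_t *out, unsigned subch,
                        uint32_t src, uint32_t dst, uint32_t pitch,
                        uint32_t line_len, uint32_t line_count)
{
    assert(out != NULL && line_len <= pitch);
    out[0] = hdr(8, subch, 0x30c);
    out[1] = src;             /* 0x30c: src */
    out[2] = dst;             /* 0x310: dst */
    out[3] = pitch;           /* 0x314: src_pitch */
    out[4] = pitch;           /* 0x318: dst_pitch */
    out[5] = line_len;        /* 0x31c: src_line_len */
    out[6] = line_count;      /* 0x320: line_count */
    out[7] = (1u << 8) | 1u;  /* 0x324: (dst_inc << 8) | src_inc */
    out[8] = 0;               /* 0x328: buf_notify (?) */
    return 9;
}
```

With the values from the listing (0x4000 bytes per line, 0x400 lines), one such command moves 16 MB.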
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am

Post by IronPeter »

Good work, Glaurung

The idea of getting at RAMIN must fail. Direct RAMIN access would mean a HUGE security hole in the hypervisor.

I had supposed that subchannel 5 holds CLASS_3D :).
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am

very strange blit result

Post by IronPeter »

This is probably my mistake, but this blit from DDR into itself really does work:

http://www.everfall.com/paste/id.php?wpt5ez8wbvpq

Yes, it looks like regular black-and-white areas with irregular colored dots :).

Edited: XDR->DDR
Last edited by IronPeter on Fri Oct 12, 2007 2:28 pm, edited 1 time in total.
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am

Dump of the top of videomemory

Post by IronPeter »

Yes, I have a dump of the top 4 megs of vidmem. This dump contains, for example, the funny "dma_report" area in the middle, and the handles of the context objects ( byte-swapped ): 313371c3 -> c3713331.
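
The 313371c3 -> c3713331 pair is a plain 32-bit byte swap. Here is a small sketch for spotting known handles in such a dump. The function names are only illustrative, and the dump is assumed to be an array of 32-bit words already loaded into memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order of a 32-bit word. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Return the index of the first dump word that equals `handle`
 * once byte-swapped, or -1 if there is none. */
static long find_swapped(const uint32_t *dump, size_t n, uint32_t handle)
{
    assert(dump != NULL);
    for (size_t i = 0; i < n; ++i)
        if (swap32(dump[i]) == handle)
            return (long)i;
    return -1;
}
```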
unsolo
Posts: 155
Joined: Mon Apr 16, 2007 2:39 am
Location: OSLO Norway

Post by unsolo »

@IronPeter

I'm working on spu-medialib, and we have made a yv12/yuv420 scaler that also does colorspace conversion at more than sufficient frame rates (85 FPS @ 1920x1080) using a single SPE. We have also worked on an mplayer vo using spu-medialib and libps3fb, a small fb lib we made.

If we could figure out how to blit the video over as an overlay onto the X screen being rendered, that would be great. Maybe by extending libps3fb somehow.
Don't do it alone.
unsolo
Posts: 155
Joined: Mon Apr 16, 2007 2:39 am
Location: OSLO Norway

Post by unsolo »

I would also be more than happy to write other 2D accelerations for the SPU, like YUYV perhaps.
Glaurung
Posts: 49
Joined: Thu Oct 11, 2007 4:54 am

Post by Glaurung »

Hi,

IronPeter: Great! I didn't try the blit, just the XDR<->DDR DMA, so blitting DDR to DDR was a good idea. You get basically the same thing I was observing with the remapping (white/black stuff), plus the last 2MB. The fact that the object handles lie at the end of the framebuffer is quite interesting (hash table?); I'll have a look at that tomorrow. However, I doubt channel 5 is 3D, but let's hope for it :-)

unsolo: Blitting YUYV to RGB with scaling is clearly feasible, as I reported in my first post. Actually, I'm not a 3D guy at all, and I'm more interested in getting a decent Xorg driver for the PS3. This means accelerated Xv support and Composite support too (I think we now have all the tools to do that). Basically, changing the stretch blit format from 3 to 5 does the job. Converting YUV420 to YUYV is relatively straightforward and should not take too much CPU power, leaving the SPUs for the applications. However, until we find out how to create new GPU objects, using the SPUs for 3D might be a good option. I did actually start an FB+SPU based Xorg driver, thinking there was no hope for direct GPU access, but the experiments of the last few days show otherwise and are quite encouraging. Anyway, a single SPU can clearly handle YUV->RGB conversion at full resolution, so this might be an alternative for people not interested in messing with their framebuffer driver.
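
The YUV420 to YUYV step mentioned above could look roughly like this on the CPU. This is a minimal sketch, assuming planar input (a full-size Y plane, with U and V planes at half resolution in both directions), even width and height, and tightly packed rows. The function name is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Repack planar YUV 4:2:0 into packed YUYV (Y0 U Y1 V per pixel pair).
 * Each chroma sample is reused for the two rows that share it. */
static void yuv420_to_yuyv(const uint8_t *y, const uint8_t *u, const uint8_t *v,
                           uint8_t *out, size_t width, size_t height)
{
    assert(width % 2 == 0 && height % 2 == 0);
    for (size_t row = 0; row < height; ++row) {
        const uint8_t *yrow = y + row * width;
        const uint8_t *urow = u + (row / 2) * (width / 2);
        const uint8_t *vrow = v + (row / 2) * (width / 2);
        uint8_t *dst = out + row * width * 2;
        for (size_t x = 0; x < width; x += 2) {
            dst[0] = yrow[x];
            dst[1] = urow[x / 2];
            dst[2] = yrow[x + 1];
            dst[3] = vrow[x / 2];
            dst += 4;
        }
    }
}
```

The output uses two bytes per pixel, so a 1280-pixel line needs a pitch of at least 2560 bytes.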
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am

Post by IronPeter »

unsolo, I am a 3D guy. Glaurung is the 2D guy you need.
unsolo
Posts: 155
Joined: Mon Apr 16, 2007 2:39 am
Location: OSLO Norway

Post by unsolo »

@Glaurung, please take a look here:

http://wiki.ps2dev.org/ps3:spu-medialib

As you can see, we not only got the colorspace conversion into a single SPE but also a scaler into the same one.

I am more than happy to extend these to whatever is needed to make an Xv driver, and using the stuff we have in the mplayer vo, that shouldn't be too hard.

But my knowledge of Xv/Xorg is where I come up short. If you could/want to assist with this, it would be very much appreciated.
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am

Post by IronPeter »

> but also a scaler into the same one

Not a problem, stretched blits are possible.
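
For a stretched blit, the step_x / step_y words in Glaurung's listing are 12.20 fixed-point values, with 0x00100000 meaning 1.0. Here is a sketch of computing them for a given scale. It assumes, as in the nouveau notes and not verified on the PS3 here, that the step is the distance advanced in the source per destination pixel. The function names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Source step per destination pixel, in 12.20 fixed point (assumed meaning). */
static uint32_t step_12_20(uint32_t src_len, uint32_t dst_len)
{
    assert(dst_len != 0);
    return (uint32_t)(((uint64_t)src_len << 20) / dst_len);
}

/* Pack a size word as ((height << 16) | width), as in the listing. */
static uint32_t pack_size(uint32_t width, uint32_t height)
{
    assert(width < 0x10000 && height < 0x10000);
    return (height << 16) | width;
}
```

Under that assumption, doubling the output width halves the step: step_12_20( 1024, 2048 ) is 0x00080000.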
unsolo
Posts: 155
Joined: Mon Apr 16, 2007 2:39 am
Location: OSLO Norway

Post by unsolo »

Anyhow, I think making an Xv driver based on SPU scaling + csc + blit is stable and safe, until we see the responses to these developments.
Glaurung
Posts: 49
Joined: Thu Oct 11, 2007 4:54 am

Post by Glaurung »

Hi,

For the SPU-based approach, you might be interested in this:
git clone http://manwe.homelinux.org/~glaurung/xf86-video-ps3.git

This is basically just a standard FB Xorg driver, to which I have added a simple "Hello world" from an SPU. You can plug in your media library instead; just ignore the warning from libtool, that's expected. Note: it is not usable as is, since it exits on purpose at startup. I have not worked on this driver for a month or so (I was away), but I'll probably start from it as the base for the GPU-accelerated version too.
Also, I am not an Xorg expert (I have mostly read the FB and nouveau code for now), but someone who did this for an embedded platform told me that implementing Xv and Composite mostly comes down to a few functions.
I agree that an SPU-based driver is safer and also simpler to install (you just need to change the Xorg driver, not the kernel). So if you want to give it a try, no problem. I'll focus on the GPU-accelerated one though (I mostly did cleanup of the kernel code today).
Nismobeach
Posts: 9
Joined: Thu Aug 16, 2007 1:31 pm

Post by Nismobeach »

This is fantastic news guys! Keep up the great work and thanks for helping PS3 Linux grow! :)
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am

RAMHT dump

Post by IronPeter »

The top of vidmem is definitely RAMIN memory.

RAMHT contains instances of context objects:

handle : dword object HEADER

3137af00 04983089 // NV10_SCALED_IMAGE_FROM_MEMORY
56616661 00003002
56616660 0000303d
66626660 4000303d
66616661 00003002
66606660 0000303d
31337a73 0000309e // NV20_SWIZZLED_SURFACE
31337303 02000039 // NV_MEMORY_TO_MEMORY_FORMAT
cafebabe cafebabe
feed0001 0002303d
feed0000 0000303d
31337808 0418308a // NV_IMAGE_FROM_CPU
31337000 00000030
3137c0de 02000039 // NV_MEMORY_TO_MEMORY_FORMAT
313371c3 00003062 // NV_CONTEXT_SURFACE
66604201 04003003
66604200 00003003
66604203 0c003003
66604202 08003003
66604205 14003003
66604204 10003003
66604207 1c003003
66604206 18003003
66604209 24003003
66604208 20003003
6660420b 2c003003
6660420a 28003003
6660420d 34003003
6660420c 30003003
6660420f 3c003003
6660420e 38003003

Enjoy.
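
Here is a small sketch of working with the table above: look up a handle and pull out the low byte of the header dword. For the commented rows that byte matches the nouveau class numbers (0x89, 0x9e, 0x39, 0x8a, 0x62), but treating it as the class is an inference from those matches, not a documented fact. The struct and function names are made up, and only a few rows are copied in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct ramht_row {
    uint32_t handle;
    uint32_t header;
};

/* A few rows copied from the dump above. */
static const struct ramht_row rows[] = {
    { 0x3137af00u, 0x04983089u }, /* NV10_SCALED_IMAGE_FROM_MEMORY */
    { 0x31337a73u, 0x0000309eu }, /* NV20_SWIZZLED_SURFACE */
    { 0x31337303u, 0x02000039u }, /* NV_MEMORY_TO_MEMORY_FORMAT */
    { 0x31337808u, 0x0418308au }, /* NV_IMAGE_FROM_CPU */
    { 0x3137c0deu, 0x02000039u }, /* NV_MEMORY_TO_MEMORY_FORMAT */
    { 0x313371c3u, 0x00003062u }, /* NV_CONTEXT_SURFACE */
};

/* Linear search for a handle; returns NULL when it is not in the table. */
static const struct ramht_row *ramht_find(const struct ramht_row *t, size_t n,
                                          uint32_t handle)
{
    assert(t != NULL);
    for (size_t i = 0; i < n; ++i)
        if (t[i].handle == handle)
            return &t[i];
    return NULL;
}

/* Low byte of the header dword, assumed here to be the object class. */
static unsigned header_class(uint32_t header)
{
    return header & 0xffu;
}
```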
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am

Post by IronPeter »

Getting at RAMIN via the blitter is an incredible thing.

It is like reading OS kernel data with a memcpy from userland.
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am

Post by IronPeter »

My position is:

1.) Sony must fix this security hole as soon as possible ( probably, this broken check is already fixed in newer firmware ).

2.) I do not want to modify RAMIN by inserting 3D objects. That is an exploit.

3.) It is better for Sony to provide legal and safe hypervisor-level access to the GPU.