The hunt for HV's FIFO/Push buffer...

Technical discussion on the newly released and hard to find PS3.

Moderators: cheriff, emoon

IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am
Contact:

push buffer is open for read-write

Post by IronPeter »

The new sources for ps3fb.c http://www.everfall.com/paste/id.php?kjjqgvpbncu0 contain a GPU_CMD_BUF_SIZE macro.

It's clear that the memory region [ ps3fb_videomemory.address + ps3fb_videomemory.size - GPU_CMD_BUF_SIZE, ps3fb_videomemory.address + ps3fb_videomemory.size ) is the FIFO buffer.
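
The address arithmetic for that half-open range can be sketched as below. This is only an illustration: the 64 KiB value given to GPU_CMD_BUF_SIZE is an assumption (the real value is whatever ps3fb.c defines), and struct region and fifo_region() are made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value, for illustration only; the real macro lives in ps3fb.c. */
#define GPU_CMD_BUF_SIZE (64u * 1024u)

/* Half-open byte range [start, end). */
struct region {
    uintptr_t start;
    uintptr_t end;
};

/* The FIFO occupies the last GPU_CMD_BUF_SIZE bytes of the video memory. */
static struct region fifo_region(uintptr_t vram_address, size_t vram_size)
{
    struct region r;

    assert(vram_size >= GPU_CMD_BUF_SIZE);
    r.end = vram_address + vram_size;
    r.start = r.end - GPU_CMD_BUF_SIZE;
    return r;
}
```

Passing ps3fb_videomemory.address and ps3fb_videomemory.size gives the range to dump.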

A dump of this region: http://www.everfall.com/paste/id.php?ieto25cyoy0g

PS. excuse my English.
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am
Contact:

Post by IronPeter »

Yes, it really is the push buffer.

A good idea would be to test all the undocumented entries from http://wiki.ps2dev.org/ps3:hypervisor:l ... _attribute , take a push buffer dump for each one and compare it with the nouveau NV40 push buffer database.
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am
Contact:

Post by IronPeter »

Fresh news about the FIFO control regs.

There is the lpar_dma_control area, which can be iomapped. It is filled with zeroes except for 3 non-zero dwords, at indices 0x10, 0x11 and 0x15.

The value of dword 0x11 just after a hypervisor blit is 0xe1f0000, a few ms later it is 0xe1f0048, and finally 0xe1f00b8.

Compare with
http://nouveau.cvs.sourceforge.net/nouv ... iew=markup

Code:

	printf("FIFO put=0x%08x, get=0x%08x\n",
			fifo_regs[0x40/4],
			fifo_regs[0x44/4]
			);
	FIRE_RING();
	sleep(1);
	printf("FIFO put=0x%08x, get=0x%08x\n",
			fifo_regs[0x40/4],
			fifo_regs[0x44/4]
			);
Edited: constants
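
The nouveau snippet reads put and get at byte offsets 0x40 and 0x44, which are dword indices 0x10 and 0x11, two of the three non-zero slots seen here. A minimal sketch of that mapping, assuming (as the comparison suggests, but not confirmed by anything official) that dword 0x10 is put and dword 0x11 is get. fifo_pending() is a made-up helper, and the role of dword 0x15 is left out on purpose.

```c
#include <assert.h>
#include <stdint.h>

enum {
    FIFO_PUT_DWORD = 0x40 / 4,  /* dword index 0x10 (assumed: put) */
    FIFO_GET_DWORD = 0x44 / 4   /* dword index 0x11 (assumed: get) */
};

/* Bytes queued but not yet fetched by the GPU, assuming no wrap-around. */
static uint32_t fifo_pending(const volatile uint32_t *regs)
{
    uint32_t put = regs[FIFO_PUT_DWORD];
    uint32_t get = regs[FIFO_GET_DWORD];

    assert(put >= get);
    return put - get;
}
```

With the values from the post, get moving from 0xe1f0048 to 0xe1f00b8 means the pending count drops from 0x70 to 0.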
Last edited by IronPeter on Sun Oct 07, 2007 9:34 pm, edited 1 time in total.
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am
Contact:

funny push buffer disassembling

Post by IronPeter »

I've parsed the blit push buffer into DMA packets ( size of packet, subchannel id, tag ):

http://www.everfall.com/paste/id.php?z51ttk37j71s
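
For anyone who wants to repeat the parsing, here is a small sketch of splitting a header word into those three fields. The bit layout is the one nouveau uses for NV04-NV40 style FIFOs (count in bits 18-28, subchannel in bits 13-15, tag in bits 2-12); that the RSX uses the same layout is an assumption, although header words quoted elsewhere in this thread (e.g. 0x0024c2fc, annotated as size 9, subchannel 6) fit it. The struct and function names are made up.

```c
#include <assert.h>
#include <stdint.h>

struct fifo_header {
    uint32_t count;       /* number of data dwords that follow */
    uint32_t subchannel;  /* 0..7 */
    uint32_t tag;         /* method offset in bytes */
};

/* Splits one push buffer header word into its fields. */
static struct fifo_header fifo_decode(uint32_t word)
{
    struct fifo_header h;

    h.count = (word >> 18) & 0x7ffu;
    h.subchannel = (word >> 13) & 0x7u;
    h.tag = word & 0x1ffcu;
    assert(h.subchannel <= 7u);
    return h;
}
```

Walking a dump is then a loop: decode a header, skip count data dwords, decode the next header.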

My screen resolution is 1280 x 1024 ( 0x500 x 0x400 ).
Funny, the hypervisor made 2 blits ( one 1024 x 1024 and one 256 x 1024 ), so the parameters of the hypervisor call are not "preformatted for the GPU".
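
A sketch of a split that reproduces this observation, assuming a limit of 1024 pixels per side for a single blit. The limit is only inferred from the dump and the hypervisor's real algorithm is unknown; MAX_TILE, struct tile and split_blit() are made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TILE 1024u  /* assumed per-blit limit, inferred from the dump */

struct tile {
    uint32_t x, y, w, h;
};

/* Splits a w x h rectangle into tiles of at most MAX_TILE per side.
 * Returns the number of tiles needed and writes at most cap of them. */
static size_t split_blit(uint32_t w, uint32_t h, struct tile *out, size_t cap)
{
    size_t n = 0;

    assert(w <= 0xffffu && h <= 0xffffu);
    for (uint32_t y = 0; y < h; y += MAX_TILE) {
        for (uint32_t x = 0; x < w; x += MAX_TILE) {
            if (n < cap) {
                out[n].x = x;
                out[n].y = y;
                out[n].w = (w - x < MAX_TILE) ? (w - x) : MAX_TILE;
                out[n].h = (h - y < MAX_TILE) ? (h - y) : MAX_TILE;
            }
            n++;
        }
    }
    return n;
}
```

For 1280 x 1024 this yields exactly two tiles, 1024 x 1024 at x = 0 and 256 x 1024 at x = 1024.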

You can refer to this document for the push buffer tags:

http://gitweb.freedesktop.org/?p=nouvea ... veau_reg.h
jimparis
Posts: 1145
Joined: Fri Jun 10, 2005 4:21 am
Location: Boston

Post by jimparis »

Nice work, this is good stuff.
Warren
Posts: 175
Joined: Sat Jan 24, 2004 8:26 am
Location: San Diego, CA

Re: funny push buffer disassembling

Post by Warren »

IronPeter wrote: My screen resolution is 1280 x 1024 ( 0x500 x 0x400 ).
Funny, the hypervisor made 2 blits ( one 1024 x 1024 and one 256 x 1024 ), so the parameters of the hypervisor call are not "preformatted for the GPU".
Got to love power of 2 sized textures.
ps2devman
Posts: 259
Joined: Mon Oct 09, 2006 3:56 pm

Post by ps2devman »

Congratulations IronPeter! Smells good, very good!
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am
Contact:

Post by IronPeter »

I was able to run the blit push buffer from userland using the FIFO control regs.

There was some kind of protection. Very weak protection.

It is unstable for now, but it does work. It is probably possible to write some kind of 2D support ( stretched blits, color fills, etc ).

The main question is about 3D support. We need the so-called "context objects" to be properly initialized. Probably the hypervisor does this work for us, and all we need are the handles ( lpar_dma_reports contains something that looks like these handles ). To initialize these objects "by hand" we would need access to very special RSX registers, the so-called RAMIN area.

PS. I investigate RSX using only open-source information. I have not signed an NDA with Sony or NVidia.
ps2devman
Posts: 259
Joined: Mon Oct 09, 2006 3:56 pm

Post by ps2devman »

Ok, things are getting serious now... Hehe.
What is your firmware version?

We need to be careful and detect when this new trick will become unusable in future firmware versions. Someone with infectus and the ability to swap firmware would be the best person for detecting such infamous change...

Thanks for your great work, IronPeter!

PS: If you could publish a minimal source, even unstable, that can be used to test this new trick for each firmware version that would be great! Thanks!
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am
Contact:

Post by IronPeter »

The firmware version is 1.8

The trick is very simple, I can describe it without posting the full sources.

Look at the push buffer dump:

http://www.everfall.com/paste/id.php?uxdlpwlbfpo9

It is the image of the push buffer after a hypervisor blit. The end of the buffer is at 0xb8.

The last packet sends zero to subchannel zero with tag 0x110. Replace the tag with NOP ( 0x100 ) while the buffer has been kicked by the hypervisor and is being executed by the RSX.

Fill the push buffer with N + 1 copies of the first 0xb8 bytes.

In a loop for( int i = 0; i < N; ++i ), modify the client screen buffer in some way ( e.g. fill it with random numbers ), then kick the push buffer by writing ( (uint32_t *)ioremap( lpar_dma_control, 1024) )[ 0x10 ] = 0xe1f0000 + 0xb8 * ( i + 1 ); followed by sleep( 1 );. The client screen in XDR memory will be blitted into video memory.

If nobody is able to repeat these steps, I will post the full sources.

Edited: bugfix
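
The bookkeeping in the steps above can be sketched in plain C, without touching any hardware. Assumptions: the tag sits in bits 2-12 of the header word (the nouveau layout), and the constants are the ones quoted in the post. retag(), replicate_blit() and put_for_iteration() are made-up helper names, and the actual register write through ioremap is deliberately not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FIFO_BASE  0x0e1f0000u  /* push buffer address as seen in the control regs */
#define BLIT_BYTES 0xb8u        /* size of one hypervisor blit program */
#define TAG_MASK   0x1ffcu      /* assumed tag field of a header word */
#define TAG_NOP    0x100u

/* Returns the header with its tag replaced, keeping count and subchannel. */
static uint32_t retag(uint32_t header, uint32_t new_tag)
{
    assert((new_tag & ~TAG_MASK) == 0);
    return (header & ~TAG_MASK) | new_tag;
}

/* Appends n extra copies of the first BLIT_BYTES bytes (n + 1 copies total). */
static void replicate_blit(uint8_t *fifo, size_t fifo_size, size_t n)
{
    assert((n + 1) * BLIT_BYTES <= fifo_size);
    for (size_t i = 1; i <= n; i++)
        memcpy(fifo + i * BLIT_BYTES, fifo, BLIT_BYTES);
}

/* Value written to control dword 0x10 on loop iteration i, as in the post. */
static uint32_t put_for_iteration(uint32_t i)
{
    return FIFO_BASE + BLIT_BYTES * (i + 1u);
}
```

For example, a header of 0x00040110 (one data dword, subchannel 0, tag 0x110) becomes 0x00040100 after retag(header, TAG_NOP).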
Last edited by IronPeter on Sun Oct 07, 2007 10:51 pm, edited 1 time in total.
mc
Posts: 211
Joined: Wed Jan 12, 2005 7:32 am
Location: Linköping

Post by mc »

Very nice work, IronPeter!

I find it intriguing that not only are you allowed to define the FIFO region
in user memory, but that you are also allowed to map the control area.
This suggests to me that Sony actually intended to support HW 3D under
Linux, as they could just as easily have made the control area accessible
only to the hypervisor.
Flying at a high speed
Having the courage
Getting over crisis
I rescue the people
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am
Contact:

Post by IronPeter »

mc wrote: ... but that you are also allowed to map the control area.
We are allowed to map only part of the MMIO registers: only the context control registers. I do not know of a way to map the global RAMIN area.

A good source on this stuff:
http://nouveau.freedesktop.org/wiki/HonzaHavlicek
mc
Posts: 211
Joined: Wed Jan 12, 2005 7:32 am
Location: Linköping

Post by mc »

Well, yeah, but that only makes it more likely that it is intentional
that this particular part can be mapped. It would make sense
to only expose the parts needed for performance (= FIFO interface)
and handle access to other parts through the HV.

Thanks for the link, BTW.
Flying at a high speed
Having the courage
Getting over crisis
I rescue the people
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am
Contact:

Post by IronPeter »

Yes, mc, you are right.

The only thing that needs direct FIFO access is real-time 3D acceleration. The 2D part does not need that. So Sony probably wants to expose a 3D driver.
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am
Contact:

Complete subchannel mapping

Post by IronPeter »

Here is the push buffer dump after the hypervisor's FB_SETUP. The programmers at Sony decided not to clean up the push buffer after setting up the context objects.

http://www.everfall.com/paste/id.php?ew29498z816w

Enjoy it.
Glaurung
Posts: 49
Joined: Thu Oct 11, 2007 4:54 am

confirmed FIFO hack

Post by Glaurung »

Hi all,

I could reproduce the FIFO hack described by IronPeter, using firmware v1.93. I had to set up a large blit (which is decomposed into many 1024x1024 blits by the HV) in order to have time to tweak the FIFO area. Instead of patching with a NOP (which works), I chose to set the FIFO end pointer two operations back. I also had to remove the L1GPU_FB_BLIT_WAIT_FOR_COMPLETION flag from the call to lv1_gpu_context_attribute(), so to sum it up:

Code:

	/* large blit */
	lv1_gpu_context_attribute(ps3.context_handle,
				  L1GPU_CONTEXT_ATTRIBUTE_FB_BLIT,
				  dst_offset,
				  GPU_IOIF + src_offset,
				  (1ULL << 31) |
				  (1280 << 16) | 1280,
				  1280*4);

	/* go back two operations */
	ps3.fifo_regs[0x10] -= 8;

	/* wait for end of GPU operation */
	while (ps3.fifo_regs[0x11] != ps3.fifo_regs[0x10] &&
	       ps3.fifo_regs[0x15] != ps3.fifo_regs[0x10])
		;

	/* copy our operation to the fifo */
	memcpy(&ps3.fifo[fifo_idx], blit_program, sizeof(blit_program));

	/* fill the vram in white */
	memset(ps3.vram, 0xff, ps3.vram_size);

	/* kick off the GPU */
	ps3.fifo_regs[0x10] += sizeof(blit_program);
I was also able to send various other blit commands in the FIFO, using documentation from the nouveau project. For example, the following FIFO commands will do a YUYV blit instead of an ARGB blit:

Code:

uint32_t blit_program[] = {
	0x00106300, // SURFACE_FORMAT (size 4, subchannel 3)
	0x0000000a, //  SURFACE_FORMAT_A8R8G8B8
	0x14001400, //  ((pitch{dst} << 16) | pitch{dst})
	0x00000000, //  src_offset
	0x00000000, //  dst_offset

	0x0024c2fc, // NV_IMAGE_BLIT_OPERATION (size 9, subchannel 6)
	0x00000001, //  not dither (0 = dither)
	0x00000005, //  STRETCH_BLIT_FORMAT_YUYV
	0x00000003, //  STRETCH_BLIT_OPERATION_COPY
	0x00000000, //  (dstX << 16 | dstY)
	0x02d00400, //  ((height << 16) | width)
	0x00000000, //  (dstX << 16 | dstY)
	0x02d00400, //  ((height << 16) | width)
	0x00100000, //  step_x in 12.20 fixed point
	0x00100000, //  step_y in 12.20 fixed point

	0x0010c400, // STRETCH_BLIT_SRC_SIZE (size 4, subchannel 6)
	0x02d00400, //  ((height << 16) | width)
	0x00021400, //  pitch_src
		    //    | (STRETCH_BLIT_SRC_FORMAT_ORIGIN_CORNER << 16)
		    //    | (STRETCH_BLIT_SRC_FORMAT_FILTER_POINT_SAMPLE << 24)
	0x0d000000, //  GPU_IOIF + src_offset
	0x00000000, //  srcX | (srcY<<16)
};
Thanks IronPeter for your hard work. I'll now try playing a bit with subchannel bindings, your most recent post looks quite interesting.
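
The magic numbers in the blit program above can be rebuilt from their parts. A short sketch, assuming the nouveau header layout (count << 18, subchannel << 13, tag in the low bits); fifo_header(), pack_hi_lo() and FIXED_12_20_ONE are made-up names, not taken from any driver.

```c
#include <assert.h>
#include <stdint.h>

#define FIXED_12_20_ONE (1u << 20)  /* 1.0 in 12.20 fixed point */

/* Builds a header word from data dword count, subchannel and tag. */
static uint32_t fifo_header(uint32_t count, uint32_t subchannel, uint32_t tag)
{
    assert(count <= 0x7ffu && subchannel <= 7u && (tag & ~0x1ffcu) == 0);
    return (count << 18) | (subchannel << 13) | tag;
}

/* Packs two 16-bit fields, e.g. ((height << 16) | width). */
static uint32_t pack_hi_lo(uint32_t hi, uint32_t lo)
{
    assert(hi <= 0xffffu && lo <= 0xffffu);
    return (hi << 16) | lo;
}
```

So fifo_header(4, 3, 0x300) gives 0x00106300, pack_hi_lo(720, 1024) gives 0x02d00400 (a 1024 x 720 area), and pack_hi_lo(5120, 5120) gives 0x14001400, where 5120 = 1280 * 4 is the pitch of a 1280-pixel ARGB line.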
Seather
Posts: 1
Joined: Thu Oct 11, 2007 2:13 pm

Post by Seather »

Would this hypervisor be anything close to the Xen from Sun?
jimparis
Posts: 1145
Joined: Fri Jun 10, 2005 4:21 am
Location: Boston

Post by jimparis »

1) Xen isn't from Sun, it's from XenSource.
2) This thread has nothing to do with that, please keep it on-topic.
3) It is the same in theory but there are no similarities that are useful to us.
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am
Contact:

few words about methodology

Post by IronPeter »

Glaurung, nice.

I want to say a few words about testing methodology. A segmentation fault in kernel mode is not a good thing. I did a few things to make life easier:

1.) I extended ps3fb's memory mapping to cover the last 65536 bytes. open( "/dev/fb0" ), mmap it, and use the push buffer from user mode.

2.) I extended ps3fb's ioctl to read/write the FIFO control registers.

3.) lv1_gpu_context_intr ( interrupt ) is very useful. Extend the ioctl with this function. It is also better to disable ( again via ioctl ) the kernel's regular lv1_gpu_context_intr call in the driver.
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am
Contact:

Post by IronPeter »

Glaurung, would you like to test a blit from video memory to the system area? I will have no access to a PS3 for a short time.

http://wiki.ps2dev.org/ps3:hypervisor:l ... y_allocate returns only 252 megs; the top of the video memory seems to contain the RAMIN area ( the global GPU control area ).

This area must be protected from read/write, but probably...
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am
Contact:

Post by IronPeter »

Also, blitting from vidmem can be useful for using the DDR memory as a fast swap file ( CPU reads are very slooooow ).
Glaurung
Posts: 49
Joined: Thu Oct 11, 2007 4:54 am

FB_SETUP and XDR<->DDR DMA

Post by Glaurung »

Hi,

First of all, let's start with some news on the information left over by FB_SETUP. The dump from IronPeter shows that there are 6 objects bound to channels 1-6:

- channel 1: instance 0x31337303 of class NvMemFormat. It is used for uploading data from XDR (DMA object 0xfeed0001) to DDR (DMA object 0xfeed0000)
- channel 2: instance 0x3137c0de of class NvMemFormat. It is used for downloading data from DDR to XDR
- channel 3: instance 0x313371c3 of class NvContextSurfaces. This describes the screen surface parameters (ARGB, ...) and is referenced by other objects.
- channel 6: instance 0x3137af00 of class NvScaledImage. This is used for blitting, along with the NvContextSurfaces object above

Those are the ones I'm sure of. Then we have:
- channel 5: instance 0x31337808 of unknown class, maybe NvImageFromCpu or NvImageBlit
- channel 4: instance 0x31337a73 of unknown class, no idea what it is.
and 0x66604200 is most probably an instance of a DMA Notify object.

Binding an existing object to another channel using tag 0 works. Using this, I was able to perform DMA from DDR to XDR and vice-versa. For example, put this in the FIFO for download:

Code:

	0x0020430c, // (size = 8, chan 2)
	0x00000000, // 0x30c: src
	0x0d000000, // 0x310: dst (GPU_IOIF)
	0x00004000, // 0x314: src_pitch
	0x00004000, // 0x318: dst_pitch
	0x00004000, // 0x31c: src_line_len
	0x00000400, // 0x320: line_count
	0x00000101, // 0x324: ((dst_inc << 8) | src_inc)
	0x00000000, // 0x328: buf_notify (?)
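
As a rough sketch, the bind (tag 0) and the transfer packet above can be generated like this. It assumes the nouveau header layout; fifo_header(), emit_bind() and emit_transfer() are made-up helpers, and out must have room for the words written. With the values shown, one packet moves 0x4000 * 0x400 bytes = 16 MiB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Builds a header word from data dword count, subchannel and tag. */
static uint32_t fifo_header(uint32_t count, uint32_t subchannel, uint32_t tag)
{
    assert(count <= 0x7ffu && subchannel <= 7u && (tag & ~0x1ffcu) == 0);
    return (count << 18) | (subchannel << 13) | tag;
}

/* Binds an existing object handle to a channel (tag 0). Returns words written. */
static size_t emit_bind(uint32_t *out, uint32_t subchannel, uint32_t handle)
{
    out[0] = fifo_header(1, subchannel, 0x000);
    out[1] = handle;
    return 2;
}

/* Emits the 8-parameter transfer starting at tag 0x30c. Returns words written. */
static size_t emit_transfer(uint32_t *out, uint32_t subchannel,
                            uint32_t src, uint32_t dst, uint32_t pitch,
                            uint32_t line_len, uint32_t line_count)
{
    out[0] = fifo_header(8, subchannel, 0x30c);
    out[1] = src;              /* 0x30c: src */
    out[2] = dst;              /* 0x310: dst */
    out[3] = pitch;            /* 0x314: src_pitch */
    out[4] = pitch;            /* 0x318: dst_pitch */
    out[5] = line_len;         /* 0x31c: src_line_len */
    out[6] = line_count;       /* 0x320: line_count */
    out[7] = (1u << 8) | 1u;   /* 0x324: (dst_inc << 8) | src_inc */
    out[8] = 0;                /* 0x328: buf_notify (?) */
    return 9;
}
```

Calling emit_transfer(buf, 2, 0, 0x0d000000, 0x4000, 0x4000, 0x400) reproduces the nine words listed above.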
Setting src to anything above 252MB crashes the GPU.. no luck for the RAMIN area, sorry IronPeter.. I did a few other random tests though:
- Extending the ioremapping of the framebuffer, I was able to read 2 more MB of memory past 252MB. It does not look very interesting: 64k of ff ff ff xx, then 64k of 00 00 00 xx, etc... No idea what it might be
- I tried the same trick on the fifo registers. Reading returns zero (except for the three FIFO registers of course) until you reach 64k < addr < 128k, which crashes the PS3 with a nice beep and blinking red led (just by reading!!). I guess the HV is not happy... a power cycle is needed to fix the condition
- Same thing on the reports buffer. It starts with:
0x0000: 13 37 c0 d3 13 37 ba be 13 37 be ef 13 37 f0 01, etc..
(looks like some guys at Sony or IBM have humor...)
0x1400: ff ff ff ff ff ff ff ff 00 00 00 00 ff ff ff ff, etc..
0x9400: 00 04 00 00 20 00 04 10 00 5a 02 22 00 81 14 01
40 00 20 04 00 88 00 24 44 00 01 44 02 02 04 01
c0 00 0a 08 00 20 20 00 00 02 08 81 02 10 00 00
01 41 08 53 21 04 04 00 00 24 00 00 00 05 10 20
There seem to be real data starting from 0x9400 up, but I don't know what that could be either...
- I played a bit with the values of lv1_gpu_memory_allocate(). The four values set to zero actually refer to resources, probably two memory resources and 2 other resources. Here are the maximum values I could set before the call returns invalid parameters (-17):
	status = lv1_gpu_memory_allocate(ps3fb.vram_size,
					 512*1024,
					 3075*1024,
					 15,
					 8,
					 &ps3fb.memory_handle,
					 &ps3fb.vram_lpar);

Anyway, we now have everything needed to write a decent Xorg driver, with Xv and Composite support. Also, the mtd driver for swapping RAM to DDR could be improved using DMA (I measured a bandwidth of ~3GB/s in both directions, which is far less than expected but much better than direct access to DDR; blitting is faster, at ~16GB/s).

Have fun.
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am
Contact:

Post by IronPeter »

Good work, Glaurung

The idea of getting at RAMIN had to fail. Direct RAMIN access would mean a HUGE security hole in the hypervisor.

I suppose subchannel 5 has CLASS_3D :).
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am
Contact:

very strange blit result

Post by IronPeter »

It is probably my mistake, but this blit of DDR into itself really does work:

http://www.everfall.com/paste/id.php?wpt5ez8wbvpq

Yes, it looks like regular white and black areas with irregular color dots :).

Edited: XDR->DDR
Last edited by IronPeter on Fri Oct 12, 2007 2:28 pm, edited 1 time in total.
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am
Contact:

Dump of the top of videomemory

Post by IronPeter »

Yes, I have a dump of the top 4 megs of vidmem. This dump contains, for example, the funny "dma_report" area in the middle, and the handles for the context objects ( in the other endianness ): 313371c3 -> c3713331.
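
A small sketch for locating a handle in such a dump regardless of byte order. bswap32() and find_handle() are made-up helpers, and the 4-byte alignment of the handles is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverses the byte order of a 32-bit value. */
static uint32_t bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Returns the byte offset of the first 4-byte-aligned word that equals
 * handle in either byte order, or -1 if there is none. */
static long find_handle(const uint8_t *dump, size_t len, uint32_t handle)
{
    uint32_t swapped = bswap32(handle);

    assert(dump != NULL);
    for (size_t off = 0; off + 4 <= len; off += 4) {
        uint32_t w = ((uint32_t)dump[off] << 24) |
                     ((uint32_t)dump[off + 1] << 16) |
                     ((uint32_t)dump[off + 2] << 8) |
                     (uint32_t)dump[off + 3];
        if (w == handle || w == swapped)
            return (long)off;
    }
    return -1;
}
```

Running it with each of the 0x3133xxxx / 0x3137xxxx handles from the FB_SETUP dump should show where the objects are referenced.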
unsolo
Posts: 155
Joined: Mon Apr 16, 2007 2:39 am
Location: OSLO Norway

Post by unsolo »

@IronPeter

I'm working on spu-medialib and we have made a yv12/yuv420 scaler that also does colorspace conversion at more than sufficient frame rates: 85 FPS @ 1920x1080 using a single SPE. We have also worked on an mplayer vo using spu-medialib and libps3fb, a small fb lib we made.

If we could figure out how to blit the video over as an overlay onto the X display being rendered, it would be great. Maybe by extending libps3fb somehow.
Don't do it alone.
unsolo
Posts: 155
Joined: Mon Apr 16, 2007 2:39 am
Location: OSLO Norway

Post by unsolo »

I would also be more than happy to write other 2D accelerations for the SPU, like YUYV perhaps.
Don't do it alone.
Glaurung
Posts: 49
Joined: Thu Oct 11, 2007 4:54 am

Post by Glaurung »

Hi,

IronPeter: Great! I didn't try the blit, just the XDR<->DDR DMA, so blitting DDR to DDR was a good idea, and you get basically the same as I was observing with remapping (white/black stuff) plus the last 2MB. The fact that the object handles lie at the end of the framebuffer is quite interesting (hash table?), I'll have a look at that tomorrow. However I doubt channel 5 is 3D, but let's hope for it :-)

unsolo: Blitting YUYV to RGB with scaling is clearly feasible, as I reported in my first post. Actually, I'm not a 3D guy at all and I'm more interested in getting a decent Xorg driver for the PS3. This means accelerated Xv support and Composite support too (as we now have all the tools to do that, I think). Basically, changing the stretch blit format from 3 to 5 does the job. Converting YUV420 to YUYV is relatively straightforward and should not take too much CPU power, leaving the SPUs for the applications. However, until we find out how to create new GPU objects, using the SPUs for 3D might be a good option. I did actually start a FB+SPU based Xorg driver, thinking there was no hope for direct GPU access, but the experiments of the last days show otherwise and are quite encouraging. Anyway, a single SPU can clearly handle YUV->RGB conversion at full resolution, so this might be an alternative for people not interested in messing with their framebuffer driver.
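
To show what "relatively straightforward" means here, a minimal sketch of the YUV420 to YUYV repacking on the CPU. Assumptions: tightly packed planes, even width and height, chroma rows simply reused for two luma rows (no interpolation), and output byte order Y0 U Y1 V. Whether the GPU's format 5 expects exactly this byte order has not been checked, and yuv420_to_yuyv() is a made-up name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Repacks planar YUV 4:2:0 into packed YUYV (2 bytes per pixel).
 * out must hold width * height * 2 bytes. */
static void yuv420_to_yuyv(const uint8_t *y, const uint8_t *u, const uint8_t *v,
                           size_t width, size_t height, uint8_t *out)
{
    assert(width % 2 == 0 && height % 2 == 0);
    for (size_t row = 0; row < height; row++) {
        const uint8_t *yrow = y + row * width;
        /* Each chroma row is shared by two consecutive luma rows. */
        const uint8_t *urow = u + (row / 2) * (width / 2);
        const uint8_t *vrow = v + (row / 2) * (width / 2);

        for (size_t col = 0; col < width; col += 2) {
            *out++ = yrow[col];
            *out++ = urow[col / 2];
            *out++ = yrow[col + 1];
            *out++ = vrow[col / 2];
        }
    }
}
```

The inner loop only moves bytes around, which is why the cost stays small compared to a full colorspace conversion.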
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am
Contact:

Post by IronPeter »

unsolo, I am a 3D guy. Glaurung is the 2D guy you need.
unsolo
Posts: 155
Joined: Mon Apr 16, 2007 2:39 am
Location: OSLO Norway

Post by unsolo »

@Glaurung please take a look here

http://wiki.ps2dev.org/ps3:spu-medialib

As you can see, we not only got the colorspace conversion into a single SPE but also a scaler into the same one.

I am more than happy to extend these to whatever is needed to make an Xv driver, and with the stuff we have in the mplayer vo that shouldn't be too hard.

But my knowledge of Xv/Xorg is what falls short. If you could/want to assist on this it would be very much appreciated.
Don't do it alone.