Depth buffer in VRAM + magic at VRAM+2Mbytes

jsgf
Posts: 254
Joined: Tue Jul 12, 2005 11:02 am
Contact:

Depth buffer in VRAM + magic at VRAM+2Mbytes

Post by jsgf »

I've been following up on Holger's work on the depth buffer organization in VRAM, and groepaz's observation that the memory at VRAM+2Mbytes seems to be another view of VRAM with some kind of rearrangement applied.

These two definitely seem related. The raw depth buffer in the normal VRAM space is rearranged in a swizzle-like way. This is the raw dump of the depth buffer converted to an 8bpp greyscale image:
Image

When viewed through the VRAM+2M window, it looks a bit more organized:
Image

This is clearly a fairly simple structure, with a simple column-wise rearrangement of each 16-pixel (32-byte) strip. When rearranged, it looks as expected:
Image

For reference, the corresponding colour buffer is:
Image

The rearrangement function is

Code:

unsigned swoz(unsigned x)
{
	return ((x & 0x100) >> 4) | ((x & 0xf0) << 1) | (x & 0xf);
}
I don't know whether any of the register settings for framebuffer or depth buffer width have any effect on the transformation.
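As a sanity check on swoz(): it only rotates bits 4 to 8 of the index, so it should be a permutation of 0..511 that never breaks up a 16-entry strip. The sketch below adds a hypothetical inverse, unswoz(), and a row helper, rearrange_row(). Treating x as a 16-bit pixel index within a 512-pixel line, and the direction of the copy, are assumptions rather than anything confirmed on hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Rearrangement function from the post: rotates bits 4..8 left by one. */
unsigned swoz(unsigned x)
{
    return ((x & 0x100) >> 4) | ((x & 0xf0) << 1) | (x & 0xf);
}

/* Hypothetical inverse: rotates bits 4..8 right by one. */
unsigned unswoz(unsigned y)
{
    return ((y & 0x10) << 4) | ((y & 0x1e0) >> 1) | (y & 0xf);
}

/*
 * Assumption: a row holds 512 16-bit depth values and dst[x] is taken
 * from src[swoz(x)]. If the hardware mapping runs the other way,
 * swap in unswoz() here.
 */
void rearrange_row(uint16_t *dst, const uint16_t *src)
{
    assert(dst != src);  /* the permutation cannot be applied in place */
    for (unsigned x = 0; x < 512; x++)
        dst[x] = src[swoz(x)];
}
```

Because the low four bits pass through untouched, each group of 16 values (32 bytes) moves as a unit, which matches the strip structure visible in the dump.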

This means it should be pretty efficient to extract the depth buffer in an organized way without lots of CPU effort to rearrange the in-memory organization; the VRAM+2M window does most of the work, and simply copying 32-byte strips with the normal GE copy commands will finish the job.

Now the tricky part is working out how to use this to do depth-map shadows and other effects...
chp
Posts: 313
Joined: Wed Jun 23, 2004 7:16 am

Post by chp »

When playing around with PSPinside I do remember seeing stuff at vram + 4MB and vram + 6MB as well (at vram + 8MB the machine locked up :D). They did look similar, but it's possible they are ordered in a different way.

Doing a copy of those strips to a "normal" texture would be a striped blit which the GE could execute in no time at all, but I think the big issue could be whether the GE can access the VRAM above 2MB, or if this is just a hardware mapping that only the CPU sees. Let's hope not. :)

Also, if reading from the buffer works, writing to it should be no problem either, giving us a window for interesting zbuffer effects using the color channels.
GE Dominator
jsgf
Posts: 254
Joined: Tue Jul 12, 2005 11:02 am
Contact:

Post by jsgf »

Yeah, checking the other ranges was the obvious next thing to look at.

One problem with doing depth effects is that there's no really good pixel format to match the depth format. You could use the depth buffer as a 16-bit indexed texture, but I'm not sure what format you could use to render into the depth buffer directly. I.e., I don't see how to implement depth-format textures.
chp
Posts: 313
Joined: Wed Jun 23, 2004 7:16 am

Post by chp »

There's always the 16-bit CLUT format; if we figure out how that works properly (no one seems to have done that yet), we could perhaps map that against the depth buffer. Maybe I should give it a go to see what this format can bring.
GE Dominator
ooPo
Site Admin
Posts: 2023
Joined: Sat Jan 17, 2004 9:56 am
Location: Canada
Contact:

Post by ooPo »

It looks a lot like on the PS2 when you try to view vram data at different colour depths. Maybe the vram is mirrored in different ways to optimize writes to it depending on if you're using 16/24/32bpp? A kind of free swizzle?

I'd be interested to see if the GE can use this area and what happens when you try to render to or display from it.
jsgf
Posts: 254
Joined: Tue Jul 12, 2005 11:02 am
Contact:

Post by jsgf »

OK, I tested the other offsets (where +1 = +1*2Mbyte):

VRAM+0: VRAM
VRAM+1: VRAM w/ "swizzle"
VRAM+2: == VRAM+0 (bit for bit identical)
VRAM+3: VRAM w/ "swizzle" + 32-byte column interleave

In other words, reading from VRAM+3 will give you a proper linearized version of the depth buffer with no effort.
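The table above boils down to two address bits selecting the view. Here is a minimal sketch of a helper for that, assuming the aliases repeat every 2 Mbytes exactly as listed and that the view is selected purely by address bits 21 and 22. vram_view() is a made-up name, not part of any SDK.

```c
#include <assert.h>
#include <stdint.h>

#define VRAM_VIEW_SHIFT 21u                      /* 1 << 21 == 2 Mbytes */
#define VRAM_VIEW_MASK  (3u << VRAM_VIEW_SHIFT)  /* bits 21 and 22 */

/*
 * Return the address of the same VRAM location as seen through view
 * 0..3 (0 = plain VRAM, 1 = "swizzle", 2 = same as 0,
 * 3 = "swizzle" + 32-byte column interleave).
 * All other address bits are passed through unchanged.
 */
uint32_t vram_view(uint32_t addr, unsigned view)
{
    assert(view < 4);
    return (addr & ~VRAM_VIEW_MASK) | ((uint32_t)view << VRAM_VIEW_SHIFT);
}
```

So if VRAM starts at 0x04000000, vram_view(0x04000000, 3) gives 0x04600000, and pointing a read at the depth buffer's address in that view should return the linearized data.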

The GE sees the same view; a GE copy operation returns the same data (represented as RGB 565):
Image

I have not managed to get a useful result out of using a 16-bit indexed format, but I haven't tested that code before, so there could be something else wrong.

It would also be interesting to see what the actual "swizzle" function is; it seems to be related to texture swizzling in that it has a 16x8 structure. I know Holger worked it out, but it was mixed in with the column interleave stuff. It's possible the VRAM+1 view is performing swizzling for a 2048-byte-wide image, which would be useful for render-to-texture.

Edit: Hm, I wonder if these views on VRAM have different behaviour for writing? It's possible writing to VRAM+2 will get interesting results, even if reading it returns VRAM unmodified.
chp
Posts: 313
Joined: Wed Jun 23, 2004 7:16 am

Post by chp »

Ahh, nice work. :) Oh, I can see uses for this no problem. Why limit yourself to 16 bit? With a bit of thought I realised you can do the old PS2 classic with this: Zbuffer fog.

Set up the zbuffer as an 8-bit texture, with an appropriate palette which has the alpha and color set to what you want the fog to fade into. Then it's just a simple matter of rendering stripes (64 pixels wide if the cache fills 128 pixels at once, to not go over the limit) which skip every other texel from the input zbuffer. You'd have to do this in two separate passes since the texture would be too wide for one pass. Since you can set up zbuffer testing at the same time, you can easily limit the area from which you are reading, giving you a nice result. I think I'll have to experiment a bit with this over the holidays. :)
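The fog palette described above could be built roughly like this. It is only a sketch: the 0xAABBGGRR entry layout, the linear ramp, and the invert flag (which end of the depth range maps to index 0 is not established here) are all assumptions, and build_fog_palette() is a hypothetical helper, not code from any sample.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Fill a 256-entry palette for 8-bit depth texturing.
 * Every entry carries the same fog colour (low 24 bits of fog_bgr);
 * only alpha varies with the index, so the blend strength follows depth.
 * Assumption: entries are 32-bit with alpha in the top byte.
 */
void build_fog_palette(uint32_t pal[256], uint32_t fog_bgr, int invert)
{
    assert(pal != NULL);
    for (unsigned i = 0; i < 256; i++) {
        /* Linear ramp as a placeholder; any density curve could go here. */
        uint32_t alpha = invert ? 255u - i : i;
        pal[i] = (alpha << 24) | (fog_bgr & 0x00ffffffu);
    }
}
```

Swapping the linear ramp for a different curve, or varying the colour per entry, is where this approach gets more flexible than a fixed fog function.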
GE Dominator
jsgf
Posts: 254
Joined: Tue Jul 12, 2005 11:02 am
Contact:

Post by jsgf »

More experiments:

Writing through these offsets seems to have no effect; setting the drawbuffer pointer to one of these VRAM aliases just works as normal.

But if you pass VRAM+1 or +3 to sceDisplaySetFrameBuf(), the image appears scrambled on screen; +2 appears as normal (like reading from it normally).

So it seems these offsets only have effects for reads, but work for all readers, even framebuffer scanout...
chp
Posts: 313
Joined: Wed Jun 23, 2004 7:16 am

Post by chp »

Too bad. :/

I did try my zbuffer fog, and I have committed a sample (samples/gu/zbufferfog) that implements this technique, which proves that it works just great! It needs a few touches to fit the shade model, but it's a lot easier to deal with than doing the zbuffer approach on the PS2 (which required copying color channels around).
GE Dominator
jsgf
Posts: 254
Joined: Tue Jul 12, 2005 11:02 am
Contact:

Post by jsgf »

Cool! How does this compare to using the existing hardware fog? Is it that you can control the density function in arbitrary ways by adjusting the fog cmap?
chp
Posts: 313
Joined: Wed Jun 23, 2004 7:16 am

Post by chp »

If the fog on PSP is anything like the PS2, fillrate is cut in half when you enable it, which would make zbuffer fogging a very viable alternative. Also, on PS2, the fog was linear, not perspective-correct, which the zbuffer is. This is just speculation though; I'll do some testing to see if this is still true for the GE. Also, being able to fade the fog into different colors depending on density could be quite nice, since the fog table can be filled with any values. The current shape of the fog table is a bit incorrect as well, because of the non-linear shape of the zbuffer (more values are spent closer to the near plane); I'll work out a more precise formula tomorrow.

This zbuffer texturing opens up other things as well; depth of field shouldn't be too hard to implement using this. I'll look at it in a few days.
GE Dominator
chp
Posts: 313
Joined: Wed Jun 23, 2004 7:16 am

Post by chp »

After a few initial tests, hardware fog doesn't seem to be much of an issue actually, possibly because memory bandwidth is more limited (it might be hidden by the latency). Using zbuffer fogging could still be useful though, since you can do more nonstandard fogging with it.
GE Dominator
jsgf
Posts: 254
Joined: Tue Jul 12, 2005 11:02 am
Contact:

Post by jsgf »

Hm, I don't see why normal hardware fog would have much impact on fillrate; it shouldn't cost any memory bandwidth. It's something you can compute for each vertex and then interpolate across the triangle.

Surely doing two-pass fog (at least) halves your fillrate by default?

Of course, you can do a lot of cool effects with Z-buffer fog, so it may be worthwhile anyway...
chp
Posts: 313
Joined: Wed Jun 23, 2004 7:16 am

Post by chp »

I just figured that since the PS2 fog costs about half the actual fillrate and the PSP shares so many characteristics (single-texturing for example), it could have been carried over to this platform as well, but this does not seem to be the case from these simple tests. A proper fog test should probably be done though before we can draw any real conclusions.
GE Dominator
Visigotico
Posts: 11
Joined: Wed Apr 23, 2008 11:44 am

Post by Visigotico »

With GU_PSM_4444 (16-bit draw and display buffers), the linear zbuffer seems to be at zbuffer + 2MB.