Higher Level GPU Questions

ps2devman · Post by **ps2devman** » Wed Nov 21, 2007 6:41 pm

If you can post basic shader assembler source and headers, along with the binary result, I'm sure many will be happy to help code the assembler. Let's start with basic shaders and see what we get. Just create a new thread named "FP/VP assembler project", and let's start.

cnlohr · Post by **cnlohr** » Thu Nov 22, 2007 3:39 am

Done -- sorry I couldn't get much of anything done before the realization that it would take me quite some time.

I guess once you guys lay the foundation, I wouldn't mind helping out.

IronPeter · Post by **IronPeter** » Thu Nov 22, 2007 5:43 am

Do it slowly, dig deep into the nouveau project. Do it for fun :). We have a lot of time. I think it will take at least a month for the first shader compiler iteration.

IronPeter · Post by **IronPeter** » Thu Nov 22, 2007 10:28 pm

Does somebody have a textured, MIT-licensed model of a penguin :)? I want to integrate a basic COLLADA parser.

If somebody has such a model, please send it to me (my mail is in the info).

Format - Maya .ma or .mb or COLLADA .dae, 1-100K polygons, diffuse texture, with or without skeletal animation.

JPedro · Post by **JPedro** » Fri Nov 23, 2007 8:25 am

Not sure about the license, but I found this with a bit of googling http://www.turnpike420.net/linux/Lpenguin.zip

cnlohr · Post by **cnlohr** » Sat Nov 24, 2007 9:41 am

Hey, this is heading into an area where I can actually help. I've written a large variety of loaders and such for the Mercury Game Engine. For example, the thing I did for the video was .ppm loading. I can throw in .png (with a library) and .tga, .bmp, .ppm (without a library) fairly easily.

Additionally, I have Wavefront OBJ loading code written (bar the multiple texture stuff) and since all we have is texture loading, that's all that makes sense to add.

Additionally, at this point, I would appreciate write support to the repository so I can do one thing to begin with, to make the simple_triangle code use the info stuff to figure out what size screen it needs to use. I don't have access to any sort of 1280x1024 monitor, and it would be nice to make it just work in any screen's code.

On an unrelated note, I found that if I run the init code just once, the video doesn't work, and when I run it twice it does.

Also, committing the init binary to the source under SVN can trip the shell up in a bash environment. When I run the mknods sh file in the repo, it says init is up to date, but it won't run init. The only way around it for me is to remove the file; then it builds it and can execute it.

Those are two of the first things I'd really like to add to the repo. I do feel like I would be able to help around the edges with the code.

IronPeter · Post by **IronPeter** » Sat Nov 24, 2007 4:57 pm

I asked the admin to grant you SVN write access.

There is a tools folder. Right now it contains the tooldxt compression utility (it works with .tga files). You can extend this tool with custom formats. OpenIL is a better idea for broad image format support. You can place any libraries into the tool and vendors folders; the only restriction is not to link them with the core RSX code. Loading png at runtime is a very bad idea.

Of course, bugfixes and refactoring are good :). If you want to support different screen resolutions, make a function set_render_target (for code style, refer to the function set_texture_2D), place it in the file framebuffer.c, and call it. If you want to add a bit more functionality (render to textures), add the texture and render target formats A8R8G8B8 and R5G6B5. You can also add support for swizzled render targets. If you want autogenerated mips, add a sequence of 2D blits into the mip levels...
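The "autogen mip" idea above amounts to blitting each level into one of half the size until 1x1 is reached. A minimal sketch of the level sizes such a sequence of 2D blits would walk through; `mip_level` and `build_mip_chain` are hypothetical names for illustration, not part of libps3rsx, and the blit itself is left out:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one mip level (not a libps3rsx type). */
typedef struct {
    unsigned width;
    unsigned height;
} mip_level;

/* Fill `out` with the size of every mip level of a width x height texture,
 * level 0 first, halving each dimension (never below 1) down to 1x1.
 * Returns the number of levels written, or 0 if `max_levels` is too small. */
static size_t build_mip_chain(unsigned width, unsigned height,
                              mip_level *out, size_t max_levels)
{
    size_t n = 0;

    assert(width > 0 && height > 0);
    for (;;) {
        if (n == max_levels)
            return 0;               /* caller's array is too small */
        out[n].width = width;
        out[n].height = height;
        n++;
        if (width == 1 && height == 1)
            break;                  /* smallest level reached */
        width = width > 1 ? width / 2 : 1;
        height = height > 1 ? height / 2 : 1;
    }
    return n;
}
```

Each consecutive pair of entries would then be the source and destination size of one downscaling blit.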

OBJ loading... Only with a binary layer. Just export annotated vertex and index buffers, in binary format. It is better for me to make good high-level ib and vb interfaces first.

Geometry processing is not a very good thing to start on without *strong* SPU skills. Geometry must be processed by the SPU, not the RSX, at runtime.

It seems like we need bugzilla :).

cnlohr · Post by **cnlohr** » Sat Nov 24, 2007 5:27 pm

1. Are we going to make use of the perspective and/or modelview matrices? If so, then there is little need to do vertex processing on the SPU and we can leave it straight up to the PPU.

Granted it's exactly the sort of thing the SPUs are good at, I'd put money on the RSX being better.

2. Can I have permission to modify enter_direct, making the struct fb_fix_screeninfo fix; global, and add a function to access its .xres and .yres? As far as I can tell, the way you're using the video card, xoff and yoff can be discarded.

Additionally, I would like to use this functionality in both examples to figure out the width/height, pitch, zeta offset and the clip values.

3. I assume you are refactoring the code in the examples. If not, I would like to commit an example with something in motion.

4. I will begin work, on my own time/space, on a PS3-specific C++ engine. If I get anywhere with that, it may be useful for what my interest is in: rapid prototyping.

5. One random concern I found was that even if I wait for the vsync, when I flip buffers, there is some lag of some kind somewhere in the pipeline. I will in fact see tearing in the picture when I do not expect it. If I throw in a tiny wait, it goes away. Is there any way of verifying that the video card has completed execution of everything in the FIFO buffer, or does that happen anyway upon the while (ctrl[0x10] != ctrl[0x11]); instruction?

6. One last thought. I reside in the US. As far as I can remember, there are some restrictions on reverse engineering because of our DMCA. I believe as long as I only use information provided by others, development with that information is still legal. Is this a concern? Does this just fall under reverse-engineering strictly for compatibility purposes (so there's nothing to worry about)?

jonwil · Post by **jonwil** » Sat Nov 24, 2007 6:40 pm

My understanding is that the DMCA only applies to copy protected works (and to software for playing those works). As far as I know, there is no copy protection in the NVIDIA shader architecture (there might be support for Macrovision or HDCP but those are totally unrelated to what is being done to support PS3 RSX in OtherOS)

IANAL though so I could be wrong.

IronPeter · Post by **IronPeter** » Sat Nov 24, 2007 6:59 pm

1. ) The modelview matrix must be handled by the SPU if we want back-face culling on the SPU. The perspective matrix can also be handled there. Of course, we must support vertex shaders. You can precompile a mat4x4 transform shader and use it with constants.

Keep in mind that one single SPU = RSX in vertex processing. You need to care only about memory bandwidth. So use vertex shaders as a "data decompression stage". It is a very bad idea to use shaders for massive stuff like skeletal animation.

2. ) Ok. Global vars are evil, so make this struct a parameter or a return value of the function.

3. ) Yes, I am refactoring. But if you are inserting new functionality, try to do some refactoring along with it. set_render_target is just an example of a possible refactoring.

4.) Ok. The only thing you need to do is discuss core functionality with me before committing (via mail probably). Outside the RSX core you can do everything you want.

5.) Ok. Use notifiers from the RSX (just write something at NvSub3D tag 0x110 and watch the notifier area).

There is a more powerful thing. You may ask the RSX to write a value to the memory location you want (dig through page 2 of the "hunt topic"). For example, you can insert a semaphore in the push buffer. A semaphore is just a "jump to itself" in the push buffer. Overwrite this jump with zero from the CPU when you want to release the semaphore.

6.) Do not worry about it. The Nouveau project has existed for many years without any problems.
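To make the push-buffer semaphore from point 5 concrete, here is a minimal sketch over a plain array standing in for the push buffer. The jump encoding (`0x20000000 | offset`) is an assumption based on the old-style NVIDIA FIFO jump command and should be checked against the Nouveau headers; `semaphore_acquire` and `semaphore_release` are hypothetical names, not libps3rsx functions:

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: old-style FIFO jump command, 0x20000000 | byte offset.
 * Verify against the Nouveau headers before relying on it. */
#define FIFO_JUMP(offset) (0x20000000u | (uint32_t)(offset))

/* Place a semaphore at dword index `idx` of the push buffer: a jump whose
 * target is its own address, so the GPU keeps fetching the same command.
 * `base_offset` is the push buffer's byte offset as seen by the GPU. */
static void semaphore_acquire(volatile uint32_t *pb, uint32_t base_offset,
                              uint32_t idx)
{
    assert((base_offset & 3u) == 0);    /* commands are dword aligned */
    pb[idx] = FIFO_JUMP(base_offset + idx * 4u);
}

/* Release the semaphore: the CPU overwrites the jump with zero, so the GPU
 * falls through to the commands that follow. */
static void semaphore_release(volatile uint32_t *pb, uint32_t idx)
{
    pb[idx] = 0u;
}
```

On real hardware the CPU write in `semaphore_release` would also need to be made visible to the GPU (write ordering or cache flushing), which this sketch does not attempt.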

Glaurung · Post by **Glaurung** » Sun Nov 25, 2007 6:47 am

Hi,

2. ) Indeed xoff and yoff are not important. The actual size of the display cannot be obtained from the standard framebuffer API, only the size of the region copied to VRAM by the blitter (which may be smaller in non-fullscreen mode). What I do in the Xorg driver is ask ps3fb for its current video mode (using the PS3-specific PS3FB_IOCTL_GETMODE) and keep a table of the corresponding fullscreen resolutions. Then you can deduce the offset of the first pixel in VRAM for non-fullscreen mode by subtracting var.xres_virtual, var.yres_virtual (obtained from the standard FBIOGET_VSCREENINFO) from the actual fullscreen width and height.

5. ) For an example on how to use notifiers, see my most recent post in the other thread. Basically you add two FIFO commands, one to ask for notification, and one to wait for end of previous operation in the subchannel. Upon completion, the notifier will write a 128-bit value at location reports + 0x1000 (that's the one you use for the 3D object). The first two dwords are timestamp, then error code and status (should be zero). I'm not sure you already map the reports region in libps3rsx, if not, it will be useful to extend the ioctl and mmap() API of the patch for this (I'll provide that in the soon-to-come ps3gpu kernel module).
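As a rough illustration of the notifier described above, the sketch below polls a 128-bit record at `reports + 0x1000`. The field order within the timestamp, the rule "status zero means done, error zero means success", and the names `rsx_notifier`, `notifier_at`, `notifier_arm` and `notifier_poll` are all assumptions made for this example, not the libps3rsx API:

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: four dwords as described in the post: two timestamp dwords,
 * then an error code and a status word. */
typedef struct {
    uint32_t timestamp_lo;
    uint32_t timestamp_hi;
    uint32_t error;
    uint32_t status;
} rsx_notifier;

#define NOTIFIER_OFFSET 0x1000u     /* reports + 0x1000, per the post */

/* Locate the 3D object's notifier inside the mapped reports region. */
static volatile rsx_notifier *notifier_at(volatile void *reports)
{
    assert(reports != 0);
    return (volatile rsx_notifier *)((volatile uint8_t *)reports
                                     + NOTIFIER_OFFSET);
}

/* Mark the notifier as pending before queuing the two FIFO commands. */
static void notifier_arm(volatile rsx_notifier *n)
{
    n->error = 0xffffffffu;
    n->status = 0xffffffffu;
}

/* Returns 1 when completed cleanly, 0 while still pending,
 * -1 when completed with a non-zero error code. */
static int notifier_poll(const volatile rsx_notifier *n)
{
    if (n->status != 0u)
        return 0;
    return n->error == 0u ? 1 : -1;
}
```

The intended use would be to call `notifier_arm`, push the notify and wait commands, then spin on `notifier_poll` before flipping buffers.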

BTW, it reminds me of some other things I noticed on libps3rsx. The 0xfeed0003 is actually used by the HV in some cases (when changing flags of the context_allocate calls, it creates more objects), so it is unwise to use it for 3D. Besides, I think the hash is not correct and corresponds to 0xfeed0007. It's weird it works with both values in fact.. Anyway, this may already be fixed, I did not check recently. I'll be interested to contribute to libps3rsx in the future, but for now I'll focus on independent kernel module and accelerated VRAM mtd.

And congratulations to IronPeter for the DXT texture loader!

cnlohr · Post by **cnlohr** » Sun Nov 25, 2007 11:57 am

1) I am mostly concerned about needless traffic over the bus. If we do it that way, we would have to transfer every single triangle drawn every frame over the bus. I wanted to test the performance difference when drawing 1,000 instances of a 100-or-so-triangle object per frame, but read (7) for the problem with this. The rationale is that, with the vertex shader doing the gl_ModelviewMatrix transform, you only have to send the vertex data over once; otherwise you are really bogging the bus down.

At any rate, I guess I shouldn't make too many assumptions until I actually clock it.

Besides, if we do it on a per-vertex basis, it doesn't make much sense to use the vertex buffers, we might as well just be doing everything in immediate mode over the fifo.

*EDIT* 2) Well, we're gonna have global vars, just depends on where they live. Either enter_direct() can take on a fb_fix_screeninfo pointer, and the width/height remains global in the simple_*.c file. OR fb_fix_screeninfo is a file global variable in utils.c, that you use an accessor in utils.h to get the width/height info from.

Which one do you prefer? Hopefully over time, I'll get a taste for what you expect in code.

5) I'll try to figure this out. Is there anywhere in either project that makes clear use of this?

*NEW* 7) Whenever I draw a scene with the vertex buffer code, let the program finish, modify the code so the triangles are in different places, and run the code again, it uses the original buffer. No matter what I change, the buffer that I first ran the program with will display -- unless -- I run a different program entirely, then for that program it will use the first one that program started with. If I go back to the first program, it will use the buffer that it starts with this time and continue to use it for each subsequent run until I switch programs.

What do we have to do to update, or remove the existing vertex buffer?

Additionally, I can't seem to make my buffers too big, much more than 32 triangles or so, and no more show up. Is there some constant in your simple_triangle code I'm missing? It looks to me like you've allocated a megabyte to it.

IronPeter · Post by **IronPeter** » Sun Nov 25, 2007 5:34 pm

First, using the fifo for embedded geometry is not a very good idea.

NVidia hardware has many caches. There are pre-transform and post-transform caches. If old data from your vertex buffer is shown, you are hitting the (pre-transform) cache. The Nouveau headers contain defines for cache control. If you are using a large circular dynamic data buffer, you do not need cache control.

These caches are small. On a real 3D model this cache will be flushed between two render passes. Any vertex shader setup or vertex shader constant setup will kill the performance gain from caching.

SPU geometry processing is not my idea. It is how the official tools from Sony work. http://forum.beyond3d.com/showthread.ph ... 85&page=10

Alot of the stuff in Edge (backface culling, skinning, zero area triangle culling, zero pixel triangle culling, ...

IronPeter · Post by **IronPeter** » Sun Nov 25, 2007 5:42 pm

>Additionally, I can't seem to make my buffers too big, much more than 32 triangles or so, and no more show up

Hm.. I have no idea. Do you run out of the max vertex batch size? But it is 256 primitives.

PS. I tried 60 triangles in a vertex buffer and it works. Either you run out of the max vertex batch or you are using prefetched, precached empty data.
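If the batch limit is the culprit, one way around it is to split a large draw into several batch commands. A minimal sketch, assuming the limit of 256 applies per batch word and that a batch word packs `count - 1` into the top 8 bits and the first vertex index into the low 24 bits (the layout Nouveau uses for its NV40 vertex batch command; treat it as an assumption here). `batch_word` and `split_batches` are hypothetical helpers, not libps3rsx functions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BATCH 256u  /* limit quoted in the post above */

/* ASSUMPTION: (count - 1) in bits 31..24, first vertex index in bits 23..0. */
static uint32_t batch_word(uint32_t first, uint32_t count)
{
    assert(count >= 1u && count <= MAX_BATCH);
    assert(first < (1u << 24));
    return ((count - 1u) << 24) | first;
}

/* Split `count` vertices starting at index `first` into batch words of at
 * most MAX_BATCH each. Returns the number of words written to `out`, or 0
 * if `count` is zero or `out` cannot hold them all. */
static size_t split_batches(uint32_t first, uint32_t count,
                            uint32_t *out, size_t max_words)
{
    size_t n = 0;

    while (count > 0u) {
        uint32_t chunk = count < MAX_BATCH ? count : MAX_BATCH;

        if (n == max_words)
            return 0;               /* output array is too small */
        out[n++] = batch_word(first, chunk);
        first += chunk;
        count -= chunk;
    }
    return n;
}
```

For example, 600 vertices would come out as three words covering 256, 256 and 88 vertices.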

cnlohr · Post by **cnlohr** » Mon Nov 26, 2007 3:55 pm

I minimized the changes I made, but I did make it so that if you're in any of the full screen resolutions, it appears to pick the resolution up automatically and work at it. Well, I don't have an HDTV to test with right now, but it did detect my 480i TV and set the resolution accordingly.

I don't want to commit too much until you look over my style, log, and modifications to make sure I don't screw anything else up.

Thanks for the help in learning this stuff.

IronPeter · Post by **IronPeter** » Mon Nov 26, 2007 5:11 pm

Ok, I'll test at home ( after ~12 hours ). Commit, I'll review your code via e-mail. Or you can send .diff for codereview.

Rich43 · Post by **Rich43** » Fri Nov 30, 2007 11:56 am

I don't have a PS3 but I plan to get one!

Does glxgears work with the driver?

cnlohr · Post by **cnlohr** » Sat Dec 01, 2007 5:48 am

Not even close. This has nothing to do with OpenGL support on the PS3, it's straight up you talking to the GPU without any pre-existing standardized library in the way. But even that's not done.

forums.ps2dev.org
