The hunt for HV's FIFO/Push buffer...
Hi,
I've finally been able to set up and use a second independent context. I was able to perform the 'upper VRAM workaround' from this second context, even though the first context (set up by ps3fb) has restricted upper VRAM access through DMA (by means of the lv1_gpu_memory_allocate(ps3fb_videomemory.size,...) call).
The contexts are truly independent, including:
- object bindings: since lv1_gpu_context_attribute:fb_setup fails with LV1_BUSY, we have to bind objects by hand in the newly created context. For this we can use the exact same commands FB_SETUP puts in the FIFO (http://www.everfall.com/paste/id.php?ew29498z816w) when creating the first context.
- iomapping: the lv1_gpu_context_iomap call has to be done again to allow the GPU to access XDR. The location of the mapping in GPU space (GPU_IOIF) can be the same as or different from the value used by ps3fb (0x0d000000).
- FIFO control and location: the FIFO control registers initially read as zero. They can be written to with the address of the second context FIFO. In my test I used the 64kB just before the ps3fb FIFO (i.e. 128kB from the end of the XDR ps3fb_videomemory region). So I put 0x0e1e0000 in the registers (0x10000 less than the value I read in the ps3fb context). We still have to figure out how this value is obtained from the address of ps3fb_videomemory, so that we can locate the FIFO anywhere we want.
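The FIFO placement in the last bullet comes down to simple arithmetic, sketched below. The helper names are made up for illustration and are not part of ps3fb. Only the 64kB buffer size, the "two buffers from the end" placement and the 0x0e1e0000 value come from the post; 0x0e1f0000 for the ps3fb FIFO is inferred from "0x10000 less". Since the mapping from the XDR address to the register value is still unknown, the sketch simply offsets from the value read back in the ps3fb context.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size of one FIFO (command buffer): 64kB, as in the post. */
#define GPU_CMD_BUF_SIZE 0x10000u

/* CPU-side byte offset of the second FIFO inside the XDR video memory
 * region: the ps3fb FIFO takes the last 64kB, ours the 64kB before it.
 * Hypothetical helper, for illustration only. */
static size_t second_fifo_xdr_offset(size_t videomemory_size)
{
    assert(videomemory_size >= 2 * GPU_CMD_BUF_SIZE);
    return videomemory_size - 2 * GPU_CMD_BUF_SIZE;
}

/* GPU-side value for the FIFO control registers: one buffer below the
 * value read back from the ps3fb context. */
static uint32_t second_fifo_gpu_addr(uint32_t ps3fb_fifo_gpu_addr)
{
    assert(ps3fb_fifo_gpu_addr >= GPU_CMD_BUF_SIZE);
    return ps3fb_fifo_gpu_addr - GPU_CMD_BUF_SIZE;
}
```

With the ps3fb value 0x0e1f0000, second_fifo_gpu_addr() gives the 0x0e1e0000 used in the test.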
This means interesting things:
- We don't need the FIFO workaround anymore! But the 'upper VRAM' one is still needed and can be executed from the second context.
- We should finally be able to provide one (or several) independent kernel modules for all our GPU work (3D, Xorg, VRAM mtd). I'll look into this tomorrow and try to provide this module.
- We should be able to have both 3D and accelerated Xorg working at the same time.
In the meantime, for those interested, here is the code snippet that does the RAMHT -> lower VRAM copy from the second context, followed by the output it produces. In that output, the notify shows the DMA was successful (the first 64 bits are a timestamp), and the value 00000000 00501000 is the hash entry for the NV Null object (first entry of RAMHT).
Code: Select all
#define OP(subch, tag, size) (((size) << 18) | ((subch) << 13) | (tag))
u32 fifo_program[] = {
// init
OP(1, 0x000, 1), // bind to subchannel 1
0x31337303, // Memory to Memory instance
OP(1, 0x180, 3), //
0x66604200, // DMA notifier to reports + 0x1000
0xfeed0001, // DMA source from DMA system RAM instance
0xfeed0000, // DMA dest to DMA video RAM instance
OP(2, 0x000, 1), // bind to subchannel 2
0x3137c0de, // Memory to Memory instance
OP(2, 0x180, 3), //
0x66604200, // DMA notifier to reports + 0x1000
0xfeed0000, // DMA source from DMA video RAM instance
0xfeed0001, // DMA dest to DMA system RAM instance
OP(3, 0x000, 1), // bind to subchannel 3
0x313371c3, // 2D ContextSurface instance
OP(3, 0x180, 3), //
0x66604200, // DMA notifier to reports + 0x1000
0xfeed0000, // DMA source from DMA video RAM instance
0xfeed0000, // DMA dest to DMA video RAM instance
OP(4, 0x000, 1), // bind to subchannel 4
0x31337a73, // Swizzled Surface instance
OP(4, 0x180, 2), //
0x66604200, // DMA notifier to reports + 0x1000
0xfeed0000, // DMA source from DMA video RAM instance
OP(5, 0x000, 1), // bind to subchannel 5
0x31337808, // Image from CPU instance
OP(5, 0x180, 8), //
0x66604200, // DMA notifier to reports + 0x1000
0x00000000, // colorkey
0x00000000, // clip rectangle
0x00000000, // pattern
0x00000000, // ROP
0x00000000, // beta1
0x00000000, // beta4
0x313371c3, // surface
OP(5, 0x2fc, 2), //
0x00000003, // operation srccopy
0x00000004, // color format A8R8G8B8
OP(6, 0x000, 1), // bind to subchannel 6
0x3137af00, // Scaled Image instance
OP(6, 0x180, 1), //
0x66604200, // DMA notifier to reports + 0x1000
// blit DDR->DDR
OP(6, 0x184, 1),
0xfeed0000, // DMA image from video memory
OP(6, 0x198, 1),
0x313371c3, // surface
OP(3, 0x300, 1),
0x0000000a, // surface format A8R8G8B8
OP(3, 0x30c, 1),
8*1024*1024, // surface offset video RAM plus 8MB
OP(3, 0x304, 1),
0x10001000, // surface pitch 4096
OP(6, 0x2fc, 9),
0x00000001, // color conversion truncate
0x00000003, // color format A8R8G8B8
0x00000003, // operation srccopy
0x00000000, // clip point (0,0)
0x02000400, // clip size (1024,512)
0x00000000, // out point (0,0)
0x02000400, // out size (1024,512)
0x00100000, // du/dx 1.0
0x00100000, // dv/dy 1.0
OP(6, 0x400, 4),
0x02000400, // size (1024x512)
0x00021000, // pitch 4096, origin corner, no filtering
254*1024*1024, // address 2MB to end of vram
0x00000000, // point (0,0)
OP(6, 0x104, 1), // notify
0,
OP(6, 0x100, 1), // wait
0,
};
[...]
u32 *notify = ps3gpu.reports;
u32 *fb = ps3gpu.vram;
ps3gpu.fifo = ps3gpu.xdr + ps3fb_videomemory.size - 2*GPU_CMD_BUF_SIZE;
memset(ps3gpu.fifo, 0, GPU_CMD_BUF_SIZE);
memcpy(ps3gpu.fifo, fifo_program, sizeof(fifo_program));
notify[0x1000 / 4 + 0] = 0xffffffff;
notify[0x1000 / 4 + 1] = 0xffffffff;
notify[0x1000 / 4 + 2] = 0xffffffff;
notify[0x1000 / 4 + 3] = 0xffffffff;
fb[8*1024*1024/4 + 0x190000 / 4 + 0] = 0xdeadbeef;
fb[8*1024*1024/4 + 0x190000 / 4 + 1] = 0xdeadbeef;
printk("fifo regs = %p\n", ps3gpu.fifo_regs);
printk("fifo regs[0x10] = %08x\n", ps3gpu.fifo_regs[0x10]);
printk("fifo regs[0x11] = %08x\n", ps3gpu.fifo_regs[0x11]);
printk("fifo regs[0x15] = %08x\n", ps3gpu.fifo_regs[0x15]);
msleep(100);
ps3gpu.fifo_regs[0x11] = 0x0e1e0000;
ps3gpu.fifo_regs[0x15] = 0x0e1e0000;
ps3gpu.fifo_regs[0x10] = 0x0e1e0000;
printk("fifo regs[0x10] = %08x\n", ps3gpu.fifo_regs[0x10]);
printk("fifo regs[0x11] = %08x\n", ps3gpu.fifo_regs[0x11]);
printk("fifo regs[0x15] = %08x\n", ps3gpu.fifo_regs[0x15]);
msleep(100);
ps3gpu.fifo_regs[0x10] = 0x0e1e0000 + sizeof(fifo_program);
printk("fifo regs[0x10] = %08x\n", ps3gpu.fifo_regs[0x10]);
printk("fifo regs[0x11] = %08x\n", ps3gpu.fifo_regs[0x11]);
printk("fifo regs[0x15] = %08x\n", ps3gpu.fifo_regs[0x15]);
msleep(100);
printk("fifo regs[0x10] = %08x\n", ps3gpu.fifo_regs[0x10]);
printk("fifo regs[0x11] = %08x\n", ps3gpu.fifo_regs[0x11]);
printk("fifo regs[0x15] = %08x\n", ps3gpu.fifo_regs[0x15]);
printk("notify = %08x/%08x/%08x/%08x\n",
notify[0x1000 / 4 + 0],
notify[0x1000 / 4 + 1],
notify[0x1000 / 4 + 2],
notify[0x1000 / 4 + 3]);
printk("%08x %08x\n",
fb[8*1024*1024/4 + 0x190000 / 4 + 0],
fb[8*1024*1024/4 + 0x190000 / 4 + 1]);
Output:
fifo regs = d0000800907c0000
fifo regs[0x10] = 00000000
fifo regs[0x11] = 00000000
fifo regs[0x15] = 00000000
fifo regs[0x10] = 0e1e0000
fifo regs[0x11] = 0e1e0000
fifo regs[0x15] = 0e1e0000
fifo regs[0x10] = 0e1e0118
fifo regs[0x11] = 0e1e0000
fifo regs[0x15] = 0e1e0000
fifo regs[0x10] = 0e1e0118
fifo regs[0x11] = 0e1e0118
fifo regs[0x15] = 0e1e0118
notify = 0000002c/56a025a0/00000000/00000000
00000000 00501000
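For readers decoding the fifo_program words by hand, here is a small sketch that unpacks a header built with the OP() macro from the snippet. The field widths (13-bit tag, 3-bit subchannel, 11-bit count) are an assumption: the post only gives the shift amounts. The struct and function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Same packing as the OP() macro in the snippet above. */
#define OP(subch, tag, size) (((size) << 18) | ((subch) << 13) | (tag))

/* Decoded command header. Field widths are assumed, see lead-in. */
struct fifo_header {
    uint32_t tag;    /* method offset, e.g. 0x180 */
    uint32_t subch;  /* subchannel the object is bound to */
    uint32_t size;   /* number of data words that follow */
};

static struct fifo_header fifo_header_decode(uint32_t word)
{
    struct fifo_header h;

    h.tag   = word & 0x1fffu;          /* bits 0..12  */
    h.subch = (word >> 13) & 0x7u;     /* bits 13..15 */
    h.size  = (word >> 18) & 0x7ffu;   /* bits 18..28 */
    return h;
}

/* Number of 32-bit words a command occupies: header plus its data. */
static uint32_t fifo_command_words(uint32_t header_word)
{
    return 1u + fifo_header_decode(header_word).size;
}
```

For example, OP(1, 0x180, 3) packs to 0x000c2180 and occupies four words in the FIFO (the header plus notifier, source and destination).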
-
- Posts: 100
- Joined: Sat Aug 20, 2005 3:25 am
I have some problems using Glaurung's ps3fb.* patch and getting the kernel to compile. The first listing below shows the patch being applied and the build started; the second shows the compile errors that follow.
(note: this happens with the latest kernel sources pulled from Geoff Levand's PS3 kernel tree)
(note: the weird characters in the output are there because I am working on my PS3 remotely using PuTTY from Windows)
Code: Select all
[root@IGGS-PS3 ps3-linux]# patch -p1 < ps3fb.diff
patching file drivers/video/ps3fb.c
Hunk #1 FAILED at 51.
Hunk #2 succeeded at 113 (offset -3 lines).
Hunk #4 FAILED at 328.
Hunk #5 succeeded at 823 (offset 41 lines).
Hunk #7 succeeded at 1155 (offset 41 lines).
Hunk #8 FAILED at 1262.
Hunk #9 FAILED at 1292.
Hunk #11 succeeded at 1428 (offset 40 lines).
Hunk #13 succeeded at 1511 (offset 40 lines).
Hunk #15 succeeded at 1561 (offset 40 lines).
4 out of 15 hunks FAILED -- saving rejects to file drivers/video/ps3fb.c.rej
patching file include/asm-powerpc/ps3fb.h
[root@IGGS-PS3 ps3-linux]# make clean
[root@IGGS-PS3 ps3-linux]# make mrproper
[root@IGGS-PS3 ps3-linux]# make ps3_defconfig
[root@IGGS-PS3 ps3-linux]# make menuconfig
[root@IGGS-PS3 ps3-linux]# make
[...]
Code: Select all
CC drivers/video/fb_sys_fops.o
CC drivers/video/fb_defio.o
CC drivers/video/ps3fb.o
drivers/video/ps3fb.c: In function âps3fb_probeâ:
drivers/video/ps3fb.c:1387: error: âps3fb_gpu_majorâ undeclared (first use in this function)
drivers/video/ps3fb.c:1387: error: (Each undeclared identifier is reported only once
drivers/video/ps3fb.c:1387: error: for each function it appears in.)
drivers/video/ps3fb.c:1410: error: âiâ undeclared (first use in this function)
drivers/video/ps3fb.c:1457: warning: label âerr_release_vramâ defined but not used
drivers/video/ps3fb.c:1453: warning: label âerr_release_dinfoâ defined but not used
drivers/video/ps3fb.c:1451: warning: label âerr_release_ctrlâ defined but not used
drivers/video/ps3fb.c:1449: warning: label âerr_release_reportsâ defined but not used
drivers/video/ps3fb.c:1445: warning: label âerr_iounmap_ctrlâ defined but not used
drivers/video/ps3fb.c: In function âps3fb_shutdownâ:
drivers/video/ps3fb.c:1487: error: âps3fb_gpu_majorâ undeclared (first use in this function)
make[2]: *** [drivers/video/ps3fb.o] Error 1
make[1]: *** [drivers/video] Error 2
make: *** [drivers] Error 2
The ps3fb.c file recently changed quite a bit (a big section used to make the black-border fix hack work was taken away for example).
Last edited by Panajev2001a on Sun Nov 25, 2007 7:08 am, edited 1 time in total.
Tracking geoff's tree will always be problematic. Now that I know it is possible, I'll provide a separate kernel module for the GPU stuff. Another solution would be to have our own ps2dev git tree for the kernel. If we are going to make an easy to install CD for 3D development, as suggested, we will need our own kernel anyway.
-
- Posts: 100
- Joined: Sat Aug 20, 2005 3:25 am
Glaurung wrote: Tracking geoff's tree will always be problematic. Now that I know it is possible, I'll provide a separate kernel module for the GPU stuff. Another solution would be to have our own ps2dev git tree for the kernel. If we are going to make an easy to install CD for 3D development, as suggested, we will need our own kernel anyway.
With your separate module for GPU stuff (3D, Xorg, etc...) will we be able to use the kernel sources from Geoff's tree and simply get your kernel modules loaded in place of the fb device created by ps3fb.c and ps3fb.h?
Keeping in sync with Geoff's tree is something that would be beneficial for those also doing CELL development as it is the tree that gets a lot of work from IBM and SCE guys that is CELL related.
I guess we will see :D.
Hi,
The separate kernel module is not ready yet: it seems a call to fb_blit (from any context, and even of 0x0 size) is still needed to kick off the FIFO. So as soon as the ps3fb blit is disabled with PS3FB_IOCTL_ON, it does not work. I'm confident I can work something out in a few days though.
Concerning the patch, there are actually several versions of it. I tried to track geoff's tree, and the most up to date version is here:
http://mandos.homelinux.org/~glaurung/ps3/
It's only a few days old but I think it's already outdated. That's why I'm saying tracking geoff's tree is not tractable. I'm not thinking of doing a completely different tree, but rather of having geoff's tree as our origin. We would be able to control when we update, and add additional functionality (like gpu stuff, game controller support, fb black border patch, etc.). This would ensure we have a working tree. So, to answer meir420's question, the patch applies to whatever version of geoff's tree was available the day it was made. However, you can clone my own public tree and give it a try; it is based on a recent copy of geoff's tree and has the patch included:
git clone http://mandos.homelinux.org/~glaurung/g ... x-ps3.git/
As for broken sound, that's an old version of the patch when I was trying to figure out how lv1_gpu_device_map() works. This problem is solved in more recent versions of the patch (I noticed this trying to play a movie..).
Of course all this is only if you're really impatient, as the separate kernel module should solve all these issues :)
-
- Posts: 100
- Joined: Sat Aug 20, 2005 3:25 am
Glaurung, your patch was easily fixable, or should I say updatable, to make the ps3fb.c present in the latest kernel compile. I applied the patch, got an original copy of the two modified files, fixed the patched files and re-generated the patch with git diff (although I need to get it better looking than a/a/drivers... etc... :(). I do not know yet if the resulting kernel will work or if it can provide the desired effect (I have neither finished compiling it nor booted it :)), but in a for loop in your patch the index (i, around line 1410) was not declared in the function itself, and the ps3fb_gpu_major static variable was not declared either (those were probably the two things that failed with the patch).
Unless something drastic changes in related files it should not be difficult to keep track of the kernel a little longer (of course I did something trivial today, so I guess I got lucky :))... until the kernel modules arrive ;).
The kernel compiled and booted:
2.6.24-rc3-g35fb6813 or more accurately 2.6.24-rc3-g35fb6813-dirty ;).
Patch:
ps3fb_patch.diff
a --> it contains unmodified files from Geoff's latest kernel sources.
b --> it contains patched/modified files.
Note: if you apply it as is you need to use the -p2 option and not -p1 (inside /usr/src/linux: patch -p2 < ps3fb_patch.diff ... sorry for the mess, it is a diff newbie mistake you can obviously spot from the top of the patch file... ;)).
... useless patch removed: I'll try to upload it somewhere or post it again once I am sure it can work with a simple cut and paste.
Last edited by Panajev2001a on Tue Nov 27, 2007 4:57 am, edited 7 times in total.
-
- Posts: 100
- Joined: Sat Aug 20, 2005 3:25 am
Code: Select all
vram 264241152 fifo 65536 ctrl 4096
mmap: /dev/ps3gpu_vram len 264241152
mmap: /dev/ps3gpu_fifo len 65536
mmap: /dev/ps3gpu_ctrl len 4096
frag prog 0x2800000
Is the reason why I see the triangles, but no animation, plus some left-overs from the triangle test inside the black borders, that I am running the console in non full screen mode (thus with black borders)?
-
- Posts: 100
- Joined: Sat Aug 20, 2005 3:25 am
IronPeter wrote: >but no animation
The way its meant to be played :). Movie with animation was taken from modified version. Fullscreen view is feature.
So:
1.) it should not have animated.
2.) it is ok that I still see on screen, around the regular Fedora desktop, the non cleared picture from the triangle demo.
but (here are the doubts ;))
3.) is it ok if the triangle is only momentarily shown for a very brief time, basically just flashing on the screen and going away ?
4.) my using a non full screen video mode is not messing things up, right ?
-
- Posts: 100
- Joined: Sat Aug 20, 2005 3:25 am
meir420 wrote: Panajev2001a, i tried compiling it and got a version error. is it possible for you to upload just the driver somewhere?
Sorry, my fault...
[...]
I'll try bringing both files to their default state (last checkout) and apply the patch exactly as I posted it (copy and paste from the post I made) on this forum and compile again...
It appears I made a mistake with the patch I posted... let me check a few things...
It appears that I destroyed the patch by posting it and with the copy and paste.
I tried doing a copy and paste from WordPad (opening the copy of the patch file remotely) into a new file opened with vi and that worked fine.
I tried a similar process using the code posted in this thread, pasting it into a new file opened with vi, and the results were messy... i.e. patch would no longer work correctly (formatting and white space are quite important).
If you need the patch I can e-mail it to you or I will try to find a temporary host for it. The patch, when applied should completely succeed and not fail a single entry.
The output should be just this:
Code: Select all
[root@IGGS-PS3 linux]# patch -p2 < ps3fb_patch.diff
patching file drivers/video/ps3fb.c
patching file include/asm-powerpc/ps3fb.h
can you email me the resulting driver (deb file)? my email is [email protected]
-
- Posts: 100
- Joined: Sat Aug 20, 2005 3:25 am
meir420 wrote: can you email me the resulting driver (deb file)? my email is [email protected]
I sent you the e-mail with the patch to be applied to the kernel sources (the ps3fb files), I do not have a driver... :(.
Firmware 2.01
Just a quick heads-up for cautious upgraders: I verified that the
triangle and dxt demos still work with FW 2.01.
Flying at a high speed
Having the courage
Getting over crisis
I rescue the people
separate module status
Thanks for testing mc, it's good to know that this 2.01 was not a RSX/HV fix release :-)
Just a little update on kernel module before sleep: I just managed to get Xorg running with the separate module (there were a lot of hardcoded values that were broken if running in a different context). I quickly tested all the fancy features (Xv,Composite), so I can confirm that 3D is working in a different context too (translucency uses the TCL engine).
The module is not complete and still needs a lot of cleanup (as does the Xorg driver). Then, I need to write a patch for libps3rsx and test it. I need to test with more than one context too. Also, it still uses the FIFO workaround (with strong and ugly assumptions on how the HV allocates contexts), until I figure out how FIFOs are kicked exactly.
@IronPeter: My current tests show that all FIFOs are kicked when fb_blit is called, from any context. The fb_blit commands are actually put in the ps3fb fifo, whatever the context argument is. The FIFO executes until the get pointer reaches the put pointer, and cannot be restarted by simply changing the get pointer again (or maybe while the fb_blit is still executing it is possible, but blit is very fast, so it's hard to tell). I tried your trick of looping the FIFO, and it seems to allow us to continue execution by removing the semaphore (so there is no preemption issue). I believe the explanation for this behaviour is that the GPU is started by fb_blit, and then the 0x110 tag (the one we remove with FIFO workaround) is telling the GPU to go idle when all contexts are done. By keeping at least one context alive (using the semaphore trick), we can prevent the GPU from going idle and stopping any FIFOs. I'm wondering however if it does not consume bandwidth, as the GPU is doing DMA to read FIFO continuously, right? I'm thinking of using an ioctl() to kick FIFO (which is slow anyway) through fb_blit, do you believe it is ok or should we keep using the FIFO workaround?
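The kick behaviour described above can be pictured with a toy model of one context's FIFO pointers. This is only a sketch: nothing here touches hardware, the names are invented, and treating fifo_regs[0x10] as PUT and fifo_regs[0x11] as GET is an assumption read off the register dump in the first post (0x10 is the one written, 0x11 catches up later).

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of one context's FIFO pointers (no real hardware access). */
struct fifo_model {
    uint32_t put;  /* where the CPU stopped writing commands */
    uint32_t get;  /* where the GPU will fetch the next command */
};

/* CPU side: queue 'bytes' of commands by advancing PUT. Nothing is
 * executed until the FIFO is kicked. */
static void fifo_submit(struct fifo_model *f, uint32_t bytes)
{
    assert(bytes % 4 == 0);  /* commands are 32-bit words */
    f->put += bytes;
}

/* One kick (a 0x0 fb_blit in the thread): the GPU fetches until GET
 * catches up with PUT, then goes idle. Returns the bytes consumed. */
static uint32_t fifo_kick(struct fifo_model *f)
{
    uint32_t consumed = f->put - f->get;

    f->get = f->put;
    return consumed;
}
```

Starting from put = get = 0x0e1e0000 and submitting 0x118 bytes, one kick moves get to 0x0e1e0118, which matches the last register dump in the first post. A second kick with nothing queued consumes nothing.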
-
- Posts: 100
- Joined: Sat Aug 20, 2005 3:25 am
meir420 wrote: when i try to compile i get an error "The changelog says we are creating 2.6.24-rc3
However, I thought the version is 2.6.24-rc3-dirty-g35fb6813-dirty-g35fb6813-dirty-dirty"
i assume this is because i edited files, how can i fix this?
Do you get a compile error?
If you get 2.6.24-rc3-g35fb6813 followed by -dirty, thus 2.6.24-rc3-g35fb6813-dirty, then it is perfectly normal. The only way not to get the "-dirty" suffix would be not to patch/modify any of the files. It is just a warning that the kernel was not compiled "as-is", nothing more in itself.
Glaurung, it is really hard to support windowed 3D rendering. Many problems with "swap chain present", problems with multiple windows - resource sharing etc. Even window Z-order is a hard problem.
I want to turn 2D drivers off by ioctl while 3D rendering. At least as a first iteration. Keeping RSX idle in exclusive mode is not a problem. I was unable to note any performance hit for SPU/PPU DMA speed while RSX is waiting on semaphore. It is a good sync method for the exclusive mode, I think.
If there is a performance hit for such a semaphore - we can loop RSX on idle hard work, for example, DDR->DDR DMA copy.
NV40TCL_VTX_CACHE_INVALIDATE
I'm trying to do some animation with vertex buffers, but the RSX
is caching my vertices. There is a NV40TCL_VTX_CACHE_INVALIDATE,
but does anyone know how many parameters it takes and what
they are?
module draft
Here is a draft of the separate kernel module:
git clone http://mandos.homelinux.org/~glaurung/git/ps3rsx.git
I've updated the Xorg driver accordingly:
git clone http://mandos.homelinux.org/~glaurung/g ... eo-ps3.git
It is not yet finished, and mostly untested, but it shows my plan for the interface with userspace. Basically we have two options to access the FIFO:
- exclusive: we disable framebuffer redraw, allocate a context, then use the FIFO workaround to start running the context's FIFO and use it until the application is over. No other application can use the GPU, and any blit from either the fb driver or another context would break the application in exclusive mode. That's how we've been using the RSX so far.
- shared: we allocate a context, then run the FIFO for some time using a fake blit (size 0x0). When commands are finished processing, a new fake blit is needed to kick the FIFO again and process more commands. In the meantime, other contexts (including FB driver) can use the GPU. Note that contexts are not preempted, so if we put FIFO in a loop, other contexts are not able to execute commands (and we get error -24 for FB blit). So this is cooperative sharing.
I want to be able to support both modes. The former is easier to use and more efficient (useful for e.g. 3D demo or game), while the latter allows mixing multiple users of the GPU (e.g. Desktop usage, with one context for Xorg, one for GLX, and one for VRAM used as swap). The current module provides those two modes, with one ioctl() to enter exclusive mode, and one ioctl for one-shot kicking of the fifo (shared mode). The Xorg driver still uses exclusive mode ATM; I've code for shared mode but it needs testing and benchmarking. My short term goal is usable Desktop usage (X is improved, now we need more RAM).
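To make the difference between the two modes concrete, here is a toy model of cooperative sharing. It is a sketch under stated assumptions, not the module's code: the struct, the function and the ERR_BUSY name are invented, and the only values taken from the post are the -24 error and the rule that contexts are not preempted.

```c
#include <assert.h>
#include <stddef.h>

#define ERR_BUSY (-24)  /* error reported for the FB blit in the post */

/* Toy model of one GPU context (invented for illustration). */
struct ctx_model {
    unsigned pending;  /* commands queued and not yet executed */
    int looping;       /* nonzero: FIFO loops on itself (exclusive mode) */
};

/* One kick in shared mode. Contexts are not preempted, so if any
 * context keeps its FIFO in a loop, nobody else gets to run and the
 * kick fails. Otherwise every context drains its queued commands. */
static int shared_kick(struct ctx_model *ctx, size_t n)
{
    size_t i;

    assert(ctx != NULL || n == 0);

    for (i = 0; i < n; i++)
        if (ctx[i].looping)
            return ERR_BUSY;

    for (i = 0; i < n; i++)
        ctx[i].pending = 0;
    return 0;
}
```

In this model a desktop setup (Xorg, GLX, VRAM swap) works as long as every context lets its FIFO run dry, and a single looping context, as used in exclusive mode, starves all the others.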
In the long term, my plan is to add RAMIN/VRAM memory management to the module and have libps3rsx provide the user space API for FIFO control and object instantiation. Also, the Xorg driver should use libps3rsx to avoid code duplication. Maybe we can then move slowly from ps3rsx.ko/librsx to the drm/libdrm API (but keep the exclusive mode option in parallel). Sound good?
Oh, and probably the module should be hosted in ps2dev svn when it is more mature.
mc, there is a misprint in the headers. NV40TCL_VTX_CACHE_INVALIDATE = 0x1710, not 0x1714. It is a good idea to use a large (say 20 megs) ring buffer for dynamic geometry. You will not have to care about cache flushing. With small buffers you will die in sync.
Glaurung, solid work. I'll test this module.
> drm/libdrm API (but keep exclusive mode option in parallel). Sound good?
Sound very good. Keep exclusive mode for me :).
I coded high-level vertex and index buffers support. Want to commit soon, with collada export example.
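The ring buffer suggestion for dynamic geometry could look roughly like the allocator below. This is a minimal sketch, not code from the library: the struct and function names are invented, and it assumes the ring is large enough that the GPU has finished with old vertices by the time they are overwritten, so no synchronisation is shown.

```c
#include <assert.h>
#include <stdint.h>

/* Linear allocator over a large VRAM region that wraps at the end. */
struct vb_ring {
    uint32_t base;  /* VRAM offset of the ring */
    uint32_t size;  /* ring size in bytes, e.g. 20 MB */
    uint32_t head;  /* next free byte, relative to base */
};

/* Reserve 'bytes' starting at a multiple of 'align' (a power of two)
 * and return the VRAM offset to write the vertices to. Wraps to the
 * start of the ring when the request does not fit in what is left. */
static uint32_t vb_ring_alloc(struct vb_ring *r, uint32_t bytes, uint32_t align)
{
    uint32_t start;

    assert(align != 0 && (align & (align - 1)) == 0);
    assert(bytes <= r->size);

    start = (r->head + align - 1) & ~(align - 1);
    if (start > r->size || bytes > r->size - start)
        start = 0;  /* wrap: the oldest data gets overwritten */
    r->head = start + bytes;
    return r->base + start;
}
```

Each frame's vertices then land in a fresh part of the ring, so the GPU never reads a location that was rewritten since it last cached it, until the ring wraps.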
hi, I committed high-level geometry support, plus some basic collada export. I also hardcoded a simple transformation shader with matrix4x4 multiplication. Loading of a model looks pretty nice:
Code: Select all
int fd, size;
void *file = map_file( "../../data/troll.0.model", &fd, &size );
if( size && file )
{
    model_desc_t *model = file;
    ptr += set_geometry_source( &model->position, DDR, vb_offset, ptr, Nv3D );
    ptr += set_geometry_source( &model->texcoord, DDR, vb_offset, ptr, Nv3D );
    ptr += set_index_source( DDR, ib_offset, ptr, Nv3D );
    ptr += draw_indexed_primitives( model->indices, 0, model->indices_num, ptr, Nv3D );
    unmap_file( file, fd, size );
}
I want to determine the next thing to work on.
I can write nv_fragment ( edit: first nv_vertex ) program assembler. I can insert animation ( vsync, sync with gpu, semaphores ) stuff in library. I can refactor existing code and move all push buffer stuff into library. Select feature :).
Glaurung, excuse me, my ISP blocked MIRC from home.
-
- Posts: 2
- Joined: Thu Nov 29, 2007 7:07 am
IronPeter wrote: I want to determine the next thing to work on.
I can write nv_fragment ( edit: first nv_vertex ) program assembler. I can insert animation ( vsync, sync with gpu, semaphores ) stuff in library. I can refactor existing code and move all push buffer stuff into library. Select feature :).
Forgive my ignorance, but will OpenGL through MESA or even accelerated dri ever be possible with the API you are working on?
I'm running the Java-based Integrated Data Viewer (IDV) for research purposes, and it relies on Java3D where GLX version 1.3 or higher is required...
Keep up the good work, it's just a pleasure to check-in to this forum every day to follow this fruitful collaboration!
Cheers, Tyn