					
				GIF_FIFO instead of DMA
				Posted: Fri Dec 24, 2004 5:30 am
				by Shine
I've found a nice article at http://www.research.scea.com/research/p ... DC2001.pdf, which says in chapter 3 that you can use GIF_FIFO to write data to the GS without DMA. Searching the PCSX2 sources (http://www.pcsx2.net/downloads.php) reveals the address. Now I can use it like this:
Code:
#define GIF_FIFO ((volatile u128 *)0x10006000)
*GIF_FIFO = gifTag;
*GIF_FIFO = otherData;
I've timed the starsim demo like this:
Code:
#define T0_COUNT ((volatile unsigned int*)0x10000000)
#define T0_MODE ((volatile unsigned int*)0x10000010)
*T0_MODE = 0;  // stop counting
*T0_COUNT = 0;  // reset counter
*T0_MODE = 128 + 2;  // start (128) + counting bus clocks / 256
for (int i = 0; i < 50; i++) {
  // code to profile
}
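int count = *T0_COUNT;  // elapsed ticks, one tick = 256 bus clocks
// one tick lasts 256000/147456 us (the constant implies a 147.456 MHz bus clock)
nprintf("us: %i\n", (int)(((float)count)*(256000.0/147456.0)));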
The DMA version needs 45 ms, but writing to GIF_FIFO, without constructing the rectangle list in memory first, needs only 42 ms. If you have a lot of precalculated data, the DMA method may be faster (especially if you run code from the Scratchpad while the DMA transfer runs; see chapter 7 in the Sony article), but if you always need to calculate every point, writing to the GIF_FIFO may be faster and easier to code. What do you think?
However, the Sony article recommends against using GIF_FIFO, because it may change in future versions of the PlayStation. How can I set up the memory map myself, or is there a syscall or something else to query the memory map?
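As a rough cross-check of the conversion in the timing snippet above: with the mode value 128 + 2 the counter ticks once per 256 bus clocks, and the snippet's constant implies a 147.456 MHz bus clock. A minimal sketch in plain C, where `ticks_to_us` is a hypothetical helper name and the 16-bit width of T0_COUNT is an assumption, not something stated in this thread:

```c
#include <stdint.h>

/* Values taken from the timing snippet: bus clock in kHz and the /256 divider. */
#define BUS_CLOCK_KHZ 147456u
#define TIMER_DIVIDER 256u

/* Convert timer ticks (bus clock / 256) to microseconds using integer math. */
static uint32_t ticks_to_us(uint32_t ticks)
{
    /* ticks * 256 bus clocks, divided by 147456 clocks per ms, times 1000 us per ms */
    return (uint32_t)(((uint64_t)ticks * TIMER_DIVIDER * 1000u) / BUS_CLOCK_KHZ);
}
```

Under these assumptions 576 ticks are one millisecond, and a 16-bit counter would wrap after roughly 113 ms, so measurements in the 40 to 55 ms range fit, but much longer runs would need a larger divider or overflow handling.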
 
			
					
				
				Posted: Fri Dec 24, 2004 7:42 am
				by ooPo
				I would imagine that the main advantage of using dma would be the ability to do something else during the transfer, instead of just feeding the fifo.
			 
			
					
				
				Posted: Fri Dec 24, 2004 7:57 am
				by Shine
				ooPo wrote:I would imagine that the main advantage of using dma would be the ability to do something else during the transfer, instead of just feeding the fifo.
But in programs that recalculate all GS commands every time, I can't see any advantage in writing them to memory first and then using DMA. With DMA you need to flush the cache anyway, so I don't think there is any speed advantage. But perhaps I'm wrong; I'm a beginner in PS2 development.
 
			
					
				
				Posted: Fri Dec 24, 2004 9:06 am
				by blackdroid
That has nothing to do with the PS2. A DMA controller is there to help move data from one place to another, so you don't have to sit in a tight loop doing reads/writes and can do other stuff instead, like calculating the necessary matrices for the next frame or whatever. Besides, eventually you will move away from writing straight from EE memory to the GIF and let VIF1 and VU1 do that job.
You know that you can write to uncached memory, by the way? Then there is no need to flush. There are also ways to flush only the parts that you actually touched, which is faster than using the syscall.
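The uncached access mentioned here is usually done through an address alias. A minimal sketch, assuming the EE exposes main memory uncached at the physical address ORed with 0x20000000 and "uncached accelerated" at 0x30000000 (both values are assumptions from memory, not from this thread). The helper names are hypothetical, and the functions only compute addresses:

```c
#include <stdint.h>

#define PHYS_MASK           0x1fffffffu  /* strip any existing segment bits */
#define UNCACHED_BASE       0x20000000u  /* assumption: uncached alias of main RAM */
#define UNCACHED_ACCEL_BASE 0x30000000u  /* assumption: uncached accelerated alias */

/* Address through which the same memory is accessed without the data cache. */
static uint32_t to_uncached(uint32_t addr)
{
    return (addr & PHYS_MASK) | UNCACHED_BASE;
}

/* Same idea for the uncached accelerated window. */
static uint32_t to_uncached_accel(uint32_t addr)
{
    return (addr & PHYS_MASK) | UNCACHED_ACCEL_BASE;
}
```

A buffer filled through such an alias bypasses the data cache, so it could be handed to the DMAC without a flush; anything already sitting in the cache for that buffer would still need to be written back first.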
			 
			
					
				
				Posted: Fri Dec 24, 2004 9:19 am
				by ooPo
				As an aside, is the GIF_FIFO what the dma controller uses when you actually send data to the gif? Are there other fifos available for other devices?
			 
			
					
				
				Posted: Fri Dec 24, 2004 9:25 am
				by J.F.
				The speedup with DMA comes from not waiting on the GS. If you make a tight loop of storing values to the fifo that is long enough, you'll eventually outpace the GS and have to wait to send more data. By putting all the data in memory and then using DMA, the DMA controller does the waiting instead of the CPU.
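The stall described here can be illustrated with a toy model. This is only a sketch under made-up assumptions (a 16-entry FIFO, a producer that can push one qword per cycle, a consumer that drains one qword every `drain_period` cycles); it does not reflect real EE or GS timing, and `simulate_stalls` is a hypothetical name:

```c
#include <assert.h>

#define FIFO_CAPACITY 16u  /* entries; matches the 16-qword FIFO discussed in this thread */

/* Count the cycles a producer spends waiting on a full FIFO while sending
   n_qwords, when the consumer removes one entry every drain_period cycles. */
static unsigned simulate_stalls(unsigned n_qwords, unsigned drain_period)
{
    unsigned level = 0, sent = 0, stalls = 0, cycle = 0;

    assert(drain_period > 0);
    while (sent < n_qwords) {
        cycle++;
        /* consumer side: drain one entry on every drain_period-th cycle */
        if (cycle % drain_period == 0 && level > 0)
            level--;
        /* producer side: push if there is room, otherwise wait this cycle */
        if (level < FIFO_CAPACITY) {
            level++;
            sent++;
        } else {
            stalls++;
        }
    }
    return stalls;
}
```

In this model a short burst never stalls, while a long burst against a slow consumer spends most of its cycles waiting; that waiting is the part a DMA transfer takes off the CPU.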
			 
			
					
				
				Posted: Fri Dec 24, 2004 6:02 pm
				by blackdroid
				ooPo wrote:As an aside, is the GIF_FIFO what the dma controller uses when you actually send data to the gif? Are there other fifos available for other devices?
The GIF FIFO isn't really documented in detail. It could be a memory-mapped DMA register: when it gets written to, the DMAC takes the value and sends it to the GIF.
 
			
					
				
				Posted: Sat Dec 25, 2004 1:51 am
				by ooPo
I meant more whether, when you send data via DMA to the GIF, it actually goes to the GIF_FIFO itself.
			 
			
					
				
				Posted: Sat Dec 25, 2004 4:04 am
				by blackdroid
Uhm, yes, the GIF has a 256-byte FIFO that the DMAC writes the data to; it's all in the GS manual.
SIF0/1/2 also have a FIFO, which is shared; I don't remember the size.
VIF1 also has a FIFO (anyone remember VIF_NOP to flush the FIFO?) of 256 bytes (16 qwords).
There are other FIFOs as well, like the MemoryFIFO.
But now back to the xmas food.. laters.
			 
			
					
				
				Posted: Tue Dec 28, 2004 3:18 pm
				by Shine
				blackdroid wrote:
The GIF FIFO isn't really documented in detail. It could be a memory-mapped DMA register: when it gets written to, the DMAC takes the value and sends it to the GIF.
At least I've found the address documented in eeuser_e.pdf on page 23. But if you write too fast to the GIF_FIFO, the internal buffer (16x16 bytes, i.e. 16 qwords) can overflow, with the result that additional writes are ignored. I've written a small assembler function that checks for this and lets you write any GS register very easily, without a DMA transfer and without thinking about how to set up the right GIF tag:
Code:
.globl SetGSReg
#------------------------------------------------------------------------
# void SetGSReg(u16 reg, u64 data);
#------------------------------------------------------------------------
.align 7
.ent SetGSReg
SetGSReg:
# wait for empty FIFO: poll the status word at 0x10003020 until the bits in mask 0x1f000000 are zero
	lui	v1, 0x1000
	lui	a3, 0x1f00
	ori	v1, v1, 0x3020
	nop
WaitF:	lw	v0, 0x0000(v1)
	and	v0, v0, a3
	nop	# filling with nops to avoid
	nop	#   "Loop length is too short for r5900."
	nop	#   warning
	bne	v0, zero, WaitF
	nop	# branch delay slot command
	
# load GIF tag: 0x000000000000000e 1000000000008001: NLOOP=1, EOP=1, NREG=1, REGS0=GIF_AD
	li	v1, 0x000e
	lui	v0, 0x1000
	dsll32	v0, v0, 0
	ori	v0, v0, 0x8001
	pcpyld	v1, v1, v0
# load GIF_FIFO (0x10006000) register
	lui	v0, 0x1000
	ori	v0, v0, 0x6000
    
# save GIF tag
	sq	v1, 0x0000(v0)
    
# save reg and data
	pcpyld	v1, a0, a1
	sq	v1, 0x0000(v0)
	jr	ra
	nop
.end SetGSReg
Of course, this is not a good idea if you want to show millions of triangles, but I've changed the starsim program, and drawing the stars 50 times needs only 55 ms. So it is fast enough for simple applications, and for newbies like me the code is much easier to understand without all the optimizations :-)
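For readers who prefer C, the two quadwords the assembler routine stores can be sketched as follows. This mirrors the constants in the listing (tag low half 0x1000000000008001, tag high half 0x0e, status mask 0x1f000000); the type and function names are hypothetical, and the struct is only a host-side stand-in for a 128-bit register:

```c
#include <stdint.h>

/* Stand-in for one 128-bit quadword: lo = bits 0-63, hi = bits 64-127. */
typedef struct { uint64_t lo, hi; } qword_t;

#define GIF_REG_AD        0x0eull      /* REGS0 value used by the routine (A+D) */
#define GIF_STAT_FQC_MASK 0x1f000000u  /* FIFO count bits tested by the wait loop */

/* GIF tag as built by the routine: NLOOP in bits 0-14, EOP = 1, NREG = 1. */
static qword_t make_ad_giftag(uint64_t nloop)
{
    qword_t tag;
    tag.lo = (nloop & 0x7fffull)  /* NLOOP */
           | (1ull << 15)         /* EOP */
           | (1ull << 60);        /* NREG = 1 */
    tag.hi = GIF_REG_AD;
    return tag;
}

/* Data quadword: register number in the upper half, 64-bit value in the lower half. */
static qword_t make_ad_data(uint16_t reg, uint64_t data)
{
    qword_t q;
    q.lo = data;
    q.hi = reg;
    return q;
}

/* Mirrors the wait loop: the FIFO counts as empty when the masked bits are zero. */
static int gif_fifo_is_empty(uint32_t gif_stat)
{
    return (gif_stat & GIF_STAT_FQC_MASK) == 0;
}
```

On real hardware the two quadwords would then be stored to GIF_FIFO one after the other, as the `sq` instructions in the listing do.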
 
			
					
				
				Posted: Tue Dec 28, 2004 8:13 pm
				by blackdroid
Yes, 16x16 usually gets rounded to 256 bytes :)
You know that your "simple" code essentially skips one GIF tag and the 3 register writes needed for a DMA send :)
num qwords -> dma_channel_qwc_register
start addr -> dma_channel_madr_register
kick -> dma_channel_ctrl_register
But I guess testing stuff by writing to the FIFO eliminates a potential DMA error.
That being said, the DMAC on the PS2 is bloody easy.
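The three register writes listed above could look roughly like this in C. It is a sketch, not a definitive implementation: `dma_channel_t` and `dma_send` are hypothetical names, the struct is a stand-in rather than the real register layout, and the control bits (direction "from memory" in bit 0, start in bit 8) are assumptions from memory:

```c
#include <stdint.h>

/* Stand-in for one DMA channel's registers. On hardware these are separate
   memory-mapped registers, not a packed struct. */
typedef struct {
    volatile uint32_t chcr;  /* channel control */
    volatile uint32_t madr;  /* memory address of the data */
    volatile uint32_t qwc;   /* number of quadwords to transfer */
} dma_channel_t;

#define CHCR_DIR_FROM_MEM (1u << 0)  /* assumption: transfer direction from memory */
#define CHCR_STR          (1u << 8)  /* assumption: start bit */

/* Issue a simple transfer in the order given in the post: count, address, kick. */
static void dma_send(dma_channel_t *ch, uint32_t addr, uint32_t qwords)
{
    ch->qwc  = qwords;
    ch->madr = addr;
    ch->chcr = CHCR_DIR_FROM_MEM | CHCR_STR;
}
```

Because the registers are passed in as a pointer, the same function can be exercised against an ordinary struct on a PC before pointing it at real hardware addresses.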