uClinux on the PSP

chrismulhearn
Posts: 80
Joined: Wed Feb 22, 2006 4:43 am

Post by chrismulhearn »

Hey, I ported Xiptech's mipsnommu version of uClinux-2.4.19 to the PSP. The only hardware it currently supports is the headphone jack's serial port (used by the console and tty implementation), but it's a start!

Check it out here: http://df38.dot5hosting.com/~remember/chris/

It mounts a ramdisk as the root filesystem; the root disk image is linked into the kernel. The disk image has a minimal userland including sh, ls, mkdir, echo, cat and other basic tools, built with uClibc. All the executables are statically linked.

The only way to use it is with some sort of serial port hardware like the one discussed here: http://forums.ps2dev.org/viewtopic.php?t=5234

Hopefully more people will help now and we can really turn this into something.
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

Good work for a start.

However, I had a quick look at the source you provided and one thing puzzles me: there is no PSP-specific architecture directory. I think you should consider adding one to avoid polluting the generic one.

Just one example, cache operations: cacheops.h defines a lot of operation codes that don't match those of the PSP's cache instruction. For instance, code 0x08 is 'hit invalidate icache' on the PSP, not 'index store tag icache'.

Because so little is known about the PSP's hardware (its hardware registers in particular), device support will be a big hassle to implement.

That said, we could also consider running uClinux on the ME processor, with the SC processor providing some device functionality to the ME to start with.

Anyway, I'm cheering you on.
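hlide's 0x08 example follows from how the R4000 (and Linux's generic cacheops.h) encode the CACHE op field: bits 1..0 select the cache (0 = primary I, 1 = primary D, 2 and 3 = secondary/tertiary) and bits 4..2 select the operation. The sketch below decodes codes that way using the names from Linux/MIPS cacheops.h. It only covers primary-cache operations and says nothing about Allegrex itself; it just shows what the generic header means by each number. Under this encoding 0x08 decodes to Index_Store_Tag_I, the meaning hlide says does not apply on the PSP.

```c
#include <stddef.h>
#include <string.h>

/* R4000 primary-cache op names (Linux cacheops.h spelling), indexed by the
 * operation field, i.e. bits 4..2 of the 5-bit CACHE op code. */
static const char *const r4k_icache_ops[8] = {
	"Index_Invalidate_I", "Index_Load_Tag_I", "Index_Store_Tag_I", NULL,
	"Hit_Invalidate_I", "Fill", "Hit_Writeback_I", NULL
};
static const char *const r4k_dcache_ops[8] = {
	"Index_Writeback_Inv_D", "Index_Load_Tag_D", "Index_Store_Tag_D",
	"Create_Dirty_Excl_D", "Hit_Invalidate_D", "Hit_Writeback_Inv_D",
	"Hit_Writeback_D", NULL
};

/* Decode an op: bits 1..0 pick the cache (0 = primary I, 1 = primary D). */
static const char *r4k_cacheop_name(unsigned op)
{
	if (op > 0x1Fu)
		return NULL;              /* the op field is only 5 bits wide */
	switch (op & 0x3u) {
	case 0:  return r4k_icache_ops[op >> 2];
	case 1:  return r4k_dcache_ops[op >> 2];
	default: return NULL;         /* secondary/tertiary caches: not covered */
	}
}

/* Reverse lookup: the R4000 code for a name, or -1 if unknown. */
static int r4k_cacheop_code(const char *name)
{
	if (name == NULL)
		return -1;
	for (unsigned i = 0; i < 8; i++) {
		if (r4k_icache_ops[i] && strcmp(r4k_icache_ops[i], name) == 0)
			return (int)(i << 2);
		if (r4k_dcache_ops[i] && strcmp(r4k_dcache_ops[i], name) == 0)
			return (int)((i << 2) | 1u);
	}
	return -1;
}
```

For example, r4k_cacheop_name(0x14) gives "Fill", whereas crazyc's notes later in this thread list 0x14 as a dcache index writeback invalidate on Allegrex, which is exactly the kind of mismatch that makes the stock header unsafe to reuse.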
chrismulhearn
Posts: 80
Joined: Wed Feb 22, 2006 4:43 am

Post by chrismulhearn »

Dude, this is just to get the ball rolling. When you start a new port of the kernel, you typically hack up the most similar existing port to get started.

But you are right, ultimately my hacked-up mipsnommu/simulator/ will be moved into mipsnommu/psp/. Of course, if you've ever done anything with the kernel source, you'll know that the architecture-specific code does not stay entirely in arch/x/y/. So there is lots of code plunked in various places that needs to be wrapped in an #ifdef CONFIG_PSP.

But that is a mundane detail that we'll get around to eventually. It'll be easy to use a visual diff tool to see where the modifications were made, and where they may be better placed. Keep in mind this was very much a learning experience for me, so all kinds of bad design choices were made in the beginning. For example, I opted out of learning how to use the PSP cache by just linking the kernel to run in the 0xAxxx,xxxx range, the uncached memory segment.
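For readers following the 0x8xxx,xxxx versus 0xAxxx,xxxx distinction: on MIPS32, KSEG0 (0x80000000-0x9FFFFFFF) and KSEG1 (0xA0000000-0xBFFFFFFF) are both unmapped windows onto the first 512 MB of physical memory; KSEG0 goes through the caches and KSEG1 bypasses them. The sketch below shows that address arithmetic. The macro and function names are local to this example (they mirror the style of Linux's KSEG0ADDR/KSEG1ADDR helpers but are not those helpers), and the example addresses assume the PSP's main RAM starts at physical 0x08000000.

```c
#include <stdint.h>

/* MIPS32 fixed segments: both windows map physical 0x00000000-0x1FFFFFFF. */
#define PHYS_MASK   0x1FFFFFFFu
#define KSEG0_BASE  0x80000000u   /* cached, unmapped   */
#define KSEG1_BASE  0xA0000000u   /* uncached, unmapped */

/* Strip the segment bits to get the physical address. */
static inline uint32_t phys_addr(uint32_t addr)
{
	return addr & PHYS_MASK;
}

/* Cached (KSEG0) alias of the same physical byte. */
static inline uint32_t cached_addr(uint32_t addr)
{
	return phys_addr(addr) | KSEG0_BASE;
}

/* Uncached (KSEG1) alias of the same physical byte. */
static inline uint32_t uncached_addr(uint32_t addr)
{
	return phys_addr(addr) | KSEG1_BASE;
}
```

With those assumptions, a kernel linked at 0xA8000000 runs uncached, and cached_addr(0xA8000000) gives 0x88000000, the same RAM seen through the cache.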

Thanks!
willow :--)
Posts: 107
Joined: Sat Jan 13, 2007 11:50 am

Post by willow :--) »

It looks impressive.
I'd like to know what you would use Linux on a PSP for.
I like the idea of "developing stuff just for the sake of it", but porting Linux is a huge task, so I guess you have an idea behind it.

Like completely replacing the XMB or whatever...

I don't know the PSP well enough yet to understand what the advantage of using a different OS (Linux) would be, rather than using the standard firmware capabilities to develop, say, homebrew...

I'm not being very clear, probably because I'm trying to compare two things that cannot be compared, but imagine I want to develop an MP3 player. What would be the advantage of developing this MP3 player for linuxforpsp rather than developing it as a "standard" homebrew?

Or maybe that's not why you're doing this at all?
chrismulhearn
Posts: 80
Joined: Wed Feb 22, 2006 4:43 am

Post by chrismulhearn »

Well, mainly it was to learn about operating system kernels.

But there are other uses for bringing a standard OS platform to the PSP. It could perhaps enhance portability, meaning, it may be easier to port existing Linux applications to a PSP linux/libc runtime environment, rather than the XMB/psp-sdk environment.

Then again, maybe not!

All in all it just gives a few more developing options that may or may not be pursued. I imagine developing an interesting keystroke entry mechanism [similar to the T9Word feature on my cell phone] for it, and running Gaim and Firefox.
link
Posts: 61
Joined: Wed Oct 19, 2005 6:17 am

Post by link »

Maybe eventually the USB port on top could get a USB driver. Then it would be extremely easy to attach keyboards, mice and whatnot (with some simple rewiring to a hub or whatnot). It might be impossible, though.
FreePlay
Posts: 71
Joined: Wed Jan 04, 2006 6:53 pm
Location: Schenectady, New York, USA

Post by FreePlay »

Question:

With the system as you currently have it, is it possible to have primitive graphics support through direct VRAM manipulation, or is the VRAM no longer mapped under Linux (or some other problem)?

By the way, nice work :)
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

FreePlay wrote:Question:

With the system as you currently have it, is it possible to have primitive graphics support through direct VRAM manipulation, or is the VRAM no longer mapped under Linux (or some other problem)?

By the way, nice work :)
I'm not sure about this; you may need to set up the LCD controller so it can display VRAM.
FreePlay
Posts: 71
Joined: Wed Jan 04, 2006 6:53 pm
Location: Schenectady, New York, USA

Post by FreePlay »

Figured that might be a problem, too. Something to get Skylark involved with, maybe... he's coded a few linux-mips drivers in the past. Not sure if he's experienced with a nommu system, but we can always check :)
chrismulhearn
Posts: 80
Joined: Wed Feb 22, 2006 4:43 am

framebuffer, cache, etc

Post by chrismulhearn »

I seem to be able to write pixels to the screen from Linux just by writing values to VRAM [0xA400,0000 range]. This is probably because the screen is already initialized when the bootloader program is loaded by the PSP. Even if we don't know how to change the video mode, we could make a framebuffer device that just supports that particular mode (or, in the bootloader, while the PSP OS is still running, use SDK functions to set up the screen however we want it).
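A minimal sketch of plotting pixels by writing straight into VRAM, as described above. It assumes the mode left behind by the bootloader is 32 bits per pixel with a 512-pixel line stride and a 480x272 visible area; those are common PSP homebrew defaults but are not confirmed for this kernel. The base pointer is a parameter so it can be the uncached 0xA4000000 mapping on the PSP or an ordinary array when testing on a PC.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed display mode: 32 bpp, 512-pixel stride, 480x272 visible. */
#define FB_STRIDE  512
#define FB_WIDTH   480
#define FB_HEIGHT  272

/* Write one 32-bit pixel; out-of-range coordinates are ignored. */
static void fb_put_pixel(volatile uint32_t *fb, int x, int y, uint32_t color)
{
	if (x < 0 || x >= FB_WIDTH || y < 0 || y >= FB_HEIGHT)
		return;
	fb[(size_t)y * FB_STRIDE + (size_t)x] = color;
}

/* Fill the visible area with one color; the stride padding is untouched. */
static void fb_clear(volatile uint32_t *fb, uint32_t color)
{
	for (int y = 0; y < FB_HEIGHT; y++)
		for (int x = 0; x < FB_WIDTH; x++)
			fb[(size_t)y * FB_STRIDE + (size_t)x] = color;
}
```

On the PSP this would be called as fb_put_pixel((volatile uint32_t *)0xA4000000, x, y, color); going through the uncached window means no cache flush is needed before the display hardware sees the pixel.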

I don't really understand how the "framebuffer" abstraction deals with eliminating flickering and such, though. As I recall, in old PC programs I'd use "page flipping": draw to some non-visible portion of VRAM and then "page flip" the video hardware to point to that portion of RAM once drawing was complete. Anyone who has any info on this subject should post it here. :)
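The page-flipping scheme described above can be sketched as two pages in VRAM: draw into the one that is not on screen, then point the display at it. How the PSP's display controller is told which address to scan out is not known from this thread, so `set_scanout` below is a hypothetical hook supplied by the caller (it may be NULL). The page size assumes the same 512-stride, 272-line, 32 bpp layout as common PSP homebrew.

```c
#include <stdint.h>

/* Assumed layout: 512-pixel stride x 272 lines x 4 bytes = 0x88000 bytes. */
#define PAGE_BYTES (512u * 272u * 4u)

/* Hypothetical hook that points the display at a VRAM address. */
typedef void (*scanout_fn)(uint32_t vram_addr);

struct flip_state {
	uint32_t   page[2];      /* addresses of the two pages        */
	int        front;        /* index of the page being displayed */
	scanout_fn set_scanout;  /* may be NULL                       */
};

static void flip_init(struct flip_state *s, uint32_t vram_base, scanout_fn fn)
{
	s->page[0] = vram_base;
	s->page[1] = vram_base + PAGE_BYTES;
	s->front = 0;
	s->set_scanout = fn;
	if (fn)
		fn(s->page[0]);
}

/* Where to draw the next frame: always the page that is NOT on screen. */
static uint32_t flip_back_buffer(const struct flip_state *s)
{
	return s->page[s->front ^ 1];
}

/* Call once drawing is finished: display the back page and swap roles. */
static void flip_pages(struct flip_state *s)
{
	s->front ^= 1;
	if (s->set_scanout)
		s->set_scanout(s->page[s->front]);
}
```

If the drawing went through a cached alias of VRAM, the dcache would have to be written back before flip_pages(); drawing through the uncached 0xA4xxxxxx window avoids that.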

Another thing: Hlide, you seem to know a little bit about the PSP cache. My first goal here in the new year is to get this kernel using the cache. Is there any online resource that points out the differences between the PSP cache and a standard Mips r4000 cache? Where did you get your information? Thanks.
Last edited by chrismulhearn on Tue Jan 16, 2007 3:41 am, edited 1 time in total.
groepaz
Posts: 305
Joined: Thu Sep 01, 2005 7:44 am

Post by groepaz »

Don't worry about the page flipping stuff. If I recall correctly from gc-linux, the Linux framebuffer drivers don't deal with it directly at all (I remember trying to put an efb->xfb copy into the page-flipping routine, which I couldn't find at all =P). That said, since the PSP uses a common pixel color format, porting one of the existing drivers to the PSP should be rather trivial (except for the setup, which isn't really needed for a start, like you said; you could always add it later).
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Re: framebuffer, cache, etc

Post by hlide »

chrismulhearn wrote:Another thing: Hlide, you seem to know a little bit about the PSP cache. My first goal here in the new millennium is to get this kernel using the cache. Is there any online resource that points out the differences between the PSP cache and a standard Mips r4000 cache? Where did you get your information? Thanks.
I will provide you with a cacheops.h updated for Allegrex.

There is one point that bothers me: in ME code (executed by the second processor, also known as the Media Engine processor) I can see 0x1 and 0x11 used as cache operations (I don't know if they really are Index_Writeback_Inv_D and Hit_Invalidate_D, as in the standard MIPS R4000 cache). So what should we make of that? This code appears when the ME processor boots (at address 0xBFC00040). As far as I know the ME processor is an Allegrex CPU (it doesn't complain about m(t/f)ic, for instance) without a VFPU, so I would expect it to have the same cache operations as the SC processor.
crazyc
Posts: 408
Joined: Fri Jun 17, 2005 10:13 am

Re: framebuffer, cache, etc

Post by crazyc »

hlide wrote:
chrismulhearn wrote:Another thing: Hlide, you seem to know a little bit about the PSP cache. My first goal here in the new millennium is to get this kernel using the cache. Is there any online resource that points out the differences between the PSP cache and a standard Mips r4000 cache? Where did you get your information? Thanks.
I will provide you with a cacheops.h updated for Allegrex.

There is one point that bothers me: in ME code (executed by the second processor, also known as the Media Engine processor) I can see 0x1 and 0x11 used as cache operations (I don't know if they really are Index_Writeback_Inv_D and Hit_Invalidate_D, as in the standard MIPS R4000 cache). So what should we make of that? This code appears when the ME processor boots (at address 0xBFC00040). As far as I know the ME processor is an Allegrex CPU (it doesn't complain about m(t/f)ic, for instance) without a VFPU, so I would expect it to have the same cache operations as the SC processor.
I'm fairly sure 0x01 is "icache index store tag" and 0x11 is "dcache index store tag" and the code in the ME init is just clearing the cache tags.

Here are the notes I took back when I first got code running on the ME. I'm not 100% sure, but they're probably accurate, and the ops are most likely the same on the SC.

Code:

#define IXILT		0x00	/* Icache index load tag */
#define IXIST		0x01	/* Icache index store tag */
#define IXIINV		0x03	/* Icache index invalidate */
#define IXHINV		0x08	/* Icache hit invalidate */
#define DXILT		0x10	/* Dcache index load tag */
#define DXIST		0x11	/* Dcache index store tag */
#define DXIINV		0x13	/* Dcache index invalidate */
#define DXIWBINV	0x14	/* Dcache index writeback invalidate */
#define DXHINV		0x19	/* Dcache hit invalidate */
#define DXHWB		0x1a	/* Dcache hit writeback */
#define DXHWBINV	0x1b	/* Dcache hit writeback invalidate */

#define TAG_SIZE	128
#define LINE_SIZE	64
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

Yes, I totally agree with your guess about 0x01 and 0x11: the ME code first clears the TagLo and TagHi registers before running those cache operations, so you must be absolutely right about their meaning.
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

GOAL: find which ops are invalidate, writeback, create dirty exclusive or fill.
NOTE: only 0x18 to 0x1F are tested.

First test on the ME processor:
- t0 = cycles spent on this cache operation
- t1 = cycles spent on a load instruction
- t2 = cycles spent on a store instruction

Code:

.p2align 6                          # align the test data to a 64-byte cache line
me_cache_line:                      # 16 words = exactly one cache line
        .long +0, -1, -2, -3        
        .long -1, -2, -3, -0        
        .long -2, -3, -0, -1        
        .long -3, -0, -1, -2
        
.macro test_cache i
        li              t1, 0x80000000
        mtc0            t1, $11
        li              t0, 0
        mtc0            t0, $9
        
        sw              t1, 0(at) # store the value        
        
        cache           0x1B, 0(at) # dcache hit writeback and invalidate
        
.p2align 6                          # align the timed sequence to a 64-byte boundary
        mfc0            t0, $9 # time elapsed at t0     
        
        cache           \i, 0(at) # operation code to test 
        
        mfc0            t1, $9 # cycles elapsed at t1       

        lw              v0, 0(at) # read the value
        
        mfc0            t2, $9 # cycles elapsed at t2

        nor             v0, zr, v0 # invert bits of the value
        
        mfc0            t3, $9 # cycles elapsed at t3

        sw              v0, 0(at) # write the negated value back
        
        mfc0            t4, $9 # cycles elapsed at t4

        cache           0x1B, 0(at) # dcache hit writeback and invalidate
        
        lw              v0, 0(at)
        subu            t0, t1, t0
        subu            t1, t2, t1
        subu            t2, t4, t3
         
        sw              v0, 0x00(a0) # store the value        
        sw              t0, 0x04(a0) # store time elapsed
        sw              t1, 0x08(a0) # store time elapsed
        sw              t2, 0x0C(a0) # store time elapsed     
.endm

.global me_test_cache
me_test_cache:
        jal             me_enter_critical_session
        nop

        li              v1, 0xa0000000
        la              at, me_cache_line
        or              a0, a0, v1

        test_cache      0x18
        addiu           a0, a0, 16
        
        test_cache      0x19
        addiu           a0, a0, 16
        
        test_cache      0x1A
        addiu           a0, a0, 16
        
        test_cache      0x1B
        addiu           a0, a0, 16
        
        test_cache      0x1C
        addiu           a0, a0, 16
        
        test_cache      0x1D
        addiu           a0, a0, 16
        
        test_cache      0x1E
        addiu           a0, a0, 16
        
        test_cache      0x1F
        addiu           a0, a0, 16
        
        jal             me_leave_critical_session
        nop
                
0:      b               0b
        nop
Results:
  • # 18 -> v0 = 7FFFFFFF, t0 = 4 cycles, t1 = 2 cycles, t2 = 3 cycles
    # 19 -> v0 = 7FFFFFFF, t0 = 3 cycles, t1 = 41 cycles, t2 = 3 cycles
    # 1A -> v0 = 7FFFFFFF, t0 = 3 cycles, t1 = 41 cycles, t2 = 3 cycles
    # 1B -> v0 = 7FFFFFFF, t0 = 3 cycles, t1 = 41 cycles, t2 = 3 cycles
    # 1C -> v0 = 7FFFFFFF, t0 = 4 cycles, t1 = 2 cycles, t2 = 3 cycles
    # 1D -> v0 = 7FFFFFFF, t0 = 4 cycles, t1 = 2 cycles, t2 = 3 cycles
    # 1E -> v0 = 7FFFFFFF, t0 = 70 cycles, t1 = 2 cycles, t2 = 3 cycles
    # 1F -> v0 = 7FFFFFFF, t0 = 70 cycles, t1 = 2 cycles, t2 = 3 cycles
Conclusion:
Note: because a dcache writeback and invalidate is done just before the tested op, this test cannot tell whether the tested op is an invalidate or a writeback.
  • - t0 == 3 cycles ==> no state change
    - t0 == 4 cycles ==> state change: CREATE DIRTY EXCLUSIVE-like operation
    - t0 == 70 cycles ==> state change with data fetched from main memory: FILL-like operation
    - t1 == 2 cycles ==> no data fetched from main memory
    - t1 == 41 cycles ==> data fetched from main memory
  • - 18, 1C and 1D must be CREATE DIRTY EXCLUSIVE-like operations.
    - 1E and 1F must be FILL-like operations.
Second test on the ME processor:
Second test on ME processor :
- t0 = cycles spent on this cache operation
- t1 = cycles spent on a load instruction
- t2 = cycles spent on a store instruction

Code:

...        
.macro test_cache i
        li              t1, 0x80000000
        mtc0            t1, $11
        li              t0, 0
        mtc0            t0, $9
        
        sw              t1, 0(at) # store the value        
   
# REMOVE THIS CACHE INSN SO WE CAN DETERMINE THE NEXT ONE :     
#      cache           0x1B, 0(at) # dcache hit writeback and invalidate
        
.p2align 6
        mfc0            t0, $9 # time elapsed at t0     
        
        cache           \i, 0(at) # operation code to test
... 
Results:
  • # 18 -> v0 = 7FFFFFFF, t0 = 4 cycles, t1 = 2 cycles, t2 = 3 cycles
    # 19 -> v0 = 80000000, t0 = 3 cycles, t1 = 41 cycles, t2 = 3 cycles
    # 1A -> v0 = 7FFFFFFF, t0 = 7 cycles, t1 = 2 cycles, t2 = 3 cycles
    # 1B -> v0 = 7FFFFFFF, t0 = 7 cycles, t1 = 61 cycles, t2 = 3 cycles
    # 1C -> v0 = 7FFFFFFF, t0 = 4 cycles, t1 = 2 cycles, t2 = 3 cycles
    # 1D -> v0 = 7FFFFFFF, t0 = 4 cycles, t1 = 2 cycles, t2 = 3 cycles
    # 1E -> v0 = 7FFFFFFF, t0 = 4 cycles, t1 = 2 cycles, t2 = 3 cycles
    # 1F -> v0 = 7FFFFFFF, t0 = 4 cycles, t1 = 2 cycles, t2 = 3 cycles
Conclusion:
  • v0 == 80000000 ==> the load returned 7FFFFFFF, the value written back by the previous test_cache, so the dirty 80000000 just stored was discarded without being written back: HIT INVALIDATE operation
    t0 == 3 cycles ==> no state change
    t0 == 4 cycles ==> state change: INVALIDATE/CREATE DIRTY EXCLUSIVE/FILL-like operation
    t0 == 7 cycles ==> state change: WRITEBACK-like operation
    t1 == 2 cycles ==> no data fetched from memory
    t1 == 41 cycles ==> data fetched from main memory
    t1 == 61 cycles ==> old data saved (?) + new data fetched from main memory
  • - 18, 1C and 1D are like a CREATE DIRTY EXCLUSIVE operation
    - 19 is a HIT INVALIDATE operation
    - 1A is a HIT WRITEBACK operation
    - 1B is a HIT WRITEBACK AND INVALIDATE operation
    - 1E and 1F are FILL operations (from the first test)
Of course, some were already known because they appear in the kernel, but we have confirmation here.
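The deduction for 0x19, 0x1A and 0x1B can be cross-checked with a toy model of a single write-back, write-allocate dcache line holding one word. The model assumes the semantics concluded above for those three ops, starts each run with the line empty and 0x7FFFFFFF in memory (what the preceding test_cache leaves behind for the 0x19 run), and replays the test_cache sequence: store 0x80000000, tested op, load, invert, store, 0x1B, load. The struct and function names are illustrative only, and nothing about Allegrex timing is modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* One cache line holding one word, plus that word in main memory. */
struct line_model {
	bool     valid, dirty;
	uint32_t cached;   /* value in the cache line */
	uint32_t memory;   /* value in main memory    */
};

static void write_back(struct line_model *m)
{
	if (m->valid && m->dirty) {
		m->memory = m->cached;
		m->dirty = false;
	}
}

/* Load: refill from memory on a miss; *hit reports whether the line was valid. */
static uint32_t load(struct line_model *m, bool *hit)
{
	*hit = m->valid;
	if (!m->valid) {
		m->cached = m->memory;
		m->valid = true;
		m->dirty = false;
	}
	return m->cached;
}

/* Store: write-allocate (refill on a miss), then mark the line dirty. */
static void store(struct line_model *m, uint32_t v)
{
	bool hit;
	(void)load(m, &hit);
	m->cached = v;
	m->dirty = true;
}

/* Assumed semantics of the three hit ops identified by the second test. */
static void cache_op(struct line_model *m, unsigned op)
{
	if (op == 0x1A || op == 0x1B)
		write_back(m);                  /* write back if dirty */
	if (op == 0x19 || op == 0x1B)
		m->valid = m->dirty = false;    /* drop the line */
}

/* Replay the second test_cache sequence; returns the final v0. */
static uint32_t replay_second_test(unsigned op, bool *load_hit)
{
	struct line_model m = { false, false, 0, 0x7FFFFFFFu };
	uint32_t v0;
	bool hit;

	store(&m, 0x80000000u);     /* sw   t1, 0(at)   */
	cache_op(&m, op);           /* cache \i, 0(at)  */
	v0 = load(&m, load_hit);    /* lw   v0, 0(at)   */
	store(&m, ~v0);             /* nor + sw         */
	cache_op(&m, 0x1B);         /* cache 0x1B, 0(at) */
	return load(&m, &hit);      /* lw   v0, 0(at)   */
}
```

Under these assumptions the model reproduces the second test's v0 values (80000000 for 0x19, 7FFFFFFF for 0x1A and 0x1B), and its hit/miss pattern for the timed load (miss, hit, miss) matches the measured 41, 2 and 61 cycles.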

We still need to determine the difference between 18, 1C and 1D and also between 1E and 1F.

For 1F, I now know for sure it is something like FILL AND LOCK, because I tested it some months ago and was able to check that it can be used to get a very small but fast memory that never writes back to main memory unless an explicit writeback operation is done. The cache line stays locked until an explicit invalidate or unlock operation is done.

I suspect there is also a CREATE DIRTY EXCLUSIVE AND LOCK, since, like FILL, it allocates a cache line.

In http://www.freepatentsonline.com/20010052053.html you can read some details about the LOCK mechanism:
[0187] Cache

The CACHE instruction implements the following five operations:
0: Index Invalidate - Instruction Cache
1: Index Write-back Invalidate - Data Cache
5: Index Write-back Invalidate - Data Cache
9: Index Write-back - Data Cache
16: Hit Invalidate - Instruction Cache
17: Hit Invalidate - Data Cache
21: Hit Write-back Invalidate - Data Cache
25: Hit Write-back - Data Cache
28: Fill Lock - Instruction Cache
29: Fill Lock - Data Cache

[0188] The Fill Lock instructions are used to lock the instruction and data caches on a line by line basis. Each line can be locked by utilizing these instructions. The instruction and data caches are four way set associative, but software should guarantee that a maximum of three of the four lines in each set are locked. If all four lines become locked, then one of the lines will be automatically unlocked by hardware the first time a replacement is needed in that set.
I read more details about the LOCK mechanism in a document about a MIPS architecture other than Allegrex, but I cannot find that PDF again because I came across it by accident. That document says a locked cache line can be unlocked by an invalidate or unlock operation. But what is this unlock operation, or what are they?
Last edited by hlide on Tue Jan 16, 2007 4:26 am, edited 1 time in total.
chrismulhearn
Posts: 80
Joined: Wed Feb 22, 2006 4:43 am

cache

Post by chrismulhearn »

Wow, it seems I have a lot to learn about the MIPS cache. Everything I've ever worked with has had an architecturally-invisible cache.

What is the basic concept here? When you try to read a memory location that isn't in the primary cache, does some exception get raised? What are you supposed to do in that situation, perform a CACHE operation that dumps a cache line to memory and replaces that cache line with the memory location you initially tried to read?

MIPS manuals seem to have very lengthy chapters about the cache, but they focus on coherency (which in my case is not an issue since I am using only the Allegrex cpu, my environment is therefore uni-processor) and it is hard to tell which parts are accomplished automagically and which parts are up to the programmer.

If anyone knows enough about this to post a simple example of an exception handler that allows use of the Allegrex icache and dcache in a uni-processor (as in, ignoring the ME) environment, it would be a really big help!
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

Basically, you probably need just a portion of it:

- Lock mechanism? Just forget about it; it seems Linux doesn't bother with it.

- Since Allegrex has only one primary cache for each purpose (one for instructions and another for data), I don't think we can run into any coherency problem, so no worries here.

- Since Allegrex has no TLB, VCE (Virtual Coherency Exception, raised, if I'm not wrong, when the same physical address (PA) is cached under two different virtual addresses (VA) and a cache access conflicts with it) should not happen.

Well, I would say there is in fact no reason to expect an exception to handle for cache coherency, so you should be relieved.

Oh sorry, I forgot to answer your real question:

Normally this is transparent (that is, user code shouldn't have to use those cache instructions directly).

But here is when you do need them (indirectly, through the functions Linux provides):
1) When you store some instructions to main memory through the dcache and then need to run them through the icache: DCACHE WRITEBACK INVALIDATE, then ICACHE INVALIDATE at their addresses. Why? Because the icache may still hold stale instructions for those addresses, so you need to invalidate them before running the right ones.
2) For DMA or peripheral (hardware outside the CPU) operations that need to access main memory: DCACHE WRITEBACK INVALIDATE.

As you can see, there are essentially two operations:
- DCACHE WRITEBACK AND INVALIDATE
- ICACHE INVALIDATE

Linux can use CREATE_DIRTY_EXCLUSIVE as a kind of prefetch instruction for data writes. There is a define to enable it.

And they should mostly be used only in driver code.
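A sketch of how the two cases above turn into per-line cache operations over an address range: round the start down to a line boundary, round the last byte down too, and issue one hit op per line. It assumes 64-byte lines (LINE_SIZE from crazyc's notes) and uses crazyc's codes 0x1b (dcache hit writeback invalidate) and 0x08 (icache hit invalidate). `CACHE_OP` is a hypothetical helper that emits the `cache` instruction only on a MIPS build (assuming a toolchain that accepts it for Allegrex) and compiles away elsewhere, so the address arithmetic can be tested on a PC.

```c
#include <stdint.h>

#define LINE_SIZE  64u     /* Allegrex line size, per crazyc's notes */
#define DXHWBINV   0x1b    /* dcache hit writeback invalidate        */
#define IXHINV     0x08    /* icache hit invalidate                  */

#if defined(__mips__)
#define CACHE_OP(op, addr) \
	__asm__ __volatile__("cache %0, 0(%1)" : : "i"(op), "r"(addr) : "memory")
#else
#define CACHE_OP(op, addr) ((void)(addr))   /* host build: no-op */
#endif

struct line_range {
	uint32_t first;   /* address of the first line touched */
	uint32_t count;   /* number of lines touched           */
};

/* Lines covering [start, start + len). */
static struct line_range cache_line_range(uint32_t start, uint32_t len)
{
	struct line_range r = { 0, 0 };
	if (len == 0)
		return r;
	uint32_t last = (start + len - 1) & ~(LINE_SIZE - 1);
	r.first = start & ~(LINE_SIZE - 1);
	r.count = (last - r.first) / LINE_SIZE + 1;
	return r;
}

/* Case 2: push a buffer out to RAM before DMA or another bus master uses it. */
static void dcache_wbinv_range(uint32_t start, uint32_t len)
{
	struct line_range r = cache_line_range(start, len);
	for (uint32_t i = 0; i < r.count; i++)
		CACHE_OP(DXHWBINV, r.first + i * LINE_SIZE);
}

/* Case 1: after writing code through the dcache, write it back to RAM
 * first, then invalidate the icache so the next fetch reads RAM. */
static void sync_icache_range(uint32_t start, uint32_t len)
{
	struct line_range r = cache_line_range(start, len);
	for (uint32_t i = 0; i < r.count; i++)
		CACHE_OP(DXHWBINV, r.first + i * LINE_SIZE);
	for (uint32_t i = 0; i < r.count; i++)
		CACHE_OP(IXHINV, r.first + i * LINE_SIZE);
}
```

For example, 0x80 bytes starting at 0x88000010 straddle three lines (0x88000000, 0x88000040, 0x88000080), so each loop issues three operations.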
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

To put it bluntly, you probably just need to provide an Allegrex-revised cacheops.h (with the right codes) and enable/disable some CPU features (http://www.linux-mips.org/wiki/Cpu_features).
chrismulhearn
Posts: 80
Joined: Wed Feb 22, 2006 4:43 am

Post by chrismulhearn »

In my experience, it's very difficult to diagnose problems if you don't have a sound conceptual understanding of what's supposed to be going on. I was under the impression that there was an exception-handling aspect to this, but you seem to imply that the only time you really need the cache instructions is when you are explicitly doing something that will put the cache in an awkward state [for example, when writing instructions to memory, the icache may have already cached the instruction you are trying to overwrite, so without invalidating the icache, you won't see your new instruction.]
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

About VCE:

In my opinion VCE is irrelevant for Allegrex.

Usually:
- VCEI: exception #14
- VCED: exception #31

But we have these:

Code:

EXC_31_ERROR_handler(/* v1 */) /* (exceptionman:0x06c8) */
{
    COP0CTRL.7=v1; /* save v1 in cc0.7 (GPR.v1) */
    COP0CTRL.20=COP0STAT.13; /* save (Cause) in cc0.20 */;
    COP0CTRL.1=COP0STAT.30; /* save (ErrorEPC) in cc0.1 Error Exception Program Counter */
    COP0CTRL.19=COP0STAT.12; /* save v1 (Status) in cc0.19 Status register */
    exception_handler(31<<2); /* v0=0x007c default offset in table */
}
and

Code:

void *ExceptionVectorTable[32] /* 8801ea00 (exceptionman) Exception Vector Table (32 Entries) */
{
/*  0 */ 88020F74 (interruptman:0x2274) /* IRQ (=default_irq_handler) */
/*  1 */ 8801D130 (hang)while(1);
/*  2 */ 8801D130 (hang)while(1);
/*  3 */ 8801D130 (hang)while(1);
/*  4 */ 8801D130 (hang)while(1);
/*  5 */ 8801D130 (hang)while(1);
/*  6 */ 8801D130 (hang)while(1);
/*  7 */ 8801D130 (hang)while(1);
/*  8 */ 88021E74 (interruptman:0x3174) /* syscall (=EXC_8_Syscall handler) */
/*  9 */ 8801D130 (hang)while(1);
/* 10 */ 8801D130 (hang)while(1);
/* 11 */ 8801D130 (hang)while(1);
/* 12 */ 8801D130 (hang)while(1);
/* 13 */ 8801D130 (hang)while(1);
/* 14 */ 8801D130 (hang)while(1);
/* 15 */ 8801D130 (hang)while(1);
/* 16 */ 8801D130 (hang)while(1);
/* 17 */ 8801D130 (hang)while(1);
/* 18 */ 8801D130 (hang)while(1);
/* 19 */ 8801D130 (hang)while(1);
/* 20 */ 8801D130 (hang)while(1);
/* 21 */ 8801D130 (hang)while(1);
/* 22 */ 8801D130 (hang)while(1);
/* 23 */ 8801D130 (hang)while(1);
/* 24 */ 8801D130 (hang)while(1); /* debug exception */
/* 25 */ 8801D130 (hang)while(1);
/* 26 */ 8801D130 (hang)while(1);
/* 27 */ 8801D130 (hang)while(1);
/* 28 */ 8801D130 (hang)while(1);
/* 29 */ 8801D130 (hang)while(1);
/* 30 */ 8801D130 (hang)while(1);
/* 31 */ 8801D370 (exceptionman:0x0c70) /* error, default (=default_error_handler) */
}
My opinion is that exceptions #14 and #31 don't really exist as VCE exceptions (they are irrelevant for Allegrex because there is no TLB), and that exception #31 is reused by software for unrecoverable errors (whereas a VCE is not an unrecoverable error).

If so, there is no exception to handle for cache trouble.
Last edited by hlide on Tue Jan 16, 2007 4:59 am, edited 1 time in total.
chrismulhearn
Posts: 80
Joined: Wed Feb 22, 2006 4:43 am

Post by chrismulhearn »

That makes sense to me. I thought that by "architecture-visible" the MIPS manual meant the OS kernel would have to do something to handle cache misses. Apparently that's not the case, which makes life a lot easier!

Thanks for the help!!!


-Chris
chrismulhearn
Posts: 80
Joined: Wed Feb 22, 2006 4:43 am

Post by chrismulhearn »

OK wow! I'm using the cache now (Linking to 0x8xxx,xxxx instead of 0xAxxx,xxxx) and it is amazingly faster, and it works!

But this is because at the moment, with no device drivers except the serial port tty [which explicitly uses uncached memory segment 0xAxxx,xxxx] I'm not doing anything where the cache could end up with incorrect values.

So now onto cacheops.h, etc.

It seems to be pretty messy and hard to understand in the lower levels of the cache code, and enough #ifdefs to make my head spin, so I'd rather just write my own cache flushing routines and map them to the _flush_cache_xxx stubs. In order to do this, I basically need to write:

void flush_dcache_all(void) {
// data cache flush
// perform all pending writebacks and invalidate the whole cache.
}

and

void flush_icache_all(void) {
// instruction cache flush
// invalidate the whole cache (the icache never holds dirty data, so there is nothing to write back).
}

Now, it won't be as efficient as if I wrote the flush_page() and flush_range() functions, but it will work, and for now that's all I care about.

So..... how do I implement those? hahaha.
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

Look at asm/mips32_cache.h: its functions use the defines from cacheops.h, so if you map the correct value to each define used in mips32_cache.h, most of the cache work is already done.
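A sketch of what that mapping could look like: an `#ifdef CONFIG_PSP` block giving the generic Linux/MIPS cacheops.h names the Allegrex values from crazyc's notes, with the stock R4000 values in the `#else` branch for comparison. Where exactly this goes in the 2.4 tree, and whether CONFIG_PSP is the right symbol to key on, are assumptions; ops with no confirmed Allegrex equivalent (such as Create_Dirty_Excl_D or Fill) are deliberately left out of the PSP branch.

```c
/* Allegrex (PSP) values for the generic cacheops.h names, from crazyc's
 * notes; the #else branch holds the stock R4000 values. */
#ifdef CONFIG_PSP
#define Index_Load_Tag_I        0x00
#define Index_Store_Tag_I       0x01
#define Index_Invalidate_I      0x03
#define Hit_Invalidate_I        0x08
#define Index_Load_Tag_D        0x10
#define Index_Store_Tag_D       0x11
#define Index_Writeback_Inv_D   0x14
#define Hit_Invalidate_D        0x19
#define Hit_Writeback_D         0x1a
#define Hit_Writeback_Inv_D     0x1b
#else
#define Index_Invalidate_I      0x00
#define Index_Writeback_Inv_D   0x01
#define Index_Load_Tag_I        0x04
#define Index_Load_Tag_D        0x05
#define Index_Store_Tag_I       0x08
#define Index_Store_Tag_D       0x09
#define Hit_Invalidate_I        0x10
#define Hit_Invalidate_D        0x11
#define Hit_Writeback_Inv_D     0x15
#define Hit_Writeback_D         0x19
#endif
```

Note how 0x19 means Hit_Writeback_D on the R4000 but Hit_Invalidate_D on Allegrex: using the stock header would silently throw away dirty data instead of writing it back.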
chrismulhearn
Posts: 80
Joined: Wed Feb 22, 2006 4:43 am

Post by chrismulhearn »

Have you done kernel work in the past? Thanks for all the help. I've got a few more questions if you don't mind:

1. why does this work to invalidate the _whole_ instruction cache?

static inline void blast_icache(void)
{
	unsigned long start = KSEG0;
	unsigned long end = (start + icache_size);

	while (start < end) {
		cache_unroll(start, Index_Invalidate_I);
		start += ic_lsize;
	}
}

cache_unroll is defined as:

#define cache_unroll(base, op)			\
	__asm__ __volatile__(			\
		"cache %1, (%0)"		\
		:				\
		: "r" (base),	/* %0 */	\
		  "i" (op))	/* %1 */

Is it because the "base" in this context actually refers to a particular cache line, rather than a particular physical address? In that case, why does it start at KSEG0 (0x8000,0000) instead of starting at 0?

In the r4000 manual, the meaning of this "index" value is based on the cache line length. Do we know what the cache line length is on the PSP?

From the r4000 manual: "For a primary cache of 2^(CACHEBITS) bytes with 2^(LINEBITS) bytes per tag, vAddr (note- I guess vAddr refers to what i am calling "base" in this discussion) vAddr(bit CACHEBITS ... bit LINEBITS) specifies the block. "

Kind of odd notation here, but ok for a 64kbyte cache, CACHEBITS would equal 16 (that satisfies 2^CACHEBITS = 64k bytes) .

So the most significant bit that is used in the "vAddr" field would be bit 16. Which suggests that using "KSEG0" as the "start" in that function up there is really no different from using "0" as the "start", since the upper bit that distinguishes KSEG0 from 0x00000000 is ignored by this cache operation anyway, unless you had a giiiiiiiiiiiiiiiiiiiiiiiiiigantic cache. Isn't that strange?

Anyway, moving on, the second half of this equation is knowing "LINEBITS". I don't know how big our cache lines are. Of course, if we don't know, I could just count up by 1's, and I'd probably be hitting the same index over and over (to be precise, I'd strike the same index 2^(LINEBITS) times if I counted my "base" by 1's).

Is any of this making sense to you? What are your thoughts?
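The KSEG0 point can be checked with a little arithmetic: an index-type cache op only looks at the address bits between the line size and the cache size, so ORing in 0x80000000 does not change which line is selected. The sketch assumes a direct-mapped view (one way) and takes the cache size and line size as parameters, since they are not settled at this point in the thread; both must be powers of two.

```c
#include <stdint.h>

/* Which line an index-type CACHE op selects: (addr mod cache_size) / line_size,
 * i.e. the address bits from log2(line_size) up to log2(cache_size) - 1. */
static uint32_t cache_index(uint32_t addr, uint32_t cache_size, uint32_t line_size)
{
	return (addr & (cache_size - 1)) / line_size;
}

/* Number of distinct indices = iterations of a blast_icache()-style loop. */
static uint32_t cache_lines(uint32_t cache_size, uint32_t line_size)
{
	return cache_size / line_size;
}
```

Stepping the address by 1 instead of by line_size only revisits each index line_size times, which is harmless but wasteful; stepping by line_size from KSEG0 up to KSEG0 + cache_size visits every index exactly once.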
crazyc
Posts: 408
Joined: Fri Jun 17, 2005 10:13 am

Post by crazyc »

chrismulhearn wrote: So the most significant bit that is used in the "vAddr" field would be bit 16. Which suggests that using "KSEG0" as the "start" in that function up there is really no different from using "0" as the "start", since the upper bit that distinguishes KSEG0 from 0x00000000 is ignored by this cache operation anyway, unless you had a giiiiiiiiiiiiiiiiiiiiiiiiiigantic cache. Isn't that strange?
I was curious too, so a few minutes of looking turned up this:
I'm not sure what you mean by TLB translations required for hit cacheops. If you mean the Index Writeback or Index Invalidate functions, note that you can (and should) use a kseg0 address to do this. This bypasses the TLB, while still giving you the index that you want. We simply OR the kseg0 base address into the index that we've calculated and use that as the argument to the CACHE instruction. There's actually words to this effect in the MIPS32/MIPS64 spec, but it is, perhaps, not clear enough.
Seems CPUs with an MMU save a TLB lookup this way.
chrismulhearn wrote:Anyway, moving on, the second half of this equation is knowing "LINEBITS". I don't know how big our cache lines are. Of course, if we don't know, I could just count up by 1's, and I'd probably be hitting the same index over and over (to be precise, I'd strike the same index 2^(LINEBITS) times if I counted my "base" by 1's).

Is any of this making sense to you? What are your thoughts?
Cache lines are 64 bytes.
chrismulhearn
Posts: 80
Joined: Wed Feb 22, 2006 4:43 am

psp cache size + instructions

Post by chrismulhearn »

awesome detective work crazyc.

How big are the instruction and data caches on the PSP, by the way? Do we know for sure? Google seemed to turn up 32K I, 64K D.

Also, how did you figure out that:
0x03 = icache indexed invalidate
0x14 = dcache indexed writeback+invalidate

Those are actually the only two instructions I'm using right now, in flush_all_dcache() and flush_all_icache() functions that loop through the entire cache. I'll let you know if it works.

Thanks for the help, everyone.
chrismulhearn
Posts: 80
Joined: Wed Feb 22, 2006 4:43 am

cache flushing in uClinux

Post by chrismulhearn »

OK, so I wrote code that assumed both caches were 64K. Then I noticed something as soon as I started running the kernel and programs in a cached memory segment, instead of 0xAxxx,xxxx where _everything_ was uncached. When I invoked programs from the shell, weird things would happen every now and then: programs would crash the first time I ran them, but the second time they'd work. This happened even though it was the exact same program being loaded into the exact same memory location over and over.

Naturally I thought, "Well, it's because I'm not flushing the caches. When I load the user program into memory, the kernel isn't flushing the icache, so who knows what could be lingering in there."

So I wrote my cache flushing functions, and noticed that the problem was still there. But I also noticed something about how the Linux kernel loads an executable (and by "load" I mean copies it off the ramdisk into a different spot it allocated for it): it ONLY flushes the instruction cache. If we don't flush the data cache, some of the freshly copied program code could still be sitting in dirty data cache lines that were never written back to RAM. The instruction cache would then come along and scoop old program code out of RAM...

So I thought, "Maybe I'll just explicitly flush the data cache first any time anyone flushes the instruction cache." And that solved the problem.
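The ordering fix can be sketched as a single helper: write back the D-cache first, then invalidate the I-cache. sync_icache_after_copy() and the size macros are hypothetical, and the 16 KB sizes are assumptions. The op codes are the ones quoted earlier in the thread. The host build only records which operation ran first and which ran last, so the order can be checked.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED geometry; op codes as quoted earlier in the thread. */
#define LINE_SIZE   64u
#define ICACHE_SIZE (16u * 1024u)
#define DCACHE_SIZE (16u * 1024u)
#define KSEG0_BASE  0x80000000u
#define OP_I_INDEX_INVALIDATE    0x03
#define OP_D_INDEX_WB_INVALIDATE 0x14

static_assert((LINE_SIZE & (LINE_SIZE - 1u)) == 0, "line size must be a power of two");

#if defined(__mips__)
#define CACHE(op, addr) \
	__asm__ __volatile__("cache %0, 0(%1)" : : "i" (op), "r" (addr) : "memory")
#else
/* Host build: record the first and last op kind so the ordering can be checked. */
static int first_op = -1, last_op = -1;
#define CACHE(op, addr) \
	do { (void)(addr); if (first_op < 0) first_op = (op); last_op = (op); } while (0)
#endif

/* Hypothetical helper: make freshly copied code visible to instruction fetch. */
void sync_icache_after_copy(void)
{
	uint32_t a;

	/* 1. Push the copied bytes out of the write-back D-cache into RAM. */
	for (a = KSEG0_BASE; a < KSEG0_BASE + DCACHE_SIZE; a += LINE_SIZE)
		CACHE(OP_D_INDEX_WB_INVALIDATE, a);

	/* 2. Only then discard stale I-cache lines, so refills fetch the new code. */
	for (a = KSEG0_BASE; a < KSEG0_BASE + ICACHE_SIZE; a += LINE_SIZE)
		CACHE(OP_I_INDEX_INVALIDATE, a);
}
```

In the kernel, this ordering would belong wherever the loader currently calls only the I-cache flush.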

Weird, huh? Maybe the kernel is assuming there's some "coherency" between those two caches (because in this case, there definitely isn't)?
TyRaNiD
Posts: 907
Joined: Sun Jan 18, 2004 12:23 am

Post by TyRaNiD »

One thing I noticed (which probably won't matter) is that you need to be careful of aliasing between ksegX addresses and useg ones. If you write through a kseg address and read through the useg alias (or vice versa) without flushing the cache, you cannot be certain that the write will be reflected on the other side.

i.e. _sw(0x12345678, 0x88400000); x = _lw(0x08400000); Even though this is the same physical address, x probably won't be set to the value you expect.
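Here is a small sketch of the address arithmetic behind that example. It uses the standard MIPS32 rule that KSEG0 and KSEG1 map the low 512 MB of physical memory directly. It also assumes, as the example above implies, that the user address 0x08400000 reaches the same RAM as 0x88400000 with no translation. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Standard MIPS32: KSEG0 and KSEG1 map physical 0..512 MB directly. */
#define PHYS_MASK  0x1FFFFFFFu
#define KSEG0_BASE 0x80000000u   /* cached window */
#define KSEG1_BASE 0xA0000000u   /* uncached window */

/* Physical address behind an address. For KSEG0/KSEG1 this is architectural;
 * for user addresses such as 0x08400000 it ASSUMES a direct, untranslated mapping. */
static uint32_t to_phys(uint32_t va)  { return va & PHYS_MASK; }

/* The same physical location seen through the cached or uncached kernel window. */
static uint32_t to_kseg0(uint32_t va) { return to_phys(va) | KSEG0_BASE; }
static uint32_t to_kseg1(uint32_t va) { return to_phys(va) | KSEG1_BASE; }
```

Both 0x88400000 and 0x08400000 resolve to physical 0x08400000. The safe pattern is therefore to write back (and, before a read, invalidate) the affected D-cache line before touching the same data through the other alias.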
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

chrismulhearn wrote: Have you done kernel work in the past? Thanks for all the help. I've got a few more questions if you don't mind:

1. Why does this work to invalidate the _whole_ instruction cache?

static inline void blast_icache(void)
{
	unsigned long start = KSEG0;
	unsigned long end = (start + icache_size);

	while (start < end) {
		cache_unroll(start, Index_Invalidate_I);
		start += ic_lsize;
	}
}

cache_unroll is defined as:

#define cache_unroll(base, op)                  \
	__asm__ __volatile__(                       \
		"cache %1, (%0)"                        \
		:                                       \
		: "r" (base),   /* that's %0 */         \
		  "i" (op)      /* that's %1 */         \
	)

Is it because the "base" in this context actually refers to a particular cache line, rather than a particular physical address? In that case, why does it start at KSEG0 (0x8000,0000) instead of starting at 0?

In the R4000 manual, the meaning of this "index" value is based on the cache line length. Do we know what the cache line length is on the PSP?

From the R4000 manual: "For a primary cache of 2^(CACHEBITS) bytes with 2^(LINEBITS) bytes per tag, vAddr(bit CACHEBITS ... bit LINEBITS) specifies the block." (Note: I guess vAddr refers to what I am calling "base" in this discussion.)

Kind of odd notation here, but OK: for a 64 KB cache, CACHEBITS would equal 16 (that satisfies 2^CACHEBITS = 64K bytes).

So the most significant bit that is used in the "vAddr" field would be bit 16, which suggests that using "KSEG0" as the "start" in that function up there is really no different than using "0" as the "start", since the upper bit that distinguishes KSEG0 from 0x00000000 is ignored by this cache operation anyway, unless you had a giiiiiiiiiiiiiiiiiiiiiiiiiigantic cache. Isn't that strange?
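To make the arithmetic concrete, take a direct-mapped cache of 2^CACHEBITS bytes with 2^LINEBITS-byte lines. Its index is the address field from bit CACHEBITS-1 down to bit LINEBITS. The sketch below masks that field out using a hypothetical helper, cache_index(). It shows that the KSEG0 bit, bit 31, never reaches the index, assuming CACHEBITS = 16 and the 64-byte lines (LINEBITS = 6) reported elsewhere in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the line holding addr in a direct-mapped cache of 2^cachebits bytes
 * with 2^linebits-byte lines: address bits cachebits-1 .. linebits. */
static uint32_t cache_index(uint32_t addr, unsigned cachebits, unsigned linebits)
{
	assert(linebits < cachebits && cachebits < 32);
	return (addr & ((1u << cachebits) - 1u)) >> linebits;
}
```

For example, cache_index(0x80000000, 16, 6) and cache_index(0, 16, 6) both return 0, and 0x80000040 lands on index 1.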

Anyway, moving on, the second half of this equation is knowing "LINEBITS". I don't know how big our cache lines are. Of course, if we don't know, I could just count up by 1's, and I'd probably be hitting the same index over and over (to be precise, I'd strike the same index 2^LINEBITS times if I counted my "base" by 1's).
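Here is a quick simulation of the counting-by-1 idea. It assumes a direct-mapped 64 KB cache with 64-byte lines; simulate_flush() and struct flush_stats are hypothetical. Stepping by 1 issues one operation per byte, and every index is struck once per byte of its line, that is 2^LINEBITS = 64 times. Stepping by the line size touches each index exactly once.

```c
#include <assert.h>
#include <stdint.h>

struct flush_stats {
	uint32_t ops;            /* CACHE instructions the loop would issue */
	uint32_t hits_on_index0; /* how many of them target index 0 */
};

/* Model a flush loop over a direct-mapped cache that advances its base by step. */
static struct flush_stats simulate_flush(uint32_t cache_bytes, uint32_t line_bytes, uint32_t step)
{
	struct flush_stats s = { 0, 0 };
	uint32_t a;

	assert(step > 0 && line_bytes > 0 && cache_bytes % line_bytes == 0);
	for (a = 0; a < cache_bytes; a += step) {
		s.ops++;
		if (a / line_bytes == 0)  /* index = a / line size, since a < cache size */
			s.hits_on_index0++;
	}
	return s;
}
```

simulate_flush(65536, 64, 1) reports 65536 operations, with 64 landing on index 0. simulate_flush(65536, 64, 64) reports 1024 operations and a single hit on index 0.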

Is any of this making sense to you? What are your thoughts?
Index-based cache operation: you don't give an address because you want to flush that particular address. The operation acts on whatever line sits at that address's index in the cache, so it affects every address that shares the index: index = (address & INDEX_MASK) >> INDEX_SHIFT.

Address-based cache operation (HIT): you give an exact address because you want to flush that address and not another address with the same index. The index is computed the same way, index = (address & INDEX_MASK) >> INDEX_SHIFT, but the operation only acts if the tag stored at that index matches the address.

As far as I know, the PSP has a 16 KB I-cache and a 16 KB D-cache.

Since a cache line is 64 bytes long, each cache has 256 lines. With 2-way set associativity, that gives 128 sets, i.e. at most 128 indexes.
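hlide gives a geometry of 16 KB, 2-way, 64-byte lines; treat it as unconfirmed. With that geometry, INDEX_SHIFT is 6 and INDEX_MASK covers bits 12 to 6. The hypothetical helpers below split an address into set index and tag. That split is exactly the difference between the two kinds of operation: an index op uses only the set, while a hit op also needs the tag to match. Tags here are computed from whatever address is passed in. Whether the real Allegrex tags use physical or virtual bits is not established.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED geometry: 16 KB, 2-way, 64-byte lines. */
#define LINE_SHIFT  6u                          /* 64-byte lines */
#define SET_BITS    7u                          /* 16 KB / 64 B / 2 ways = 128 sets */
#define INDEX_SHIFT LINE_SHIFT
#define INDEX_MASK  (((1u << SET_BITS) - 1u) << INDEX_SHIFT)
#define TAG_SHIFT   (INDEX_SHIFT + SET_BITS)

static_assert(INDEX_MASK == 0x1FC0u, "index field is bits 12..6");

/* index = (address & INDEX_MASK) >> INDEX_SHIFT, as in the post */
static uint32_t cache_set(uint32_t addr) { return (addr & INDEX_MASK) >> INDEX_SHIFT; }

/* Everything above the index; a HIT op compares this against the stored tag. */
static uint32_t cache_tag(uint32_t addr) { return addr >> TAG_SHIFT; }
```

0x08000040 and 0x08002040 both map to set 1 but have different tags. An index op on either address therefore targets the same set, while a hit op on 0x08000040 leaves a line cached for 0x08002040 untouched.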

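So for a global flush, you only need one index operation per line: 256 operations (128 indexes × 2 ways) with addresses 0, 64, 128, ..., 16320. This assumes, as on other MIPS32 cores, that the address bit just above the index selects the way.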
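Here is a sketch of the address sequence for that global flush. It assumes 16 KB and 64-byte lines. It also assumes that bit 13, the first bit above the 7-bit index, selects the way, as on other MIPS32 cores; this is not confirmed for the Allegrex. flush_addresses() and way_of() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_BYTES (16u * 1024u)  /* ASSUMED cache size */
#define LINE_BYTES  64u
#define WAY_SHIFT   13u            /* ASSUMED: first bit above the index bits 12..6 */

static_assert(CACHE_BYTES / LINE_BYTES == 256u, "256 lines in total");

/* Store the index addresses for a whole-cache flush in out[], return how many. */
static uint32_t flush_addresses(uint32_t *out, uint32_t max)
{
	uint32_t n = 0, a;

	for (a = 0; a < CACHE_BYTES && n < max; a += LINE_BYTES)
		out[n++] = a;
	return n;
}

/* Which way an index operation on this address would select (assumption). */
static uint32_t way_of(uint32_t a) { return (a >> WAY_SHIFT) & 1u; }
```

The first 128 addresses (0 to 8128) cover way 0 of every set, and the next 128 (8192 to 16320) cover way 1.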

Whether it is KSEG0 or KUSEG shouldn't matter, because not all the bits of an address are taken into account. I'm pretty sure no more than 24 bits are used for the index and the tag. What matters here is the physical address, not the virtual address. Whatever the segment (mapped, cached unmapped, uncached unmapped), the physical address is the same, and I'm pretty sure the Allegrex cache always works on physical addresses.
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

crazyc wrote:
chrismulhearn wrote: I was curious too, so a few minutes of looking turned up this:
I'm not sure what you mean by TLB translations required for hit cacheops.
If you mean the Index Writeback or Index Invalidate functions, note that
you can (and should) use a kseg0 address to do this. This bypasses
the TLB, while still giving you the index that you want. We simply
OR the kseg0 base address into the index that we've calculated and
use that as the argument to the CACHE instruction. There's actually
words to this effect in the MIPS32/MIPS64 spec, but it is, perhaps,
not clear enough.
Seems CPUs with an MMU save a TLB lookup this way.
Of course! KUSEG is a MAPPED segment (under TLB control), whereas KSEG0 is a CACHED UNMAPPED segment (not under TLB control).
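For reference, here is the standard MIPS32 segment map as a small C function. segment_of() and bypasses_tlb() are hypothetical helpers. The boundaries are the architectural ones, but whether the PSP actually routes KUSEG through a TLB is not established in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Standard MIPS32 virtual address segments. */
enum mips_seg { SEG_KUSEG, SEG_KSEG0, SEG_KSEG1, SEG_KSEG2_3 };

static enum mips_seg segment_of(uint32_t va)
{
	if (va < 0x80000000u) return SEG_KUSEG;  /* mapped (TLB), user accessible */
	if (va < 0xA0000000u) return SEG_KSEG0;  /* unmapped, cached */
	if (va < 0xC0000000u) return SEG_KSEG1;  /* unmapped, uncached */
	return SEG_KSEG2_3;                      /* mapped (TLB), kernel only */
}

/* Nonzero if an address needs no TLB lookup (KSEG0/KSEG1). */
static int bypasses_tlb(uint32_t va)
{
	enum mips_seg s = segment_of(va);
	return s == SEG_KSEG0 || s == SEG_KSEG1;
}
```

Consider the index cacheop case in the quote. An index address such as 0x80000040 passes bypasses_tlb(), while the same index expressed as 0x00000040 would need a TLB lookup on a CPU that maps KUSEG.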

If I'm not wrong, some MIPS CPUs with a TLB MMU index the cache partly by virtual address.