SPU initiated DMA with libspe2

Investigation into how Linux on the PS3 might lead to homebrew development.

Moderators: cheriff, emoon

ralferoo
Posts: 122
Joined: Sat Mar 03, 2007 9:14 am

SPU initiated DMA with libspe2

Post by ralferoo »

I've been scratching my head for hours on what I'm missing here... Basically, I'm trying some basic tests based on the IBM tutorial documents, but attempting to "port" them to libspe2. Yet, for some reason any attempt to perform DMA fails miserably.

Here's a makefile:

Code: Select all

all:
        spu-gcc spe_distance.c -o spe_distance
        embedspu calculate_distance_handle spe_distance spe_distance_csf.o
        gcc ppe_distance.c spe_distance_csf.o -L/usr/lib -lspe -o distance
        gcc ppe_distance_spe2.c spe_distance_csf.o -L/usr/lib -lspe2 -o distance2
The SPU code is a very basic DMA example, it just DMAs a structure from the PPU, multiplies two numbers from the structure, updates it and DMAs it back again:

Code: Select all

//Pull in DMA commands
#include <spu_mfcio.h>

//Struct for communication with the PPE
typedef struct {
        float speed;     //input parameter
        float num_hours; //input parameter
        float distance;  //output parameter
        float padding;   //pad the struct a multiple of 16 bytes
} program_data;

int main(unsigned long long spe_id, unsigned long long program_data_ea, unsigned long long env) {
        program_data pd __attribute__((aligned(16)));

        int tag_id = 0;

        //READ DATA IN
        //Initiate copy
        mfc_get(&pd, program_data_ea, sizeof(pd), tag_id, 0, 0);
        //Wait for completion
        mfc_write_tag_mask(1<<tag_id);
        mfc_read_tag_status_any();

        //PROCESS DATA
        pd.distance = pd.speed * pd.num_hours;

        //WRITE RESULTS OUT
        //Initiate copy
        mfc_put(&pd, program_data_ea, sizeof(program_data), tag_id, 0, 0);
        //Wait for completion
        mfc_write_tag_mask(1<<tag_id);
        mfc_read_tag_status_any();
        return 0;
}
Finally, the PPU caller is defined like this:

Code: Select all

#include <stdio.h>
#include <libspe.h>

//This global is for the SPE program code itself.  It will be created by
//the embedspu program.
extern spe_program_handle_t calculate_distance_handle;

//This struct is used for input/output with the SPE task
typedef struct {
        float speed;     //input parameter
        float num_hours; //input parameter
        float distance;  //output parameter
        float padding;   //pad the struct a multiple of 16 bytes
} program_data;

int main() {
        program_data pd __attribute__((aligned(16)));  //aligned for transfer

        //GATHER DATA TO SEND TO SPE
        printf("Enter the speed at which your car is travelling in miles/hr: ");
        scanf("%f", &pd.speed);
        printf("Enter the number of hours you have been driving at that speed: ");
        scanf("%f", &pd.num_hours);

        //USE THE SPE TO PROCESS THE DATA
        //Create SPE Task
        speid_t spe_id = spe_create_thread(0, &calculate_distance_handle, &pd, NULL, -1, 0);
        //Check For Errors
        if(spe_id == 0) {
                fprintf(stderr, "Error creating SPE thread!\n");
                return 1;
        }
        //Wait For Completion
        spe_wait(spe_id, NULL, 0);

        //FORMAT THE RESULTS FOR DISPLAY
        printf("The distance travelled is %f miles.\n", pd.distance);
        return 0;
}
So far, nothing so interesting as it's exactly the same as the IBM example (except for adding a makefile).

Now, if I modify the PPU code to use libspe2.h by changing the header file line and the block between "USE THE SPE TO PROCESS THE DATA" and "FORMAT THE RESULTS FOR DISPLAY" to:

Code: Select all

        unsigned int          createflags = 0;
        unsigned int          runflags    = 0;
        unsigned int          entry       = SPE_DEFAULT_ENTRY;

        spe_context_ptr_t spe = spe_context_create(createflags, NULL);
        spe_stop_info_t stop_info;

        spe_program_load(spe, &calculate_distance_handle);
        spe_context_run(spe, &entry, runflags, &pd, NULL, NULL); //&stop_info);
        spe_context_destroy(spe);
The SPU program is definitely started, but it just hangs whenever DMA is attempted. This has happened with all the IBM examples I've tried that use DMA. Someone on an IBM forum is also asking how to get DMA to work with libspe2 (with no answer), so clearly it's not just me having the problem.

The problem doesn't seem so much to be the initiation of the DMA (although it's hard for me to check that), more that the call to mfc_read_tag_status_any() never returns.

I notice that in tweakoz's spurast, he's forced compilation in 32-bit mode and written an mfc_put32 method. Somehow this feels like the wrong solution, though, and when I tried to use -m32 and write an equivalent mfc_get32 function, that too exhibited exactly the same behaviour. It's also interesting to note that tweakoz never calls mfc_read_tag_status_any() (which is the method that hangs) but instead seems to do synchronisation solely with the PPU, using spu_read_signal1(). This too feels wrong, and I'm not sure what would happen when the SPU initiates too many DMA requests for the buffer. My guess is that some of the requests would just be silently lost.

So, basically, does anyone have any ideas? I'm starting to wonder if I should just stick to using libspe, even though I know it's on its way out...

(and apologies for the long first post!)
carlosn
Posts: 38
Joined: Thu Mar 10, 2005 2:14 am
Location: Orlando, Florida, US

Post by carlosn »

look at this site
http://www.stolk.org/ps3/

The sample there is actually a port of one of the samples from the CELLSDK. He ported it to use libspe2 and he's doing DMA transfers. I tested the sample and it compiles, and that's what it claims to do.
ralferoo
Posts: 122
Joined: Sat Mar 03, 2007 9:14 am

Post by ralferoo »

carlosn wrote:The sample there is actually a port of one of the samples from the CELLSDK. He ported it to use libspe2 and he's doing DMA transfers. I tested the sample and it compiles, and that's what it claims to do.
I found that page before posting, but it's not a lot of use as that DMA magic is hidden in macros (e.g. UPDATE_LOCAL) which are not given on the page he describes. He doesn't provide a link to the source he bases his example on, so I did a google code search for them but came up with nothing.

Ultimately, I gave up on trying with libspe2, plumped for libspe and moved on to the SPE programming, which is what I was actually interested in... The results are at http://www.ranulf.net/ps3/julia.tgz :)
carlosn
Posts: 38
Joined: Thu Mar 10, 2005 2:14 am
Location: Orlando, Florida, US

Post by carlosn »

ralferoo wrote:I found that page before posting, but it's not a lot of use as that DMA magic is hidden in macros (e.g. UPDATE_LOCAL) which are not given on the page he describes. He doesn't provide a link to the source he bases his example on, so I did a google code search for them but came up with nothing.
The DMA magic you're referring to is under decl.h

#define CESOF_TAG 27
#define UPDATE_LOCAL(s) {\
mfc_get(_LOCAL_##s, _EAR_##s, sizeof(_LOCAL_##s), CESOF_TAG, 0, 0); \
mfc_write_tag_mask(1<<CESOF_TAG);\
mfc_read_tag_status_all();\
}

#define UPDATE_REMOTE(s) {\
mfc_put(_LOCAL_##s, _EAR_##s, sizeof(_LOCAL_##s), CESOF_TAG, 0, 0); \
mfc_write_tag_mask(1<<CESOF_TAG);\
mfc_read_tag_status_all();\
}

The original source code is under
/opt/ibm/cell-sdk/prototype/src/samples/cesof

I am interested in doing SPE coding as well and I want to start using libspe2.
ralferoo
Posts: 122
Joined: Sat Mar 03, 2007 9:14 am

Post by ralferoo »

carlosn wrote:The DMA magic you're referring to is under decl.h ...snip...
That's basically the same as I was trying. I'll have another go at getting it to work at the weekend...
tweakoz
Posts: 21
Joined: Tue Feb 17, 2004 10:51 am
Location: Santa Cruz, CA

see my demo for spu initiated dma (using libspe2)

Post by tweakoz »

okonomiyonda
Posts: 2
Joined: Sun Apr 15, 2007 12:41 am
Location: orlando

oops

Post by okonomiyonda »

oops, replied to the wrong post. please delete this
ralferoo
Posts: 122
Joined: Sat Mar 03, 2007 9:14 am

Post by ralferoo »

This non-post reminded me... I read somewhere on some IBM tech forum that the problem is that the high and low 32 bits of argp and envp are swapped round in the early version of libspe2, and that this is fixed in later versions. I've not had time yet to test whether this was indeed the cause of my problems, but it might be interesting for others.

One solution would be to pass the DMA address in via a channel; another would be to refer explicitly to an EA and have the linker fix up the addresses automatically.
okonomiyonda
Posts: 2
Joined: Sun Apr 15, 2007 12:41 am
Location: orlando

Post by okonomiyonda »

I am running version 2.0 (since I have yet to get yellowdog's glib updated and working with 2.1) and I have no problem with swapped words in argp.

I did have the exact same problem that the original poster was talking about. Unfortunately I fixed the problem before I really understood what I did wrong. If you really need to get up and running fast, you can do this. Modify the cesof example to use lib version 2.0 by editing the makefile to link with -lspe2 instead of -lspe. Then remove the #include "libspe.h" in the PPU's .c file and replace it with #include "libspe2.h". From there, just replace the old calls with the new versions. If I remember properly, all I had to do was change the way you kick off the SPU program.

If that works for you, you can copy the cesof directory and use it as a base for another project. As for why you're getting the problem in the first place, it could be a number of things. Make sure the size of your DMA transfer is right, although if that were the problem, you'd probably see a bus error. Most importantly, verify that the effective address you're DMA-ing from is valid and matches what you see in your PPU program. Also, try making the data you want to DMA global and aligned on a 128-byte boundary.

Oh and one last thing... in your SPU program your arguments to main look a little suspect. I don't have any sample code in front of me but off the top of my head, I thought it was supposed to be

int main (long long spuid , char** argp, char** envp );

I'll try again tonight to break my old code and find some potential stalls in the example you posted above
ralferoo
Posts: 122
Joined: Sat Mar 03, 2007 9:14 am

Post by ralferoo »

okonomiyonda wrote:I did have the exact same problem that the original poster was talking about.
I am the original poster... :o
okonomiyonda wrote:Oh and one last thing... in your SPU program your arguments to main look a little suspect. I don't have any sample code in front of me but off the top of my head, I thought it was supposed to be

int main (long long spuid , char** argp, char** envp );
Well, possibly that could be the problem; certainly in libspe, the arguments were defined as long long (64-bit) whereas char** is 32-bit on the SPE. As I'm using argp as the base transfer address and NULL for envp, this would indeed result in the words appearing to be swapped.

That said, I was also using quite an early version of libspe2, built from source, so it's quite possible it doesn't match the version you have.
okonomiyonda wrote:I'll try again tonight to break my old code and find some potential stalls in the example you posted above
Probably not that worthwhile, now that I know it should be fixed in later releases. I ended up putting all my libspe stuff into a tiny compatibility library (literally just hiding the difference between using pthreads and polling on the different channels within one thread), so I'm not really fussed what library I use anymore as long as it works!

[edit] Just happened to come across the IBM bug report and solution, so thought I'd add that information: http://www-128.ibm.com/developerworks/f ... 9#13931389
Last edited by ralferoo on Fri May 04, 2007 2:49 am, edited 1 time in total.
unsolo
Posts: 155
Joined: Mon Apr 16, 2007 2:39 am
Location: OSLO Norway

Post by unsolo »

Make sure all DMA address pointers are of type unsigned long long, whether you are working on a 32-bit or 64-bit userland, to ensure compatibility.
Don't do it alone.
ralferoo
Posts: 122
Joined: Sat Mar 03, 2007 9:14 am

Post by ralferoo »

unsolo wrote:Make sure all DMA address pointers are of type unsigned long long, whether you are working on a 32-bit or 64-bit userland, to ensure compatibility.
Actually, it doesn't matter too much.

All SPE addresses are 32-bit (actually 18 bits), and there are optimisations when setting up DMA if you know that the DMA is in the lower 32-bit range of the PPE memory space (userland address, not after address translation). Plus, DMA lists only allow you to change the lower 32 bits of addresses.

It's probably easiest for most applications to build with -m32, or to specify MAP_32BIT when mmapping memory, so that you know all memory regions will be 32-bit safe. Given the tiny memory on the PS3, if people are writing programs that need more than 2 GB of userland data, they're probably doing something wrong... ;)
unsolo
Posts: 155
Joined: Mon Apr 16, 2007 2:39 am
Location: OSLO Norway

Post by unsolo »

Keep in mind that you can remap memory and devices more or less anywhere using the hypervisor.

All addresses issued by the SPU are 64-bit addresses.

Write code so that it works with both -m32 and -m64; this only comes down to how you handle addresses.
Don't do it alone.