cell programming


My first distributed program (My first convolution in Cell)

Last week I developed my first successful distributed program on Cell. It applies a median filter to raw image data.

I developed two programs. The first one runs only on the PPU; the second one runs on both the PPU and the SPUs. The PPU gives the SPUs the information they need to perform their operations.

Once the PPM image is loaded, its data is stored in a raw format: an array of bytes containing the RGB components of each pixel. The algorithm is quite simple, but applying SIMD instructions can be difficult because the raw data has to be prepared for them first.
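One common way to prepare interleaved RGB bytes for SIMD work is to split them into one plane per channel, so that each vector register holds values of a single component. The sketch below is only an illustration of that idea in portable scalar C; the function name `split_rgb_planes` is hypothetical, and nothing here says this is how the program in this post prepares its data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split interleaved RGB bytes (R0 G0 B0 R1 G1 B1 ...) into three
 * separate planes. Each output buffer must hold pixel_count bytes.
 * Hypothetical helper, shown only to illustrate the data layout. */
static void split_rgb_planes(const uint8_t *rgb, size_t pixel_count,
                             uint8_t *r, uint8_t *g, uint8_t *b)
{
    assert(rgb && r && g && b);
    for (size_t i = 0; i < pixel_count; ++i) {
        r[i] = rgb[3 * i + 0];  /* red component of pixel i   */
        g[i] = rgb[3 * i + 1];  /* green component of pixel i */
        b[i] = rgb[3 * i + 2];  /* blue component of pixel i  */
    }
}
```

With planar data, sixteen neighbouring pixels of one channel fit in a single 128-bit vector, which is usually easier to filter than the packed 3-byte layout.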

In the distributed program, the PPU sends the following information to the SPUs:

  1. Pointer to the original image data.
  2. Width of the original image.
  3. Height of the original image.
  4. Pointer to the destination image data.
  5. Initial row to compute.
  6. Final row to compute.
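The six items above could be packed into a single parameter block that each SPU fetches once. The following is a minimal sketch, assuming 64-bit effective addresses and a half-open row range (`row_end` is one past the last row); the type name `filter_params_t`, the field names, and the helper `make_params` are all hypothetical. The size is kept at a multiple of 16 bytes because Cell DMA transfers of 16 bytes or more must be multiples of 16 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical parameter block, one per SPU. Field order follows the
 * list in the post; the names are illustrative only. */
typedef struct {
    uint64_t src_ea;     /* 1. address of the original image data    */
    uint32_t width;      /* 2. width of the original image           */
    uint32_t height;     /* 3. height of the original image          */
    uint64_t dst_ea;     /* 4. address of the destination image data */
    uint32_t row_begin;  /* 5. first row to compute                  */
    uint32_t row_end;    /* 6. one past the last row (assumption)    */
} filter_params_t;

/* 8 + 4 + 4 + 8 + 4 + 4 = 32 bytes, so no padding is needed. */
_Static_assert(sizeof(filter_params_t) % 16 == 0,
               "parameter block size should be a multiple of 16 bytes");

/* Fill a parameter block for one worker. */
static filter_params_t make_params(const void *src, void *dst,
                                   uint32_t width, uint32_t height,
                                   uint32_t row_begin, uint32_t row_end)
{
    assert(row_begin <= row_end && row_end <= height);
    filter_params_t p;
    p.src_ea = (uint64_t)(uintptr_t)src;
    p.width = width;
    p.height = height;
    p.dst_ea = (uint64_t)(uintptr_t)dst;
    p.row_begin = row_begin;
    p.row_end = row_end;
    return p;
}
```

On real hardware the block itself would also need to be placed at a 16-byte-aligned address before an SPU can fetch it by DMA.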

The SPUs are responsible for fetching the data, applying the filter, computing the destination address for the result, and finally writing the result.
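As a rough illustration of the per-worker job, here is a scalar 3×3 median filter over a band of rows of an interleaved RGB image. It is a sketch under several assumptions: plain memory access stands in for the DMA transfers, border pixels are handled by clamping coordinates, the row range is half-open, and the names `median9`, `clamp_coord`, and `median3x3_rows` are hypothetical. It is not the SPU code from this post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Median of nine bytes: insertion sort, then take the middle element. */
static uint8_t median9(uint8_t v[9])
{
    for (int i = 1; i < 9; ++i) {
        uint8_t key = v[i];
        int j = i - 1;
        while (j >= 0 && v[j] > key) {
            v[j + 1] = v[j];
            --j;
        }
        v[j + 1] = key;
    }
    return v[4];
}

/* Clamp a coordinate into [0, n - 1] (border handling is an assumption). */
static int clamp_coord(int i, int n)
{
    return i < 0 ? 0 : (i >= n ? n - 1 : i);
}

/* Filter rows [row_begin, row_end) of src into the same rows of dst.
 * Both buffers hold width * height interleaved RGB pixels. */
static void median3x3_rows(const uint8_t *src, uint8_t *dst,
                           int width, int height,
                           int row_begin, int row_end)
{
    assert(src && dst && src != dst);
    assert(width > 0 && height > 0);
    assert(0 <= row_begin && row_begin <= row_end && row_end <= height);

    for (int y = row_begin; y < row_end; ++y) {
        for (int x = 0; x < width; ++x) {
            for (int c = 0; c < 3; ++c) {
                uint8_t win[9];
                int n = 0;
                for (int dy = -1; dy <= 1; ++dy) {
                    for (int dx = -1; dx <= 1; ++dx) {
                        int yy = clamp_coord(y + dy, height);
                        int xx = clamp_coord(x + dx, width);
                        win[n++] = src[((size_t)yy * width + xx) * 3 + c];
                    }
                }
                /* Destination offset mirrors the source pixel position. */
                dst[((size_t)y * width + x) * 3 + c] = median9(win);
            }
        }
    }
}
```

Because each worker writes only its own rows of `dst` and reads `src` without modifying it, bands can be processed independently.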

The PPU loads the original image, allocates the memory required for the destination image, waits for the SPUs to complete, and finally writes the destination image to verify that the algorithm works.
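Before starting the SPUs, the PPU has to decide which rows each one computes. A minimal sketch of an even split follows, assuming half-open row ranges and a hypothetical helper named `split_rows`; the post does not say how the rows were actually divided.

```c
#include <assert.h>

/* Give worker `index` (0-based) its share of `height` rows.
 * The first (height % workers) workers receive one extra row.
 * The result is the half-open range [*row_begin, *row_end). */
static void split_rows(int height, int workers, int index,
                       int *row_begin, int *row_end)
{
    assert(height >= 0 && workers > 0);
    assert(0 <= index && index < workers);
    assert(row_begin && row_end);

    int base = height / workers;   /* rows every worker gets   */
    int extra = height % workers;  /* leftover rows to spread  */

    *row_begin = index * base + (index < extra ? index : extra);
    *row_end = *row_begin + base + (index < extra ? 1 : 0);
}
```

For example, splitting 196 rows among six workers (an example value, not a figure from the post) gives four bands of 33 rows followed by two bands of 32 rows, with no gaps or overlaps.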

Image loading and storing are not included in the measurements, only the computing time.
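Measuring only the computing time means taking one timestamp after the image is loaded and another before it is written out. The sketch below shows one way to do that with the POSIX monotonic clock; the helper names `now_monotonic` and `elapsed_ms` are hypothetical, and the timer actually used for the figures in this post is not stated.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime in strict modes */
#endif
#include <assert.h>
#include <time.h>

/* Read the monotonic clock, which is not affected by wall-clock changes. */
static struct timespec now_monotonic(void)
{
    struct timespec t;
    int rc = clock_gettime(CLOCK_MONOTONIC, &t);
    assert(rc == 0);
    (void)rc;  /* keep builds with NDEBUG free of unused warnings */
    return t;
}

/* Elapsed time between two timestamps, in milliseconds. */
static double elapsed_ms(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec) * 1000.0
         + (double)(end.tv_nsec - start.tv_nsec) / 1.0e6;
}
```

A typical use is to call `now_monotonic()` right before the filter starts and right after the last worker finishes, then pass both values to `elapsed_ms`.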

Results for 256×196 image size

  • ONLY ON PPU:     Time spent [16.79] milliseconds
  • ON PPU AND SPUs: Time spent [3.38] milliseconds (speedup: 4.96x)

Results for 512×512 image size

  • ONLY ON PPU:     Time spent [88.59] milliseconds
  • ON PPU AND SPUs: Time spent [11.02] milliseconds (speedup: 8.03x)

March 16, 2008 - Posted by Javier Sánchez Egido | Application, Cell | 6 Comments

6 Comments »

  1. Hi,

    If it’s not too much to ask, could you email me your code? (lvoicu@aol.com). I’ve been looking for Cell BE sample code to learn from for a while and could not find anything other than an FFT, which does not build on my machine.

    Regards,
    Liv

    Comment by Liv | May 20, 2008 | Reply

  2. Hello Liv, I am sorry, but I can’t give you the code at this time. I am currently working on my dissertation, and that code is going to be included as an example. I hope to finish my work in a few months.
    In the meantime, if I find enough time, I will post new experimental results and tutorials. In recent months I couldn’t work as much as I expected.

    Comment by jsegido | May 21, 2008 | Reply

  3. That’s fine. Do you have any knowledge of other Cell sample code out there? Good luck with your dissertation.

    Liv

    Comment by Liv | May 21, 2008 | Reply

  4. Hello Liv, I am mostly finding information on the IBM website, reading the Cell docs, and posting to the forum.
    I had DMA errors in processor transfers; the document linked below shed some light on them:

    http://heim.ifi.uio.no/~knutm/geilo2008/

    There is no complete code, but it is not useless.

    Regards,
    Javier

    Comment by jsegido | May 24, 2008 | Reply

  5. Hi Javier,

    What’s the size of your median? It seems that your latencies are a bit high.

    Regards,
    Liv

    Comment by Liv | July 2, 2008 | Reply

  6. Hi Liv, the size of the median was 3×3, but with non-optimal alignment. For each frame (loaded image) I created (and destroyed) one thread per SPE. The SPE code was not optimized.

    When I tested the generated executable on the real system (a PS3), the timing results were up to 8x, so my latencies are indeed very high.

    I have to learn how to synchronize the PPE and SPEs using messages (I think I could use mailboxes, but I have to look into it).

    My code is useless :-( , so if you still want my code, tell me and I will send it to you by email.

    Regards,
    Javi.

    Comment by Javier Sánchez Egido | July 18, 2008 | Reply

