My first distributed program (My first convolution in Cell)
Last week I developed my first successful distributed program on Cell. It applies a median filter to raw image data.
I developed two programs. The first one runs only on the PPU; the second one runs on both the PPU and the SPUs. The PPU gives the SPUs the information they need to perform their operations.
Once the PPM image is loaded, the stored information is in raw format: an array of bytes containing the RGB components of each pixel. The algorithm is quite simple, but applying SIMD instructions can be difficult because the raw data has to be prepared for them first.
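To make the raw layout concrete, here is a minimal sketch in C. It assumes the bytes are interleaved as R, G, B per pixel in row-major order, and it shows one possible way of "preparing" the data for SIMD: splitting it into three separate planes so that neighbouring bytes belong to the same channel. The function names `rgb_offset` and `deinterleave_rgb` are hypothetical, and this is not necessarily the preparation step used in the program described here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offset of channel c (0 = R, 1 = G, 2 = B) of pixel (x, y)
   in an interleaved, row-major RGB buffer. */
static size_t rgb_offset(size_t x, size_t y, size_t width, size_t c)
{
    assert(x < width && c < 3);
    return (y * width + x) * 3 + c;
}

/* Split interleaved RGBRGB... data into three planes of n_pixels bytes each.
   The caller provides the r, g and b buffers. */
static void deinterleave_rgb(const uint8_t *rgb, size_t n_pixels,
                             uint8_t *r, uint8_t *g, uint8_t *b)
{
    assert(rgb && r && g && b);
    for (size_t i = 0; i < n_pixels; ++i) {
        r[i] = rgb[3 * i + 0];
        g[i] = rgb[3 * i + 1];
        b[i] = rgb[3 * i + 2];
    }
}
```

With planar data, a vector load picks up 16 consecutive values of one channel, at the cost of an extra pass over the image.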
In the distributed program the PPU sends the following information to the SPUs:
- Pointer to the original image data.
- Width of the original image.
- Height of the original image.
- Pointer to the destination image data.
- Initial row to compute.
- Final row to compute.
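The list above maps naturally onto a small parameter block per SPU. The following C sketch is only an illustration: the names `filter_params_t` and `fill_params` are hypothetical, the final row is treated as exclusive (an assumption), and the block is kept at a multiple of 16 bytes with 16-byte alignment because that is what Cell DMA transfers generally expect.

```c
#include <assert.h>
#include <stdint.h>

/* One parameter block per SPU (hypothetical layout, 32 bytes). */
typedef struct {
    uint64_t src_ea;     /* address of the original image data    */
    uint64_t dst_ea;     /* address of the destination image data */
    uint32_t width;      /* image width in pixels                 */
    uint32_t height;     /* image height in pixels                */
    uint32_t row_begin;  /* first row this SPU computes           */
    uint32_t row_end;    /* one past the last row (assumption)    */
} __attribute__((aligned(16))) filter_params_t;

/* Split the image rows as evenly as possible among num_spus blocks.
   The first (height % num_spus) blocks receive one extra row. */
static void fill_params(filter_params_t *p, unsigned num_spus,
                        const uint8_t *src, uint8_t *dst,
                        uint32_t width, uint32_t height)
{
    assert(p && num_spus > 0);
    uint32_t base = height / num_spus;
    uint32_t extra = height % num_spus;
    uint32_t row = 0;
    for (unsigned i = 0; i < num_spus; ++i) {
        uint32_t rows = base + (i < extra ? 1u : 0u);
        p[i].src_ea = (uint64_t)(uintptr_t)src;
        p[i].dst_ea = (uint64_t)(uintptr_t)dst;
        p[i].width = width;
        p[i].height = height;
        p[i].row_begin = row;
        p[i].row_end = row + rows;
        row += rows;
    }
}
```

For example, with 6 SPUs (an assumed count) and a 196-row image, the first four blocks would cover 33 rows each and the last two 32 rows each.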
The SPUs are responsible for getting the data, applying the filter, computing the destination address for the result, and finally writing the result.
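The four steps can be sketched in portable C as follows. This is not the actual SPU code: `fetch` and `store` are hypothetical stand-ins that use `memcpy` where a real SPU would issue DMA transfers to and from its local store, the window is 3×3 (the size mentioned in the comments), and border pixels are handled by clamping coordinates, which is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { CHANNELS = 3 };

/* Stand-ins for DMA between main memory and local store (assumption). */
static void fetch(void *local, const uint8_t *mem, size_t n) { memcpy(local, mem, n); }
static void store(uint8_t *mem, const void *local, size_t n) { memcpy(mem, local, n); }

/* Median of 9 bytes: insertion sort in place, then take the middle one. */
static uint8_t median9(uint8_t v[9])
{
    for (int i = 1; i < 9; ++i) {
        uint8_t x = v[i];
        int j = i;
        while (j > 0 && v[j - 1] > x) { v[j] = v[j - 1]; --j; }
        v[j] = x;
    }
    return v[4];
}

static int clampi(int v, int lo, int hi) { return v < lo ? lo : (v > hi ? hi : v); }

/* Filter rows [row_begin, row_end). Returns 0 on success, -1 on allocation failure. */
static int process_rows(const uint8_t *src, uint8_t *dst, int width, int height,
                        int row_begin, int row_end)
{
    assert(src && dst && width > 0 && height > 0);
    assert(0 <= row_begin && row_begin <= row_end && row_end <= height);
    size_t row_bytes = (size_t)width * CHANNELS;
    uint8_t *in = malloc(3 * row_bytes);   /* rows y-1, y, y+1 */
    uint8_t *out = malloc(row_bytes);      /* one filtered row */
    if (!in || !out) { free(in); free(out); return -1; }

    for (int y = row_begin; y < row_end; ++y) {
        /* 1. Get the data: three source rows, clamped at the border. */
        for (int k = 0; k < 3; ++k) {
            int yy = clampi(y + k - 1, 0, height - 1);
            fetch(in + (size_t)k * row_bytes, src + (size_t)yy * row_bytes, row_bytes);
        }
        /* 2. Apply the filter, one channel of one pixel at a time. */
        for (int x = 0; x < width; ++x)
            for (int c = 0; c < CHANNELS; ++c) {
                uint8_t win[9];
                int n = 0;
                for (int k = 0; k < 3; ++k)
                    for (int dx = -1; dx <= 1; ++dx) {
                        int xx = clampi(x + dx, 0, width - 1);
                        win[n++] = in[(size_t)k * row_bytes + (size_t)xx * CHANNELS + c];
                    }
                out[(size_t)x * CHANNELS + c] = median9(win);
            }
        /* 3. Compute the destination address, 4. write the result. */
        store(dst + (size_t)y * row_bytes, out, row_bytes);
    }
    free(in);
    free(out);
    return 0;
}
```

Fetching three rows for every output row keeps the sketch short; a real implementation would more likely reuse the two rows it already holds and overlap transfers with computation.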
The PPU loads the original image, reserves the required memory for the destination image, waits for the SPUs to complete, and finally writes the destination image to verify that the algorithm works correctly.
Loading and storing the images is not included in the timings, only the computing time.
Results for 256×196 image size
- ONLY ON PPU: Time spent [16.79] milliseconds
- ON PPU AND SPUs: Time spent [3.38] milliseconds (4.96x faster)
Results for 512×512 image size
- ONLY ON PPU: Time spent [88.59] milliseconds
- ON PPU AND SPUs: Time spent [11.02] milliseconds (8.03x faster)
6 Comments »
Hi,
If it’s not too much to ask, could you email me your code? (lvoicu@aol.com). I’ve been looking for Cell BE sample code to learn from for a while and could not find anything other than an FFT which does not build on my machine.
Regards,
Liv
Hello Liv, I am sorry but I can’t give you the code at this time. I am currently working on my dissertation and that code is going to be included as an example. I hope to finish my work in a few months.
In the meantime, if I find enough time, I will post new experimental results and tutorials. In the last few months I couldn’t work as much as I expected.
That’s fine. Do you have any knowledge of other Cell sample code out there? Good luck with your dissertation.
Liv
Hello Liv, I mostly find information on the IBM website, by reading the Cell docs and posting on the forum.
I had DMA errors in processor transfers; the document linked below shed some light on them:
http://heim.ifi.uio.no/~knutm/geilo2008/
There is no complete code, but it is not useless.
Regards,
Javier
Hi Javier,
What’s the size of your median? It seems that your latencies are a bit high.
Regards,
Liv
Hi Liv, the size of the median was 3×3, but with a non-optimal alignment. For each frame (loaded image) I created (and destroyed) a thread per SPE. The SPE code was not optimized.
When I tested the generated executable on the real system (a PS3), the time results were up to 8x, so my latencies are indeed quite high.
I have to learn how to synchronize the PPE and the SPEs using messages (I think I could use mailboxes, but I have to look into it).
My code is useless, so if you still want my code, tell me and I will send it to you by email.
Regards,
Javi.