<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Visual Core &#187; Parallel</title>
	<atom:link href="http://visualcore.com/index.php/tag/parallel/feed/" rel="self" type="application/rss+xml" />
	<link>http://visualcore.com</link>
	<description>An amazing repository of useless junk</description>
	<lastBuildDate>Sat, 18 Jun 2011 05:12:17 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>.NET 4.0: Parallel Programming</title>
		<link>http://visualcore.com/index.php/2009/06/net-4-0-parallel-programming/</link>
		<comments>http://visualcore.com/index.php/2009/06/net-4-0-parallel-programming/#comments</comments>
		<pubDate>Sat, 06 Jun 2009 06:35:34 +0000</pubDate>
		<dc:creator>Jeremy Cowles</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[C#]]></category>
		<category><![CDATA[Parallel]]></category>
		<category><![CDATA[VB.NET]]></category>

		<guid isPermaLink="false">http://visualcore.com/wp/?p=32</guid>
		<description><![CDATA[.NET 4.0 Beta 1 contains some interesting parallel programming constructs. It sounds like a great idea, and I&#8217;m really interested to see where this is going.
]]></description>
			<content:encoded><![CDATA[<p>.NET 4.0 Beta 1 contains some interesting <a href="http://code.msdn.microsoft.com/ParExtSamples" target="_BLANK">parallel programming</a> constructs. It sounds like a great idea, and I&#8217;m really interested to see where this is going.</p>
]]></content:encoded>
			<wfw:commentRss>http://visualcore.com/index.php/2009/06/net-4-0-parallel-programming/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PyMW: Week one</title>
		<link>http://visualcore.com/index.php/2009/06/pymw-week-one/</link>
		<comments>http://visualcore.com/index.php/2009/06/pymw-week-one/#comments</comments>
		<pubDate>Fri, 05 Jun 2009 19:01:36 +0000</pubDate>
		<dc:creator>Jeremy Cowles</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[BOINC]]></category>
		<category><![CDATA[GSoC]]></category>
		<category><![CDATA[Parallel]]></category>
		<category><![CDATA[PyMW]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://visualcore.com/wp/?p=34</guid>
		<description><![CDATA[Today is the official end of my 7th day of working on the PyMW interface for BOINC for Google Summer of Code.
It took me three days to get my first PyMW app (monte_pi.py) to run; it ran on 4 virtual nodes (4 tasks). Running more than four tasks caused problems, which turned out to be [...]]]></description>
			<content:encoded><![CDATA[<p>Today is the official end of my 7th day of working on the PyMW interface for BOINC for Google Summer of Code.</p>
<p>It took me three days to get my first PyMW app (monte_pi.py) to run; it ran on 4 virtual nodes (4 tasks). Running more than four tasks caused problems, which turned out to be a bug in the PyMW BOINC interface. Now that that&#8217;s fixed, I ran with 200 nodes yesterday and 800 today.</p>
<p>This morning, I tried running with 2 physical nodes, my laptop and my Ubuntu VM, and the run failed at the end of computation. Somehow the canonical results are not being recognized, which causes the BOINC interface to get lost in limbo and hang forever.</p>
<p>This week, I created a pure-Python assimilator for PyMW, which works pretty well, but is perhaps causing the error above.</p>
<p>I also rewrote a big swath of the BOINC interface to stop it from using a new thread for each task during task reclamation (getting data back from BOINC). Since it was using one thread per task, it was reaching the maximum number of scheduler threads. This in turn caused the execution thread to hang until some of the tasks completed. Now it reclaims tasks in a single thread and is able to queue all tasks in a single shot, greatly improving the throughput from PyMW -&gt; BOINC.</p>
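<p>The single-thread reclamation idea can be sketched in a few lines of Python. This is only an illustration, not the PyMW code: <code>reclaim_all</code> and <code>fetch_result</code> are hypothetical names, and the lambda stands in for whatever actually pulls a result back from BOINC.</p>
<pre>
```python
import queue
import threading


def reclaim_all(task_ids, fetch_result):
    """Collect results for every task using one worker thread."""
    pending = queue.Queue()
    for task_id in task_ids:
        pending.put(task_id)  # queue every task in a single shot
    results = {}

    def reclaimer():
        # One thread drains the whole queue instead of one thread per task.
        while True:
            try:
                task_id = pending.get_nowait()
            except queue.Empty:
                return
            results[task_id] = fetch_result(task_id)

    worker = threading.Thread(target=reclaimer)
    worker.start()
    worker.join()
    return results


print(reclaim_all(range(5), lambda task_id: task_id * task_id))
# {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
```
</pre>
<p>Because only one extra thread ever exists, the number of tasks no longer has any bearing on the thread limit.</p>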
<p>Overall, it&#8217;s been really fun so far. The first few days were trying, since there was little documentation of how to get PyMW to play nice with BOINC. But seeing the first application run was great <img src='http://visualcore.com/wp/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://visualcore.com/index.php/2009/06/pymw-week-one/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Google Summer of Code!</title>
		<link>http://visualcore.com/index.php/2009/04/google-summer-of-code/</link>
		<comments>http://visualcore.com/index.php/2009/04/google-summer-of-code/#comments</comments>
		<pubDate>Thu, 23 Apr 2009 07:33:04 +0000</pubDate>
		<dc:creator>Jeremy Cowles</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[BOINC]]></category>
		<category><![CDATA[GSoC]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Parallel]]></category>
		<category><![CDATA[PyMW]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://visualcore.com/wp/?p=38</guid>
		<description><![CDATA[
My proposal has been accepted by the Python Software Foundation for Google Summer of Code 2009!
I will be working on the Python Master-Worker computing project (PyMW), a Python API that provides access to various distributed and parallel computing frameworks. In particular, I will be working on the Berkeley Open Infrastructure for Network Computing (BOINC) integration. [...]]]></description>
			<content:encoded><![CDATA[<p><img style="border: none" src="/images/articles/gsoc2009-pymw.png" alt="PyMW: Python + BOINC" /></p>
<p>My proposal has been accepted by the Python Software Foundation for <a href="http://socghop.appspot.com/org/home/google/gsoc2009/python" target="_BLANK">Google Summer of Code 2009</a>!</p>
<p>I will be working on the <a href="http://pymw.sourceforge.net/" target="_BLANK">Python Master-Worker</a> computing project (PyMW), a Python API that provides access to various distributed and parallel computing frameworks. In particular, I will be working on the Berkeley Open Infrastructure for Network Computing (<a href="http://boinc.berkeley.edu/" target="_BLANK">BOINC</a>) integration. BOINC is best known as the underlying system that enabled SETI@Home and I&#8217;m really excited to peek under the hood!</p>
<p>My goal is to make it easier for PyMW users to create BOINC applications by simplifying the setup process as well as adding new support for some important BOINC features. This will require some changes to PyMW as well as some changes to the BOINC server. If you are interested in seeing all the details, a public copy of <a title="GSoC Proposal" href="http://socghop.appspot.com/document/show/user/jwc/pymwprop" target="_BLANK">my proposal</a> is posted at the GSoC website.</p>
]]></content:encoded>
			<wfw:commentRss>http://visualcore.com/index.php/2009/04/google-summer-of-code/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Sobel Edge Detector in VB.NET</title>
		<link>http://visualcore.com/index.php/2008/03/sobel-edge-detector-in-vb-net/</link>
		<comments>http://visualcore.com/index.php/2008/03/sobel-edge-detector-in-vb-net/#comments</comments>
		<pubDate>Sun, 23 Mar 2008 01:52:51 +0000</pubDate>
		<dc:creator>Jeremy Cowles</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Math]]></category>
		<category><![CDATA[Parallel]]></category>
		<category><![CDATA[VB.NET]]></category>
		<category><![CDATA[Vision]]></category>

		<guid isPermaLink="false">http://visualcore.com/wp/?p=96</guid>
		<description><![CDATA[
I recently stumbled onto a C tutorial on edge detection and decided to implement the algorithm in VB.NET. Edge detection is a machine vision technique that attempts to identify interesting parts of an image, such as where one object ends and another begins. One way of finding these areas is to search for sharp changes [...]]]></description>
			<content:encoded><![CDATA[<p><a href="/images/articles/sobel/ss-1-big.jpg" target="_blank"><img src="/images/articles/sobel/ss-1.jpg" alt="Sample Output" style="border: none" /></a><br />
I recently stumbled onto a C tutorial on edge detection and decided to implement the algorithm in VB.NET. Edge detection is a machine vision technique that attempts to identify interesting parts of an image, such as where one object ends and another begins. One way of finding these areas is to search for sharp changes in intensity between each pixel and its neighbors. A source image is processed and an output image is created that highlights the edges found in the source. The output image is a visualization of what the algorithm detected and ultimately these results can be applied to solve a specific problem.</p>
<p>Although the background and mathematical basis for the algorithm are interesting, I’m only going to discuss them briefly and focus on the implementation and performance issues in .NET. If you are curious about the details behind the algorithm, you should check out the original <a href="http://del.icio.us/moldymagnet/edge" target="_blank">articles</a> I found.</p>
<h4>The Algorithm</h4>
<p>To find relative changes in intensity level, the algorithm processes the image one pixel at a time. It measures the change in intensity to the left and right of the current pixel and stores it, then does the same for the pixels above and below. Each pixel is therefore processed one dimension at a time (horizontal, then vertical), and the two results are combined into the two-dimensional pixel data of the output image. </p>
<p>As each pixel is encountered, the intensity levels of its neighbors on one side are subtracted from those of the neighbors on the opposite side. The resulting weighted sum is the relative change in intensity at that location. If opposite neighbors have the same intensity, they cancel and the result is zero (black). If they have radically different intensity levels, the magnitude of the result is greater than zero. The mechanism that weights the pixels is a pair of matrices, one for horizontal changes (the xMask) and one for vertical changes (the yMask):<br />
<code><br />
Dim xMask(,) As Single _<br />
    = New Single(,) {{-1, 0, 1}, _<br />
                     {-2, 0, 2}, _<br />
                     {-1, 0, 1}}</p>
<p>Dim yMask(,) As Single _<br />
    = New Single(,) {{1, 2, 1}, _<br />
                     {0, 0, 0}, _<br />
                     {-1, -2, -1}}<br />
</code><br />
Each element in the matrices represents a bordering pixel, with the center element being the current pixel. The numbers in each matrix are the weights given to the pixels at those locations, so pixels directly above, below or beside the current pixel are weighted more heavily (denoted above with a 2 instead of a 1) than diagonal pixels. Notice that in both matrices, the current pixel is ignored (set to zero). </p>
<p>For the vertical yMask, the strictly horizontal elements (the middle row) are zero, and the opposite is true for the xMask. The elements of each matrix are multiplied by the border pixels’ intensity levels, and then the results are summed. Since opposite sides have opposite signs, the sum is the change in intensity we were looking for. The final step is to add the absolute values of the horizontal and vertical differences in intensity. This process is actually a rough approximation of the magnitude of the mathematical gradient of the image.</p>
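<p>To make the mask arithmetic concrete, here is a minimal Python sketch (not the VB.NET from the download) that applies both masks at a single pixel of a grayscale grid. The function name <code>sobel_at</code> and the list-of-rows image layout are assumptions made for the example.</p>
<pre>
```python
# Sobel masks with the same values as the VB.NET arrays above.
X_MASK = [[-1, 0, 1],
          [-2, 0, 2],
          [-1, 0, 1]]
Y_MASK = [[1, 2, 1],
          [0, 0, 0],
          [-1, -2, -1]]


def sobel_at(gray, x, y):
    """Approximate gradient magnitude at (x, y) of a grayscale grid.

    gray is a list of rows of intensities; (x, y) must not lie on the border.
    """
    sum_x = 0
    sum_y = 0
    for j in (-1, 0, 1):          # row offset
        for i in (-1, 0, 1):      # column offset
            value = gray[y + j][x + i]
            sum_x += value * X_MASK[j + 1][i + 1]
            sum_y += value * Y_MASK[j + 1][i + 1]
    # Add the absolute values, then clamp to the displayable range.
    return min(255, abs(sum_x) + abs(sum_y))


flat = [[50, 50, 50], [50, 50, 50], [50, 50, 50]]
edge = [[0, 0, 10], [0, 0, 10], [0, 0, 10]]
print(sobel_at(flat, 1, 1))  # 0
print(sobel_at(edge, 1, 1))  # 40
```
</pre>
<p>The uniform patch cancels to zero, while the patch with a brighter right-hand column produces a horizontal difference of 10 + 20 + 10 = 40 and no vertical difference.</p>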
<h4>First Implementation: GDI+</h4>
<p>To get things started, I wanted to keep the details of working with the image data to a minimum, so I wrote the algorithm against the painfully slow GDI+ Bitmap object using the GetPixel and SetPixel methods. The implementation is straightforward:<br />
<code><br />
 1. Create X and Y loops to scan across<br />
     each pixel in the source image<br />
 2. Create I and J loops to process the<br />
     eight border pixels for the current pixel (X,Y)<br />
 3. Get the current border pixel<br />
     Intensity(X + I, Y + J) = 1/3 * (R + G + B)<br />
     using Bitmap.GetPixel on the source image<br />
 4. Multiply the intensity by the appropriate mask<br />
     elements and accumulate the X and Y sums<br />
 5. Add the absolute values of the two sums and<br />
     clamp the output value to [0, 255]<br />
 6. Write the output pixel value to the output<br />
     image (Bitmap.SetPixel)<br />
</code><br />
Below is sample output of the initial implementation:<br />
<a href="/images/articles/sobel/ss-1-big.jpg" target="_blank"><img src="/images/articles/sobel/ss-1.jpg" alt="Sample Output" style="border: none" /></a><br />
This process typically runs at about 3K pixels/second, which sounds fast… but it actually takes about 160 seconds to process an 800 x 600 pixel image (480K pixels). So for real-time processing, this method is absolutely out. Although it’s very slow, this method works as expected and was useful to me as a reference renderer as I tested new approaches.</p>
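<p>The timing figure follows directly from the pixel count and the throughput. A tiny Python helper (the name <code>seconds_to_process</code> is made up for this example) shows the arithmetic:</p>
<pre>
```python
def seconds_to_process(width, height, pixels_per_second):
    """Time needed to process a width x height image at a given rate."""
    return (width * height) / pixels_per_second


print(seconds_to_process(800, 600, 3_000))  # 160.0
```
</pre>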
<h4>Take Two: Direct Pixel Access</h4>
<p>The GDI+ Bitmap object offers a handy pair of methods, LockBits() and UnlockBits(), that expose the raw bytes of memory composing the Bitmap object. With them, it is possible to read all pixels in one call, process them, and then write them back in a single call. This is much, much faster than using the Get/SetPixel() methods, which operate on a single pixel for each read and write. The big downside of LockBits is that you no longer have friendly access to the pixel by X and Y coordinates, and the documentation is pretty bad.<br />
When calling LockBits(), you specify what format you want the data to be returned in. The pixels get returned as a contiguous block of byte data, and you are charged with picking it apart.  For my purposes, I’ve forced the format to always be 24-bit RGB, and I treat the data as a one-dimensional array of bytes. Each pixel is encoded according to the format specified, so in my case, there are 3 bytes for each pixel (R, G and B). To emulate two dimensions, a “stride” value is given, which tells you how many bytes there are per scan line (including any padding), along with a “height”, which is the total number of scan lines. So each pixel can still be accessed with X and Y coordinates by using the following formula:<br />
<code><br />
Pixel.Blue = Array[stride * Y + X * 3]<br />
Pixel.Green = Array[stride * Y + X * 3 + 1]<br />
Pixel.Red = Array[stride * Y + X * 3 + 2]<br />
</code><br />
Notice that the Y value is multiplied by the stride (the width of the scan line in bytes) and X is multiplied by the number of bytes per pixel; the 3 here is for RGB. GDI+ stores 24-bit RGB pixels in blue, green, red order, so the first byte at this location is blue, the next byte is green and the last byte is red, which is why 0, 1 and 2 are added.<br />
To make this logic less painful, I created a wrapper class for the bitmap object which has its own GetPixel and SetPixel methods. This class locks the bits on the image when it loads and then operates on the array. It implements IDisposable, and when Dispose is called, it calls UnlockBits on the original bitmap image, committing all changes to the image at once.<br />
This implementation runs at around 50Kp/s, a huge improvement over the original. Processing the pixels in blocks greatly improved performance, but this is still a little too slow for any real-time application. The 800&#215;600 image still takes about 10 seconds to process with this new method.</p>
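<p>The stride arithmetic is easier to see in a small, self-contained example. The Python sketch below is not the wrapper class from the download; <code>RawBitmap</code> is a hypothetical stand-in that keeps its own byte buffer, assumes scan lines padded to a multiple of 4 bytes, and assumes the blue, green, red byte order that GDI+ uses for 24-bit RGB data.</p>
<pre>
```python
class RawBitmap:
    """Hypothetical wrapper over a locked 24-bit pixel buffer."""

    BYTES_PER_PIXEL = 3

    def __init__(self, width, height):
        self.width = width
        self.height = height
        # Assumption: each scan line is padded to a multiple of 4 bytes.
        self.stride = (width * self.BYTES_PER_PIXEL + 3) // 4 * 4
        self.data = bytearray(self.stride * height)

    def _offset(self, x, y):
        # Rows are `stride` bytes apart; pixels within a row are 3 bytes apart.
        return self.stride * y + x * self.BYTES_PER_PIXEL

    def get_pixel(self, x, y):
        """Return (red, green, blue); the buffer stores blue first."""
        o = self._offset(x, y)
        blue, green, red = self.data[o:o + 3]
        return red, green, blue

    def set_pixel(self, x, y, rgb):
        red, green, blue = rgb
        o = self._offset(x, y)
        self.data[o:o + 3] = bytes((blue, green, red))


bmp = RawBitmap(width=5, height=2)
bmp.set_pixel(4, 1, (255, 128, 0))
print(bmp.stride)           # 16
print(bmp.get_pixel(4, 1))  # (255, 128, 0)
```
</pre>
<p>A 5-pixel row needs 15 bytes but occupies 16, which is why the offset must use the stride rather than width * 3.</p>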
<h4>Take Three: Divide and Conquer</h4>
<p>The next approach I took was to reduce the input data by splitting the image in two. The actual split is done by creating Rectangle objects and then passing these rectangles to LockBits when retrieving the image data.<br />
Since I wanted to test different numbers of splits, it became very important to *neatly* keep track of work units, that is, what part of the image was actually being processed. Also, I had another idea in mind for the next implementation, so I wanted this idea of work units to be reusable. To facilitate this, I created a class called ImageWorkUnit with the following properties:<br />
<code><br />
Image:    The input Bitmap which is being processed<br />
WorkArea: The Rectangle of the area to process<br />
Result:   The output Bitmap shared by all workers<br />
</code><br />
Using this new work unit class, I then created a method to split the image into multiple work units (the number of units is variable) and a loop to process each unit sequentially. Right now, stop and make a quick estimate of how fast this new method will run using two units (splitting the image in half). Before I ran this new implementation, I estimated a speed increase of no more than a few percent. I was wrong. This method runs at a blazing 450Kp/s, a vast improvement over the direct pixel access method.<br />
As happy as I was with the result, it was a little disturbing, so I decided to run some tests. I varied the number of divisions and, much to my surprise, with only one work unit it ran at the same speed, 450Kp/s. This implied that the speed increase was not caused by dividing the image, but by something else. Here is the original processing loop:<br />
<code><br />
For y As Integer = 0 To inImg.Height - 1<br />
    For x As Integer = 0 To inImg.Width - 1<br />
    ...<br />
    Next<br />
Next<br />
</code><br />
Now here is the new processing loop for the work unit based implementation:<br />
<code><br />
For y As Integer = 0 To area.Height - 1<br />
    For x As Integer = 0 To area.Width - 1<br />
    ...<br />
    Next<br />
Next<br />
</code><br />
The speed increase was due to the fact that I was no longer accessing the Height and Width properties of the image (they were also accessed in the body of the loop). It turns out that the Height and Width properties of the Bitmap object are not exactly optimized. By simply not accessing these properties inside the loop, I got a roughly ninefold speed-up (from 50Kp/s to 450Kp/s).</p>
<h4>Take Four: Use the Cores, Luke</h4>
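<p>The effect is easy to reproduce in miniature. In the Python sketch below, <code>SlowImage</code> is a made-up class that simply counts how often its size properties are read; it says nothing about how the real Bitmog properties are implemented, only how many reads a hot loop can generate.</p>
<pre>
```python
class SlowImage:
    """Stand-in for a bitmap whose size properties are costly to read."""

    def __init__(self, width, height):
        self._width = width
        self._height = height
        self.property_reads = 0

    @property
    def width(self):
        self.property_reads += 1
        return self._width

    @property
    def height(self):
        self.property_reads += 1
        return self._height


def sum_indices_naive(img):
    """Reads img.width in the loop body for every pixel."""
    total = 0
    for y in range(img.height):
        for x in range(img.width):
            total += y * img.width + x
    return total


def sum_indices_hoisted(img):
    """Reads each property once and keeps the value in a local."""
    width, height = img.width, img.height
    total = 0
    for y in range(height):
        for x in range(width):
            total += y * width + x
    return total


a, b = SlowImage(800, 600), SlowImage(800, 600)
sum_indices_naive(a)
sum_indices_hoisted(b)
print(a.property_reads, b.property_reads)  # 480601 2
```
</pre>
<p>Both functions compute the same result; the only difference is 480,601 property reads versus 2.</p>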
<p>In the final implementation, I created a new class to spawn separate threads to process the image in parallel. It gets the processor count, splits the image into that many chunks and spawns one thread per chunk.<br />
Oddly, this method runs at either 500Kp/s *or* around 700Kp/s. The discrepancy is because of the thread scheduler. In the new class I created, I simply spawn threads, fire them off and leave it up to the thread scheduler to pick a core to execute on. If both threads execute on the same core, the result is 500Kp/s; when they happen to run on different cores, the speed goes up to 700Kp/s. I’m not exactly sure how to get around this in managed code. If you have any ideas, please post a comment and let me know.<br />
So with the fastest algorithm, and my fingers crossed, it can process that 800&#215;600 image in about 0.7 seconds (down from more than 2.5 minutes), which is actually a reasonable rate for real-time applications.<br />
<a href="/downloads/SobelEdgeDetection.zip">Download</a> the Sobel edge detector code and sample images.</p>
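<p>For readers who want the shape of the work-unit and threading approach without opening the download, here is a rough Python sketch. It is not a port of the VB.NET classes: <code>split_rows</code>, <code>process_strip</code> and <code>process_parallel</code> are hypothetical names, the per-pixel work is a placeholder that inverts values, and in standard CPython threads will not speed up a pure-Python loop the way .NET threads do, so treat it purely as an illustration of the partitioning.</p>
<pre>
```python
import os
from concurrent.futures import ThreadPoolExecutor


def split_rows(height, parts):
    """Split rows 0..height into `parts` contiguous (start, stop) strips."""
    base, extra = divmod(height, parts)
    strips = []
    start = 0
    for n in range(parts):
        # The first `extra` strips each take one leftover row.
        stop = start + base + (1 if n in range(extra) else 0)
        strips.append((start, stop))
        start = stop
    return strips


def process_strip(image, strip, result):
    """Placeholder for the per-pixel work: inverts each value in the strip."""
    start, stop = strip
    for y in range(start, stop):
        result[y] = [255 - v for v in image[y]]


def process_parallel(image, workers=None):
    """Process one strip per worker, writing into a shared result list."""
    workers = workers or os.cpu_count() or 1
    workers = max(1, min(workers, len(image)))
    result = [None] * len(image)  # shared output, one slot per row
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(process_strip, image, strip, result)
                   for strip in split_rows(len(image), workers)]
        for future in futures:
            future.result()  # re-raise any worker exception
    return result


image = [[x * y for x in range(4)] for y in range(6)]
print(split_rows(6, 4))  # [(0, 2), (2, 4), (4, 5), (5, 6)]
print(process_parallel(image, workers=4)[5])  # [255, 250, 245, 240]
```
</pre>
<p>Each strip writes only its own rows of the shared result, so the workers never need a lock.</p>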
]]></content:encoded>
			<wfw:commentRss>http://visualcore.com/index.php/2008/03/sobel-edge-detector-in-vb-net/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
	</channel>
</rss>

