Thursday, 20 March 2014

Parallel Computing in .NET Framework 4.0 - Part 1

The Concept of Parallel Computing

After the release of .NET Framework 4.0, many developers around the world started talking about parallel computing. Before we discuss it, we need to understand why we need it. Many personal computers and workstations have two or four cores (that is, CPUs) that enable multiple threads to be executed simultaneously. Computers in the near future are expected to have significantly more cores. To take advantage of the hardware of today and tomorrow, you can parallelize your code to distribute work across multiple processors.

In the early years of personal computers, machines were built with a single central processing unit (CPU).
From the mid-1970s to the mid-1980s, CPU makers increased clock speeds to improve processor performance, but the gains tended to be modest.
Between the early 1990s and the mid-2000s, the clock speed of the CPU in a personal computer increased from a mere 33 megahertz to around 3.5 gigahertz. This alone represents an increase in performance of over one hundred times. In addition, each new processor model introduced further efficiency improvements and extra technology that made the speed improvement even greater.
Since 2005, the increase in CPU clock speed has stalled. One of the key reasons is that faster processors produce many times more heat than slower ones. Dissipating this heat to keep the processor operating within a safe temperature range is much more difficult. There are other reasons too, linked to the design of CPUs and the amount of additional power required for higher clock speeds.

The solution that the major CPU designers have selected is to move away from trying to increase clock speed and instead focus on adding more processor cores. Each core acts like a single processor that can do work. If you have two cores in your processor, it can process two independent tasks in parallel without the inefficiency of task-switching. As you increase the number of cores, you also increase the amount of code or data that can be processed in parallel, leading to an overall performance improvement without a change in clock speed.


The clock speed of a single CPU has been fairly static over the last few years, hovering around 3.4 GHz. Of course, we shouldn't fall completely into the megahertz myth, but one avenue of speed increase has been blocked.

These days it is difficult to find a new computer that has a processor with only one core. Desktop computers commonly have dual-core, quad-core, or six-core CPUs, with technology that presents eight or twelve virtual processors. Notebook computers usually include at least a dual-core processor and often include four cores. Netbooks, which are designed for web browsing and are less powerful than notebooks, often include dual-core CPUs too. Even some mobile phones have more than one core. This trend is likely to continue, with companies such as Intel indicating that future CPUs may include a thousand cores.
How many cores does an Intel Core i7 have? - The Intel Core i7 has 4 physical cores; all four of these cores are hyper-threaded. With Hyper-Threading, your OS sees two virtual cores for each physical core. This allows the workload of a particular task to be shared between the cores more efficiently, allowing it to run faster. [Just a basic overview]
What are we doing as developers? - Many developers, including me, were trained to think about programming in a sequential manner. If we continue to program in this way, our software will not take advantage of the improvements made available by parallel processing. A standard .NET program that does not create new threads will only use a single core. On current hardware this may mean that only a half or a quarter of the available processing power is available to us. In the future, programs like these may use only a tiny fraction of the processor. Similar software that fully utilizes parallel programming will perform better and is likely to be favored by our users.
Before .NET 4.0, C# developers could obtain the improved performance of newer CPUs by creating multi-threaded software. Often this type of software only creates a few additional threads to speed up a process or to allow the user interface to remain responsive whilst a background task is completed.
With .NET 4.0, Microsoft introduced new tools that are designed to simplify the creation of parallel code. These remove some, but not all, of the complexities of multi-threading. They also allow the same code to run on different computers with varying numbers of cores, taking advantage of all of the available processors.
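As a brief sketch of what these tools look like (the loop body here is purely illustrative), the Parallel class in the Task Parallel Library distributes the iterations of a loop across however many cores the machine has, without the developer creating any threads by hand:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class ParallelLoopDemo
{
    static void Main()
    {
        // A sequential for loop would run on one core; Parallel.For
        // partitions the iteration range across the available cores.
        Parallel.For(0, 10, i =>
        {
            Console.WriteLine("Processing item {0} on thread {1}",
                i, Thread.CurrentThread.ManagedThreadId);
        });
        // Parallel.For blocks until every iteration has completed.
    }
}
```

The same binary scales automatically: on a dual-core machine the runtime typically uses two worker threads, on a quad-core machine four, with no code changes.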
Moore's law: Moore's law is the observation that over the history of computing hardware, the number of transistors on integrated circuits doubles approximately every two years. The law is named after Intel co-founder Gordon E. Moore, who described the trend in his 1965 paper. [Source - WIKI]
Don’t confuse this with 32-bit and 64-bit computing - As the number of bits increases there are two important benefits: 1. more bits means that data can be processed in larger chunks and with greater precision; 2. more bits means our system can point to, or address, a larger number of locations in physical memory. 32-bit systems were once desirable because they could address (point to) 4 gigabytes (GB) of memory. Some modern applications require more than 4 GB of memory to complete their tasks, so 64-bit systems are now becoming more attractive because they can potentially address up to 4 billion times as many locations.

How Parallel Programming Works

Parallel programming is based on the principle of decomposition: the process of breaking a program, algorithm, or data set into sections that can be processed independently. Many algorithms can be decomposed, but others are naturally sequential and do not support parallelism. You may have to replace an algorithm entirely to achieve a result that decomposes better; otherwise the sequential parts can cancel out the benefits of parallelism.
For example, if you have a routine that takes 8 minutes to run and the algorithm supports easy decomposition, allocating 25% of the work to each of four processors could reduce the duration towards two minutes. If, instead, 90% of the algorithm must be handled sequentially, dividing the 8-minute task between four cores helps very little: one core spends about 7.2 minutes on the sequential part while the remaining cores sit idle, and only the final 0.8 minutes of work is shared, for a best case of roughly 7.4 minutes.
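The arithmetic above is an instance of Amdahl's law: if a fraction p of the work can be parallelized across n cores, the best-case run time is (1 - p) + p/n of the original. A small sketch of the calculation (the helper method and its name are my own, not part of any library):

```csharp
using System;

class AmdahlDemo
{
    // Best-case run time when a fraction 'parallelizable' of the work
    // is spread evenly across 'cores' processors (Amdahl's law).
    static double RunTime(double minutes, double parallelizable, int cores)
    {
        double sequentialPart = minutes * (1 - parallelizable);
        double parallelPart = minutes * parallelizable / cores;
        return sequentialPart + parallelPart;
    }

    static void Main()
    {
        // Fully decomposable 8-minute job on 4 cores: 2 minutes.
        Console.WriteLine(RunTime(8.0, 1.0, 4));
        // Only 10% parallelizable: the 7.2 minutes of sequential
        // work dominates, giving roughly 7.4 minutes in total.
        Console.WriteLine(RunTime(8.0, 0.1, 4));
    }
}
```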
There are two types of decomposition, data decomposition and task decomposition:
Data Decomposition: Data decomposition is usually applied to large data processing tasks. It is the process of splitting a large set of data into several smaller units. Each of those smaller units can then be processed by a separate CPU or core in parallel. At the end of the process, the smaller results may be recombined into one larger set of results.
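As a simple illustration (the data and the squaring operation are made up for the example), Parallel.For splits the work of processing a large array across cores, and the per-index results are recombined simply by writing them into a shared output array:

```csharp
using System;
using System.Threading.Tasks;

class DataDecompositionDemo
{
    static void Main()
    {
        // One large data set...
        int[] numbers = new int[1000];
        for (int i = 0; i < numbers.Length; i++) numbers[i] = i;

        // ...is partitioned by Parallel.For so each core processes a
        // chunk of indices; each iteration writes only to its own slot,
        // so no locking is needed to recombine the results.
        long[] squares = new long[numbers.Length];
        Parallel.For(0, numbers.Length, i =>
        {
            squares[i] = (long)numbers[i] * numbers[i];
        });

        Console.WriteLine(squares[999]); // 999 * 999 = 998001
    }
}
```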

Task Decomposition: Task decomposition is generally more complicated than data decomposition and harder to achieve. Instead of looking for large data sets to break up, we look at the algorithms being used and try to split them into smaller tasks that can run in parallel. In some cases algorithms are built from units of code that are tightly dependent upon each other, making it impossible to separate them into smaller tasks. Such algorithms must be replaced entirely to gain a parallel processing advantage.
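Where an algorithm does split into independent steps, Parallel.Invoke runs each one as a separate task and waits for all of them to finish. The three task names below are purely hypothetical stand-ins for independent pieces of work:

```csharp
using System;
using System.Threading.Tasks;

class TaskDecompositionDemo
{
    static void Main()
    {
        // Three independent pieces of the algorithm run as separate
        // tasks; Parallel.Invoke blocks until all of them complete.
        Parallel.Invoke(
            () => Console.WriteLine("Loading configuration..."),
            () => Console.WriteLine("Building index..."),
            () => Console.WriteLine("Warming cache...")
        );
        Console.WriteLine("All tasks completed.");
    }
}
```

Note that the tasks may run in any order relative to each other, which is exactly why they must not depend on one another.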
