Parallel Processing in Python
Parallel processing lets your program run multiple tasks at the same time, which reduces the overall processing time. This helps when handling large-scale problems.
In this section we will cover the following topics:
- Introduction to parallel processing
- The Python multiprocessing library for parallel processing
- IPython parallel framework
Introduction to parallel processing
For parallelism, it is important to divide the problem into sub-units that do not depend on other sub-units (or depend on them as little as possible). A problem whose sub-units are completely independent of one another is called embarrassingly parallel.
For example, consider an element-wise operation on an array. In this case, the operation only needs to be aware of the particular element it is currently handling.
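As a minimal sketch of an embarrassingly parallel, element-wise operation, the code below squares each element of a sequence with multiprocessing.Pool.map (the Pool class is covered later in this section). The function names square and parallel_square and the default of 4 worker processes are illustrative assumptions, not part of any library.

```python
from multiprocessing import Pool


# Hypothetical worker: each call needs only its own element,
# so the calls are completely independent of each other
def square(x):
    return x * x


# Hypothetical helper: 4 worker processes is an arbitrary default
def parallel_square(values, processes=4):
    # Pool.map splits `values` into chunks, sends them to the worker
    # processes, and returns the results in the original input order
    with Pool(processes=processes) as pool:
        return pool.map(square, values)


# The guard is required with the spawn/forkserver start methods,
# which re-import this module in every child process
if __name__ == "__main__":
    print(parallel_square(range(10)))  # prints [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```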
In another scenario, the sub-units of a problem have to share some data to perform their operations. This causes performance issues because of the communication cost.
There are two main ways to handle parallel programs:
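- Shared memory: the sub-units communicate with each other through the same memory space. The advantage is that you don’t need to handle communication explicitly, because reading from or writing to the shared memory is sufficient. The problem arises when multiple processes access and change the same memory location at the same time. This conflict can be avoided using synchronization techniques such as locks.
- Distributed memory: each process has its own separate memory space, so communication between processes must be handled explicitly, for example by passing messages. Because this communication often travels over a network, it is generally more costly than shared memory.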
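A minimal sketch of the shared-memory approach, assuming a simple shared counter: several processes increment one integer stored in shared memory through multiprocessing.Value, and the lock returned by get_lock() is the synchronization technique that prevents conflicting updates. The helper names add_many and run_counter are hypothetical.

```python
from multiprocessing import Process, Value


# Hypothetical worker: adds 1 to the shared counter `times` times
def add_many(counter, times):
    for _ in range(times):
        # get_lock() returns the lock guarding this shared Value; holding it
        # makes the read-modify-write `counter.value += 1` safe across processes
        with counter.get_lock():
            counter.value += 1


# Hypothetical driver: starts several processes that all update one counter
def run_counter(num_processes=4, times=10_000):
    # Value("i", 0) places a C int, initialized to 0, in shared memory
    counter = Value("i", 0)
    workers = [
        Process(target=add_many, args=(counter, times))
        for _ in range(num_processes)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()  # wait for every worker before reading the result
    return counter.value


if __name__ == "__main__":
    print(run_counter())  # prints 40000
```

Without the `with counter.get_lock():` line, `+=` becomes a separate read and write, so concurrent increments can be lost and the total may come out below the expected value.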
Multiprocessing for parallel processing
Using the standard multiprocessing module, we can efficiently parallelize simple tasks by creating child processes. This module provides an easy-to-use interface and contains a set of utilities to handle task submission and synchronization.
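To illustrate task submission and synchronization, the sketch below submits one task per value with Pool.apply_async and then waits for each result with AsyncResult.get(). The names cube and submit_tasks, the pool size of 2, and the 10-second timeout are illustrative assumptions.

```python
from multiprocessing import Pool


# Hypothetical worker; it is defined at module level so that
# worker processes can locate it
def cube(x):
    return x ** 3


# Hypothetical helper: a pool of 2 worker processes is an arbitrary choice
def submit_tasks(values):
    with Pool(processes=2) as pool:
        # apply_async() submits one task and returns an AsyncResult immediately
        pending = [pool.apply_async(cube, (v,)) for v in values]
        # get() blocks until that task's result is ready, so the main
        # process synchronizes with the workers here
        return [result.get(timeout=10) for result in pending]


if __name__ == "__main__":
    print(submit_tasks([1, 2, 3, 4]))  # prints [1, 8, 27, 64]
```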
Process and Pool Classes
Process
By subclassing multiprocessing.Process, you can create a process that runs independently. By extending the __init__ method you can initialize resources, and by implementing the Process.run() method you can write the code that the subprocess executes. For example, you can create a process that simply prints an id assigned to it. To spawn the process, initialize your Process object and invoke its Process.start() method. Process.start() creates a new process, which in turn invokes the Process.run() method.
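A minimal sketch of a Process subclass that prints an id assigned to it, assuming the hypothetical class name IdProcess and attribute process_id (chosen to avoid shadowing multiprocessing.Process and the built-in id):

```python
import multiprocessing


# Hypothetical subclass name; it extends __init__ to store an assigned id
class IdProcess(multiprocessing.Process):
    def __init__(self, process_id):
        super().__init__()  # initialize the base Process machinery first
        self.process_id = process_id

    def run(self):
        # run() is the code executed inside the new child process
        print(f"I'm the process with id: {self.process_id}")


if __name__ == "__main__":
    p = IdProcess(0)
    p.start()  # creates the child process, which then calls p.run()
    # No join() here; Python still waits for this non-daemon child before exiting
```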
The code after p.start() is executed immediately, without waiting for process p to complete its task. To wait for the task to complete, call Process.join().
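Putting start() and join() together, the sketch below reuses the hypothetical IdProcess class, with a one-second time.sleep added to simulate work, to show that the parent keeps running right after p.start() and only waits when it reaches p.join().

```python
import multiprocessing
import time


# Hypothetical subclass that stores an assigned id
class IdProcess(multiprocessing.Process):
    def __init__(self, process_id):
        super().__init__()
        self.process_id = process_id

    def run(self):
        time.sleep(1)  # simulate work so the parent clearly gets ahead of the child
        print(f"I'm the process with id: {self.process_id}")


if __name__ == "__main__":
    p = IdProcess(0)
    p.start()
    # This line runs immediately; start() does not wait for run() to finish
    print("Started; child still running:", p.is_alive(), flush=True)
    p.join()  # block until the child process terminates
    print("Joined; child exit code:", p.exitcode)
```

When run as a script, the parent's "Started" line should appear before the child's message, and the "Joined" line last, because join() blocks until the child exits.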