- Parallel Nested For-Loops in Python
- Nested For-Loop in Python
- How to Execute a Parallel Nested For-Loop
- Approach 1: One Pool of Workers Per Level
- Approach 2: Shared Pool of Workers Across Levels
- Example of a Nested For-Loop in Python (slow version)
- Parallel for Loop in Python
- Use the multiprocessing Module to Parallelize the for Loop in Python
- Use the joblib Module to Parallelize the for Loop in Python
- Use the asyncio Module to Parallelize the for Loop in Python
- Related Article — Python Loop
Parallel Nested For-Loops in Python
You can convert nested for-loops to execute concurrently or in parallel in Python using thread pools or process pools, depending on the types of tasks that are being executed.
In this tutorial, you will discover how to change a nested for-loop to be concurrent or parallel in Python with a suite of worked examples.
This tutorial was triggered by questions and discussions with Robert L. Thanks again. If you have questions or want to chat through a technical issue in Python concurrency, message me any time.
Nested For-Loop in Python
A nested for-loop is a loop within a loop.
For example, we may need to loop over a number of tasks, and each task has subtasks.
Each task requires effort, e.g. I/O (read or write data) or CPU compute (calculate something), and each subtask also requires some effort.
Importantly, the number and nature of subtasks for each task are a function of the task and may not be known beforehand. The tasks must be computed in order to determine and then issue the subtasks.
Often the tasks are independent of one another, and each subtask is also independent of one another.
Importantly, subtasks are dependent upon tasks. As such, we cannot pre-define a set of function calls prior and issue them all in batch. Instead, we need to navigate the tree or hierarchy of tasks and subtasks.
This raises the question, can we perform the tasks and subtasks concurrently or in parallel?
If so, the concurrent execution of tasks and subtasks can offer a dramatic speed-up.
How can we execute a nested for-loop in parallel in Python?
How to Execute a Parallel Nested For-Loop
A nested for-loop can be converted to run in parallel.
More specifically, we can make it concurrent if the tasks are independent and if the subtasks are independent.
I/O bound tasks like reading and writing from files and sockets can be executed at the same time concurrently using threads. CPU-bound tasks like parsing a document in memory or calculating something can be performed in parallel using process-based concurrency.
Therefore, if we have I/O bound tasks or subtasks, we can use a thread pool to make the loops concurrent via the concurrent.futures.ThreadPoolExecutor class or the multiprocessing.pool.ThreadPool class.
Concurrent for-loops (not nested) are straightforward, for example:
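As a sketch, an I/O-bound flat loop can be made concurrent with a ThreadPoolExecutor (the io_task function and its sleep here are illustrative stand-ins for real I/O):

```python
from concurrent.futures import ThreadPoolExecutor
import time

def io_task(arg):
    # simulate a blocking I/O operation
    time.sleep(0.5)
    return arg * 2

# the pool runs all five calls concurrently instead of one after another
with ThreadPoolExecutor(max_workers=5) as executor:
    results = list(executor.map(io_task, range(5)))
print(results)  # [0, 2, 4, 6, 8]
```

Because the workers spend their time sleeping (as real I/O would), the five half-second calls complete in about half a second total rather than two and a half seconds.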
More work is required for concurrent nested for-loops.
If we have CPU-bound tasks or subtasks, we can use a process pool to make loops parallel via the concurrent.futures.ProcessPoolExecutor class or the multiprocessing.Pool class.
Parallel for-loops (not nested) are straightforward, for example:
More work is required for parallel nested for-loops.
There are two main approaches we can use to make a nested for-loop concurrent.
- Create a pool of workers at each level in the hierarchy.
- Share a pool of workers across the hierarchy.
Let’s take a closer look at each approach.
Approach 1: One Pool of Workers Per Level
Each level in a nested for-loop can have its own pool of workers.
That is, each task runs, does its work, creates a pool of workers, and issues the subtasks to the pool. If there is another level of subsubtasks, each of these would create its own pool of workers and issue its own tasks.
This is suited to nested for-loops that have a large number of tasks to execute at a given level.
The downside is the redundancy of having many pools of workers competing with each other. This is not a problem with thread pools, as we may have many thousands of concurrent threads, but process pools are typically limited to one worker per CPU core.
As such, some tuning of the number of workers per pool may be required.
Another downside of this approach arises when using process pools: worker processes are typically daemonic and unable to create their own child processes. This means that if a task executing in a child process tries to create its own pool of workers, it will fail with an error.
As such, this approach may only be viable when working with thread pools, and even then perhaps only for nested loops with many subtasks per task.
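A minimal sketch of this approach using thread pools, where the task and subtask function names are illustrative: the outer pool runs the tasks, and each task creates its own inner pool for its subtasks.

```python
from concurrent.futures import ThreadPoolExecutor

def subtask(name):
    return f'{name} done'

def task(name):
    # each task creates its own pool of workers for its subtasks
    with ThreadPoolExecutor(max_workers=3) as sub_pool:
        return list(sub_pool.map(subtask, [f'{name}.sub{i}' for i in range(3)]))

# top-level pool of workers for the tasks themselves
with ThreadPoolExecutor(max_workers=2) as top_pool:
    all_results = list(top_pool.map(task, [f'task{i}' for i in range(2)]))
print(all_results)
```

With two concurrent tasks, up to six subtask threads may exist at once (two inner pools of three), which is where the tuning of workers per pool mentioned above comes in.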
Approach 2: Shared Pool of Workers Across Levels
Another approach is to create one pool of workers and issue all tasks, subtasks, and subsubtasks to this pool.
When using thread pools in one process, the pool can be shared with tasks and subtasks via a global variable, allowing tasks to issue their subtasks directly.
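A minimal sketch of a shared thread pool, with hypothetical task and subtask functions; the pool is a module-level global so tasks can submit subtasks to it directly. Note that the pool needs enough workers that tasks blocked waiting on subtask results cannot exhaust it.

```python
from concurrent.futures import ThreadPoolExecutor

# one global pool shared by tasks and subtasks
POOL = ThreadPoolExecutor(max_workers=10)

def subtask(name):
    return f'{name} done'

def task(name):
    # a task submits its subtasks to the same shared pool
    futures = [POOL.submit(subtask, f'{name}.sub{i}') for i in range(2)]
    return [f.result() for f in futures]

task_futures = [POOL.submit(task, f'task{i}') for i in range(2)]
results = [f.result() for f in task_futures]
POOL.shutdown()
print(results)
```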
When using process pools, things are trickier. A centralized pool of workers can be hosted in a server process using a multiprocessing.Manager, and proxy objects for the centralized pool can be shared among all tasks and subtasks.
An alternate design might be to use a shared queue. All tasks and subtasks may be placed onto the queue and a single consumer of tasks can retrieve items from the queue and issue them to the pool of workers.
This is functionally the same, although it separates the concern of issuing tasks from how they are executed, potentially allowing the consumer to decide to use a thread pool or process pool based on the types of tasks issued to the queue.
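One way to sketch the queue-based design with a thread pool (the task functions are illustrative): tasks push their subtasks onto a shared queue, and a single consumer loop drains the queue and submits each item to the one pool.

```python
from concurrent.futures import ThreadPoolExecutor
import queue

work_queue = queue.Queue()
results = []

def subtask(name):
    results.append(f'{name} done')

def task(name):
    # instead of running subtasks itself, the task places them on the shared queue
    for i in range(2):
        work_queue.put((subtask, f'{name}.sub{i}'))
    results.append(f'{name} done')

# seed the queue with the top-level tasks
for i in range(2):
    work_queue.put((task, f'task{i}'))

# single consumer: drain the queue and issue each item to one pool of workers
with ThreadPoolExecutor(max_workers=4) as pool:
    pending = []
    while not work_queue.empty() or not all(f.done() for f in pending):
        try:
            func, arg = work_queue.get(timeout=0.1)
        except queue.Empty:
            continue
        pending.append(pool.submit(func, arg))

print(sorted(results))
```

The consumer loop is also the natural place to route items to different pools, say by inspecting each item and sending I/O-bound work to a thread pool and CPU-bound work to a process pool.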
Now that we have considered some designs on how to convert a nested for-loop to run concurrently, let’s look at some worked examples.
Example of a Nested For-Loop in Python (slow version)
First, let's develop a nested for-loop that does not run concurrently.
In this example, we will design a loop with 3 levels.
That is, tasks that generate subtasks, that themselves generate subsubtasks.
Each task will simulate effort with a sleep of one second and report a message.
The complete example of a nested for-loop is listed below.
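The listing below is a sketch consistent with that description, using two tasks, each with two subtasks, each with two sub-subtasks (the names and counts are illustrative). Run sequentially, the fourteen one-second sleeps take about 14 seconds in total.

```python
import time

messages = []

def report(message):
    # record and report progress
    messages.append(message)
    print(message)

def subsubtask(name):
    time.sleep(1)  # simulate effort
    report(f'>>{name} done')

def subtask(name):
    time.sleep(1)  # simulate effort
    report(f'>{name} done')
    for i in range(2):
        subsubtask(f'{name}.{i}')

def task(name):
    time.sleep(1)  # simulate effort
    report(f'{name} done')
    for i in range(2):
        subtask(f'{name}.{i}')

# sequential nested loop: every task, subtask, and sub-subtask runs one at a time
for i in range(2):
    task(f'task{i}')
```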
Parallel for Loop in Python
- Use the multiprocessing Module to Parallelize the for Loop in Python
- Use the joblib Module to Parallelize the for Loop in Python
- Use the asyncio Module to Parallelize the for Loop in Python
Parallelizing a loop means spreading the iterations across multiple cores so that they run simultaneously. When we have numerous jobs, each computation does not wait for the previous one to complete; instead, the work is distributed across different processors.
In this article, we will parallelize a for loop in Python.
Use the multiprocessing Module to Parallelize the for Loop in Python
To parallelize the loop, we can use the multiprocessing package in Python, which supports spawning child processes from an existing process.
The multiprocessing module can be used in place of a plain for loop to execute an operation on every element of an iterable in parallel. Its multiprocessing.Pool object is the tool for the job, since using multiple threads in Python would not speed up CPU-bound work because of the Global Interpreter Lock.
```python
import multiprocessing

def sumall(value):
    return sum(range(1, value + 1))

if __name__ == '__main__':
    # the __main__ guard is required so worker processes can import this module safely
    pool_obj = multiprocessing.Pool()
    answer = pool_obj.map(sumall, range(0, 5))
    print(answer)  # [0, 1, 3, 6, 10]
```
Use the joblib Module to Parallelize the for Loop in Python
The joblib module uses multiprocessing under the hood to spread work across multiple CPU cores and parallelize a for loop. It provides lightweight pipelining helpers, including transparent caching of results, for easy and straightforward parallel computation.
To perform parallel processing, we set the number of jobs; in practice, this is limited by the number of CPU cores available or idle at the moment.
The delayed() function wraps a function call so that it is not executed immediately; instead, it captures the function and its arguments so that Parallel() can run the call later on a worker.
The Parallel() function creates a parallel instance with the specified number of workers (2 in this case).
We build a generator of delayed calls and pass it to Parallel(), which starts two workers (separate processes by default, using the loky backend) and distributes the calls between them.
```python
import math
import time

from joblib import Parallel, delayed

def sqrt_func(i, j):
    time.sleep(1)
    return math.sqrt(i**j)

results = Parallel(n_jobs=2)(
    delayed(sqrt_func)(i, j) for i in range(5) for j in range(2)
)
print(results)
```
[1.0, 0.0, 1.0, 1.0, 1.0, 1.4142135623730951, 1.0, 1.7320508075688772, 1.0, 2.0]
Use the asyncio Module to Parallelize the for Loop in Python
The asyncio module is single-threaded: it runs an event loop and suspends coroutines at yield from or await expressions.
The code below offloads each call to a background thread via the event loop's default executor, so the for loop does not block waiting for each call to finish; the calls run concurrently while the main code continues.
```python
import asyncio
import time

# create an event loop explicitly (calling asyncio.get_event_loop() outside a
# running loop is deprecated in recent Python versions)
loop = asyncio.new_event_loop()

def background(f):
    def wrapped(*args, **kwargs):
        # run the blocking function in the loop's default thread pool
        return loop.run_in_executor(None, f, *args, **kwargs)
    return wrapped

@background
def your_function(argument):
    time.sleep(2)
    print('function finished for ' + str(argument))

for i in range(10):
    your_function(i)

print('loop finished')
```
loop finished
function finished for 4
function finished for 8
function finished for 0
function finished for 3
function finished for 6
function finished for 2
function finished for 5
function finished for 7
function finished for 9
function finished for 1