Using Threads in Python

Contents

What is a Thread in Python
What is a Thread
Thread vs Process
Limitations of Threads in Python
When to Use a Thread
Blocking IO
External C Code
Third-Party Python Interpreters
How to Create a New Thread
Example of Running a Function in a Thread

What is a Thread in Python

A thread is the execution of code in a Python process. Each program has one thread by default, but we may need to create new threads to execute tasks concurrently.

In this tutorial you will discover what a thread is in Python.

What is a Thread

A thread refers to a thread of execution in a computer program.

Each program is a process and has at least one thread that executes instructions for that process.

Thread: The operating system object that executes the instructions of a process.

— Page 273, The Art of Concurrency, 2009.

When we run a Python script, it starts an instance of the Python interpreter that runs our code in the main thread. The main thread is the default thread of a Python process.
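As a minimal sketch, the standard threading module can be used to inspect the main thread. The variable names below are illustrative only.

```python
import threading

# The thread that runs a script's top-level code is the main thread.
main = threading.main_thread()
current = threading.current_thread()

print(main.name)        # name of the main thread
print(current is main)  # True when this runs at the top level of a script
```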

We may develop our program to perform tasks concurrently, in which case we may need to create and run new threads. These will be concurrent threads of execution within our program.

A Python thread is an object representation of a native thread provided by the underlying operating system.

When we create and start a new thread, Python makes system calls to the underlying operating system, requesting that a new native thread be created and started.

This highlights that Python threads are real threads, as opposed to simulated software threads, e.g. fibers or green threads.
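One way to observe this, assuming Python 3.8 or later on a platform that supports it, is threading.get_native_id(), which returns the thread id assigned by the operating system. The helper function report() and the results dictionary are hypothetical names used only for this sketch.

```python
import threading

def report(results):
    # Runs in the new thread: record the OS-assigned thread id.
    results["worker"] = threading.get_native_id()

# Record the OS-assigned id of the main thread first.
results = {"main": threading.get_native_id()}

worker = threading.Thread(target=report, args=(results,))
worker.start()
worker.join()

print(results)  # two different integer ids, one per native thread
```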

The code in new threads may or may not be executed in parallel (at the same time), even though the threads are executed concurrently.

There are a number of reasons for this, such as:

The underlying hardware may or may not support parallel execution (e.g. one vs multiple CPU cores).
The Python interpreter may or may not permit multiple threads to execute in parallel.
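For the first point, a rough check of what the hardware offers is os.cpu_count(), which reports the number of logical CPUs visible to the operating system. This is only a sketch: the value may be None if it cannot be determined, and it says nothing about whether the interpreter will use those cores in parallel.

```python
import os

# Number of logical CPUs reported by the operating system (may be None).
cores = os.cpu_count()

if cores is None:
    print("CPU count could not be determined")
elif cores > 1:
    print(f"{cores} logical CPUs: the hardware can run threads in parallel")
else:
    print("1 logical CPU: threads can only be interleaved, not run in parallel")
```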

This highlights the distinction between code that can run out of order (concurrent) and the capability to execute code simultaneously (parallel).

Concurrent: Code that can be executed out of order.
Parallel: Capability to execute code simultaneously.


Next, let’s consider the important differences between threads and processes.


Thread vs Process

A process refers to a running instance of a computer program.

In Python, each process is in fact one instance of the Python interpreter, which executes Python instructions (bytecode) at a slightly lower level than the source code you type into your Python program.

Process: The operating system’s spawned and controlled entity that encapsulates an executing application. A process has two main functions. The first is to act as the resource holder for the application, and the second is to execute the instructions of the application.

— Page 271, The Art of Concurrency, 2009.

The underlying operating system controls how new processes are created. On some systems, that may require spawning a new process, and on others, it may require that the process is forked. The operating-system-specific method used for creating new processes in Python is not something we need to worry about, as it is managed by your installed Python interpreter.

A thread always exists within a process and represents the manner in which instructions or code is executed.

A process will have at least one thread, called the main thread. Any additional threads that we create within the process will belong to that process.

The Python process will terminate once all non-background (non-daemon) threads have terminated.

Process: An instance of the Python interpreter that has at least one thread, called the MainThread.
Thread: A thread of execution within a Python process, such as the MainThread or a new thread.
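The relationship can be sketched with the standard os and threading modules: every thread reports the same process id, but each has its own name. The function record() and the thread name "worker-1" are hypothetical choices for illustration.

```python
import os
import threading

seen = []

def record():
    # Every thread in the process reports the same process id.
    seen.append((threading.current_thread().name, os.getpid()))

record()  # called from the main thread

worker = threading.Thread(target=record, name="worker-1")
worker.start()
worker.join()

for name, pid in seen:
    print(name, pid)  # same pid on both lines, different thread names
```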

Now that we are clear on the differences between processes and threads, let’s take a look at the limitations of threads in Python.


Limitations of Threads in Python

The reference Python interpreter is referred to as CPython.

It is the free version of Python that you can download from python.org to develop and run Python programs.

The CPython interpreter generally does not permit more than one thread to execute Python bytecode at a time.

This is achieved through a mutual exclusion (mutex) lock within the interpreter that ensures that only one thread at a time can execute Python bytecodes in the Python virtual machine.

In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation).

— threading — Thread-based parallelism

This lock is referred to as the Global Interpreter Lock or GIL for short.

In CPython, the global interpreter lock, or GIL, is a mutex that protects access to Python objects, preventing multiple threads from executing Python bytecodes at once. The GIL prevents race conditions and ensures thread safety.

— Global Interpreter Lock, Python Wiki.

This means that although we might write concurrent code with threads in Python and run our code on hardware with many CPU cores, we may not be able to execute our code in parallel.
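A rough way to see this for yourself is to time a pure-Python loop run twice in sequence and then in two threads. This is only a sketch with a hypothetical count_down() function. On a standard CPython build with the GIL the two timings are typically similar, but exact numbers vary by machine and interpreter, so no expected output is given.

```python
import threading
import time

def count_down(n):
    # Pure-Python loop: the thread needs the GIL to interpret this bytecode.
    while n > 0:
        n -= 1

N = 1_000_000

# Run the task twice, one after the other, in the main thread.
start = time.perf_counter()
count_down(N)
count_down(N)
sequential = time.perf_counter() - start

# Run the same two tasks in two concurrent threads.
threads = [threading.Thread(target=count_down, args=(N,)) for _ in range(2)]
start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

print(f"sequential: {sequential:.3f}s, two threads: {threaded:.3f}s")
```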

There are some exceptions to this.

Specifically, the GIL is released by the Python interpreter sometimes to allow other threads to run.

For example, the lock is released when a thread is blocked, such as when performing IO with a socket or file, and often when a thread is executing computationally intensive code in a C library, like hashing bytes.

Luckily, many potentially blocking or long-running operations, such as I/O, image processing, and NumPy number crunching, happen outside the GIL. Therefore it is only in multithreaded programs that spend a lot of time inside the GIL, interpreting CPython bytecode, that the GIL becomes a bottleneck.

— Global Interpreter Lock, Python Wiki.

Therefore, although in most cases CPython will prevent parallel execution of threads, it is allowed in some circumstances. These circumstances represent the base use case for adopting threads in your Python programs.

Next, let’s look at specific cases when you should consider using threads.


When to Use a Thread

The reference Python interpreter CPython prevents more than one thread from executing bytecode at the same time.

This is achieved using a mutex called the Global Interpreter Lock or GIL, as we learned in the previous section.

There are times when the lock is released by the interpreter and we can achieve parallel execution of our concurrent code in Python.

Examples of when the lock is released include:

When a thread is performing blocking IO.
When a thread is executing C code and explicitly releases the lock.

There are also ways of avoiding the lock entirely, such as using a third-party Python interpreter that does not implement a GIL.

Let’s take a look at each of these use cases in turn.

Blocking IO

You should use threads for IO-bound tasks.

An IO-bound task is a type of task that involves reading from or writing to a device, file, or socket connection.

The operations involve input and output (IO), and the speed of these operations is bound by the device, hard drive, or network connection. This is why these tasks are referred to as IO-bound.

CPUs are really fast. A modern CPU core clocked at 4 GHz runs 4 billion cycles per second and can execute billions of instructions in that time, and you likely have more than one CPU core in your system.

Doing IO is very slow compared to the speed of CPUs.

Interacting with devices, files, and socket connections involves making system calls into your operating system (the kernel), and the calling thread waits for the operation to complete. If that thread is the only one doing work, such as the main thread of your Python program, then your CPU may wait many milliseconds or even many seconds doing nothing.

That is potentially billions of operations prevented from executing.

A thread performing an IO operation will block for the duration of the operation. A blocked thread signals to the operating system that it can be suspended so that another thread can execute. This change of running thread is called a context switch.

Additionally, the Python interpreter will release the GIL when performing blocking IO operations, allowing other threads within the Python process to execute.

Therefore, blocking IO provides an excellent use case for using threads in Python.

Examples of blocking IO operations include:

Reading or writing a file from the hard drive.
Reading or writing to standard output, input or error (stdin, stdout, stderr).
Printing a document.
Reading or writing bytes on a socket connection with a server.
Downloading or uploading a file.
Querying a server.
Querying a database.
Taking a photo or recording a video.
And so much more.
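The benefit can be sketched without real devices by using time.sleep() as a stand-in for a blocking call, since it also releases the GIL while waiting. The function fake_download() and the delay value are hypothetical. With five threads the waits overlap, so the total time should be close to one delay rather than five.

```python
import threading
import time

def fake_download(delay):
    # time.sleep() stands in for a blocking call such as a socket read.
    time.sleep(delay)

DELAY = 0.2
threads = [threading.Thread(target=fake_download, args=(DELAY,)) for _ in range(5)]

start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# Run one after another, the five waits would take about 5 * DELAY seconds.
print(f"5 waits of {DELAY}s took {elapsed:.2f}s in total")
```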

External C Code

We may make function calls that themselves call down into a third-party C library.

Often these function calls will release the GIL as the C library being called will not interact with the Python interpreter.

This provides an opportunity for other threads in the Python process to run.

For example, when using the hashlib module in the Python standard library, the GIL is released when hashing data via the update() method of a hash object.

The Python GIL is released to allow other threads to run while hash updates on data larger than 2047 bytes is taking place when using hash algorithms supplied by OpenSSL.

— hashlib — Secure hashes and message digests
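A minimal sketch of hashing in several threads follows. The payload size, the function hash_data(), and the thread labels are assumptions made for illustration. Whether the threads actually overlap depends on the interpreter build and the hash implementation in use.

```python
import hashlib
import threading

# 8 MiB payload, well above the 2047-byte threshold quoted above.
data = b"x" * (8 * 1024 * 1024)
digests = {}

def hash_data(label):
    # update() may release the GIL while the C code hashes the buffer.
    h = hashlib.sha256()
    h.update(data)
    digests[label] = h.hexdigest()

threads = [threading.Thread(target=hash_data, args=(f"t{i}",)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every thread hashed the same bytes, so all digests are identical.
print(len(set(digests.values())))
```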

Another example is the NumPy library for managing arrays of data, which releases the GIL during many array operations.

The exceptions are few but important: while a thread is waiting for IO (for you to type something, say, or for something to come in the network) python releases the GIL so other threads can run. And, more importantly for us, while numpy is doing an array operation, python also releases the GIL.

— Write multithreaded or multiprocess code, SciPy Cookbook.

Third-Party Python Interpreters

Another important consideration is the use of third-party Python interpreters.

There are alternate commercial and open source Python interpreters that you can acquire and use to execute your Python code.

Some of these interpreters may implement a GIL and release it more or less often than CPython. Other interpreters remove the GIL entirely and allow multiple concurrent Python threads to execute in parallel.

Examples of third-party Python interpreters without a GIL include:

Jython: an open source Python interpreter written in Java.
IronPython: an open source Python interpreter written in C# for the .NET platform.

… Jython does not have the straightjacket of the GIL. This is because all Python threads are mapped to Java threads and use standard Java garbage collection support (the main reason for the GIL in CPython is because of the reference counting GC system). The important ramification here is that you can use threads for compute-intensive tasks that are written in Python.

— No Global Interpreter Lock, Definitive Guide to Jython.
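If you are unsure which interpreter is running your code, the standard library can report it. This is a small sketch using platform.python_implementation() and sys.implementation. The variable names are illustrative.

```python
import platform
import sys

# Name of the running interpreter, e.g. 'CPython', 'Jython', 'IronPython' or 'PyPy'.
implementation = platform.python_implementation()

# Lower-case identifier of the implementation, e.g. 'cpython'.
identifier = sys.implementation.name

print(implementation, identifier)
```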

Next, let’s look at how to create new threads in Python.


How to Create a New Thread

Sometimes, we may need to create additional threads within our Python process to execute tasks concurrently.

Python provides real native (system-level) threads via the threading.Thread class.

There are two main ways to create a new thread:

Create a threading.Thread instance and configure it to run a function.
Extend the threading.Thread class and override the run() method.
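As a brief preview of the second approach, here is a minimal sketch of a threading.Thread subclass. The class name Greeter and its attributes are hypothetical and exist only for this example.

```python
import threading

class Greeter(threading.Thread):
    """Hypothetical Thread subclass that builds a greeting in run()."""

    def __init__(self, who):
        super().__init__()  # always initialise the base Thread first
        self.who = who
        self.message = None

    def run(self):
        # Executed in the new thread once start() is called.
        self.message = f"Hello, {self.who}"

thread = Greeter("world")
thread.start()  # start() creates the new thread, which then calls run()
thread.join()   # wait for the thread to finish

print(thread.message)  # Hello, world
```

Note that we call start(), not run(), because calling run() directly would execute the code in the current thread.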

Let’s take a closer look at each approach.

Example of Running a Function in a Thread

We can run a custom function in new threads.

This can be achieved by creating an instance of the threading.Thread class and specifying the function to run in the new thread via the target argument.
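A minimal sketch of this approach is shown below. The function task(), its argument, and the thread name "worker" are hypothetical names chosen for illustration.

```python
import threading

results = []

def task(label):
    # Runs in the new thread and records which thread executed it.
    results.append(f"{label} ran in {threading.current_thread().name}")

# Configure the thread with the target function and its arguments.
thread = threading.Thread(target=task, args=("task",), name="worker")
thread.start()  # begin executing task() in the new thread
thread.join()   # block until the new thread has finished

print(results[0])  # task ran in worker
```

Arguments for the target function are passed as a tuple via args, and join() lets the main thread wait for the result before using it.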
