Softmax Activation Function with Python
The softmax activation function is one of the most common terms you come across when working on machine learning, and more specifically deep learning, problems. In a deep learning model, the softmax activation function is placed at the end of the neural network. Why? Because it normalizes the network output into a probability distribution over the predicted output classes.
We use it as an activation function to solve multiclass classification problems. It can also be viewed as a generalization of the sigmoid function, which produces a probability distribution for a binary variable.
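As a quick illustration of that relationship, here is a minimal sketch (the value of z is only illustrative): for two classes, softmax applied to the logits [z, 0] gives the same probability as the sigmoid of z.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(v):
    e = np.exp(v - np.max(v))   # subtract the max for numerical stability
    return e / e.sum()

z = 2.3
print(softmax(np.array([z, 0.0]))[0])  # ~0.9089
print(sigmoid(z))                      # ~0.9089, the same probability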
Note: If you add softmax to the output layer of a three-class classification model in the Keras deep learning library, it might look like the example given below.
model.add(Dense(3, activation='softmax'))
To implement it in Python, let's first build it from scratch, starting with its mathematical expression.
Mathematical representation of softmax in Python
The softmax function scales logits (raw scores) into probabilities. The output of this function is a vector that gives a probability for each possible outcome. It is represented mathematically as:

\text{softmax}(Z)_i = \frac{\exp(Z_i)}{\sum_{j=1}^{n} \exp(Z_j)}

where:
- Z = the input vector of the softmax activation function, comprising n elements for n classes.
- Z(i) = the i-th element of the input vector, which can take any value from negative infinity to positive infinity.
- exp[Z(i)] = the standard exponential function applied to Z(i). The resulting values may be tiny, but they are never zero.
- n = the number of classes.
- Σ exp[Z(j)] = the normalization term; dividing by it makes sure the output vector sums to 1.
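To make the formula concrete, here is a minimal numeric sketch (the logit values are only illustrative) that applies it to a single three-class vector:

import numpy as np

# Illustrative logits Z for three classes
Z = np.array([2.0, 1.0, 0.1])

exp_Z = np.exp(Z)                 # [7.389, 2.718, 1.105] -- never zero
probs = exp_Z / exp_Z.sum()       # softmax(Z)_i = exp(Z_i) / sum_j exp(Z_j)

print(probs)        # [0.659 0.242 0.099] (approximately)
print(probs.sum())  # 1.0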
Here’s an example to understand how it works as per this formula.
How the softmax formula works
It works on a batch of inputs given as a 2D array, where the rows are the samples and the columns are the nodes. It can be implemented with the following code.
import numpy as np

def Softmax(x):
    '''
    Performs the softmax activation on a given set of inputs
    Input: x (N,k) ndarray (N: no. of samples, k: no. of nodes)
    Returns: (N,k) ndarray of probabilities
    Note: Works for 2D arrays only (rows for samples, columns for nodes/outputs)
    '''
    max_x = np.amax(x, 1).reshape(x.shape[0], 1)  # Get the row-wise maximum
    e_x = np.exp(x - max_x)                       # Subtract the max for numerical stability
    return e_x / e_x.sum(axis=1, keepdims=True)
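A short usage sketch (the input values are only illustrative): applying the function to a batch of two samples with three nodes each, where each output row sums to 1.

x = np.array([[1.0, 2.0, 3.0],
              [1.0, 2.0, -1.0]])
print(Softmax(x))
# Each row sums to 1; the first row is roughly [0.09, 0.24, 0.67]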
This is the simplest implementation of softmax in Python. For training a network you also need its derivative, the Jacobian of the softmax. An example implementation is given below.
import numpy as np

def Softmax_grad(x):  # Best implementation (VERY FAST)
    '''Returns the Jacobian of the softmax function for the given set of inputs.
    Inputs:
    x: should be a 2d array where the rows correspond to the samples
        and the columns correspond to the nodes.
    Returns: jacobian
    '''
    s = Softmax(x)
    a = np.eye(s.shape[-1])
    temp1 = np.einsum('ij,jk->ijk', s, a)   # s_ij * delta_jk
    temp2 = np.einsum('ij,ik->ijk', s, s)   # s_ij * s_ik
    return temp1 - temp2
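As a quick sanity check (a small sketch with illustrative inputs): for N samples and k nodes the returned Jacobian has shape (N, k, k), and each of its rows sums to zero because the softmax outputs always sum to 1.

x = np.array([[0.5, 1.5, -0.5]])
J = Softmax_grad(x)
print(J.shape)          # (1, 3, 3): one k-by-k Jacobian per sample
print(J.sum(axis=-1))   # ~0 everywhere, since the outputs always sum to 1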
The Python implementations above are written and tested for a batch of inputs only. That is, the expected input is a 2D array in which the rows represent different samples and the columns represent different nodes.
If you wish to speed up these implementations, use Numba, which translates a subset of Python and NumPy code into fast machine code. To install it, use:

pip install numba

Note: Make sure that your NumPy version is compatible with Numba, though pip takes care of this most of the time.
Here's how you can implement softmax with Numba.
import numpy as np
from numba import njit

@njit(cache=True, fastmath=True)  # Best implementation (VERY FAST)
def Softmax(x):
    '''
    Performs the softmax activation on a given set of inputs
    Input: x (N,k) ndarray (N: no. of samples, k: no. of nodes)
    Returns: (N,k) ndarray of probabilities
    Note: Works for 2D arrays only (rows for samples, columns for nodes/outputs)
    '''
    max_x = np.zeros((x.shape[0], 1), dtype=x.dtype)
    for i in range(x.shape[0]):
        max_x[i, 0] = np.max(x[i, :])              # Row-wise maximum
    e_x = np.exp(x - max_x)                        # Subtract the max for numerical stability
    return e_x / e_x.sum(axis=1).reshape((-1, 1))  # Alternative to keepdims=True for Numba compatibility
Here's the Numba implementation of the softmax derivative (Jacobian).
import numpy as np
from numba import njit

@njit(cache=True, fastmath=True)
def Softmax_grad(x):  # Best implementation (VERY FAST)
    '''Returns the Jacobian of the softmax function for the given set of inputs.
    Inputs:
    x: should be a 2d array where the rows correspond to the samples
        and the columns correspond to the nodes.
    Returns: jacobian
    '''
    s = Softmax(x)
    a = np.eye(s.shape[-1])
    temp1 = np.zeros((s.shape[0], s.shape[1], s.shape[1]), dtype=np.float32)
    temp2 = np.zeros((s.shape[0], s.shape[1], s.shape[1]), dtype=np.float32)
    # Einsum is unsupported with Numba (nopython mode), so use explicit loops instead:
    # temp1 = np.einsum('ij,jk->ijk', s, a)
    # temp2 = np.einsum('ij,ik->ijk', s, s)
    for i in range(s.shape[0]):
        for j in range(s.shape[1]):
            for k in range(s.shape[1]):
                temp1[i, j, k] = s[i, j] * a[j, k]
                temp2[i, j, k] = s[i, j] * s[i, k]
    return temp1 - temp2
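If you want to check the speedup on your own machine, here is a minimal timing sketch (illustrative only; the numbers depend on your hardware, and Numba compiles each function on its first call, which is why both are called once before timing):

import timeit
import numpy as np

x = np.random.rand(1000, 50)

Softmax(x)        # call once so Numba compiles before timing
Softmax_grad(x)

print("softmax :", timeit.timeit(lambda: Softmax(x), number=100))
print("jacobian:", timeit.timeit(lambda: Softmax_grad(x), number=100))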
These implementations are quite fast, comparable with TensorFlow and PyTorch. Another way to accelerate softmax in Python is CuPy, an open-source array library for GPU-accelerated computing with Python (built on CUDA).
Here's what the CuPy implementation looks like:
import cupy as cp

def Softmax_cupy(x):
    '''
    Performs the softmax activation on a given set of inputs
    Input: x (N,k) ndarray (N: no. of samples, k: no. of nodes)
    Returns: (N,k) ndarray of probabilities
    Note: Works for 2D arrays only (rows for samples, columns for nodes/outputs)
    '''
    max_x = cp.amax(x, 1).reshape(x.shape[0], 1)
    e_x = cp.exp(x - max_x)  # For stability, as softmax is prone to overflow and underflow
    # return e_x / e_x.sum(axis=1, keepdims=True)    # Alternative 1
    return e_x / e_x.sum(axis=1).reshape((-1, 1))    # Alternative 2

def Softmax_grad_cupy(x):  # Best implementation (VERY FAST)
    '''Returns the Jacobian of the softmax function for the given set of inputs.
    Inputs:
    x: should be a 2d array where the rows correspond to the samples
        and the columns correspond to the nodes.
    Returns: jacobian
    '''
    s = Softmax_cupy(x)
    a = cp.eye(s.shape[-1])
    temp1 = cp.einsum('ij,jk->ijk', s, a)   # s_ij * delta_jk
    temp2 = cp.einsum('ij,ik->ijk', s, s)   # s_ij * s_ik
    return temp1 - temp2
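Here is a small usage sketch (assuming a CUDA-capable GPU and a working CuPy installation): the inputs must be CuPy arrays on the GPU, and results can be copied back to the host with cp.asnumpy.

import numpy as np

x_cpu = np.random.rand(4, 3)
x_gpu = cp.asarray(x_cpu)        # copy the input to the GPU

probs = Softmax_cupy(x_gpu)
jac = Softmax_grad_cupy(x_gpu)

print(cp.asnumpy(probs))         # copy the result back to the host
print(jac.shape)                 # (4, 3, 3)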
Using these different methods, you can efficiently implement the softmax activation function in Python.
Assume a neural network that classifies an input image as a dog, cat, tiger, or none of these.
Consider the feature vector to be X = [x1, x2, x3, x4]. In the neural network, these inputs are passed through a hidden layer L-1 and then to the output layer L, which has one node per class.
For the above-given neural network, the output layer's pre-activation is a matrix product:

Z[L] = W[L] · A[L-1] + b[L]

where the weight matrix W[L] has shape (n, m), with
m = total number of nodes in layer L-1
n = total number of nodes in the output layer L.
For the above-given example, we have m = 3 and n = 4, and A[L-1] is the vector of values calculated at layer L-1.
To apply softmax to the Z[L] matrix, we need the exponential of each of its elements: exp(Z1[L]), exp(Z2[L]), exp(Z3[L]) and exp(Z4[L]).
Now, let's put numbers into each expression listed above to derive the output. Suppose the Z[L] matrix takes the following values (one logit per class, in the order cat, dog, tiger, none):

Z[L] = [1.25, 2.44, 0.78, 0.12]

With this, the exponential values are approximately:

exp(1.25) ≈ 3.49, exp(2.44) ≈ 11.47, exp(0.78) ≈ 2.18, exp(0.12) ≈ 1.13

Thus, if we calculate the probability distribution by dividing each exponential by their sum (≈ 18.27), as per the softmax formula, the probability distribution for the above example comes out as:

P ≈ [0.191, 0.628, 0.119, 0.062]
It is clear from the above values that the input image was that of a dog, since the dog class has the highest probability.
Implementing code for the softmax function
Here's how to implement the above softmax example in Python.
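The following is a minimal sketch that computes the same probabilities as the worked example above; the class labels and logit values follow that example.

import numpy as np

# Logits from the worked example, in the order [cat, dog, tiger, none]
Z = np.array([1.25, 2.44, 0.78, 0.12])
labels = ["Cat", "Dog", "Tiger", "None"]

exp_Z = np.exp(Z)
probabilities = exp_Z / np.sum(exp_Z)

for label, p in zip(labels, probabilities):
    print(f"Probability ({label}) = {p}")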
Output of the code:
- Probability (Cat) = 0.19101770813831334
- Probability (Dog) = 0.627890718698843
- Probability (Tiger) = 0.11938650086860875
- Probability (None) = 0.061705072294234845
This clearly shows that the input image is a dog, as that class receives the highest probability. It matches the result we derived when we manually calculated the probabilities using the mathematical formula of the softmax activation function.
It’s time to get started!
This article covers everything you need to get started with implementing the softmax activation function in Python: the mathematical expression behind it, how it works, several implementations from scratch, and ways to speed them up.
Calculating Softmax in Python
Hello learners! In this tutorial, we will learn about the Softmax function and how to calculate it in Python using NumPy. We will also get to know frameworks that have built-in methods for Softmax. So let's get started.
What is the Softmax function?
Softmax is a mathematical function that takes as input a vector of numbers and normalizes it to a probability distribution, where the probability for each value is proportional to the relative scale of each value in the vector.
Before applying the softmax function, the elements of the vector can lie anywhere in the range (-∞, ∞); some elements can be negative while others are positive.
After applying the softmax function, each value will lie in the range (0, 1), and the values will sum up to 1 so that they can be interpreted as probabilities.
The formula for softmax calculation is
\text{softmax}(\vec{x})_i = \frac{e^{x_i}}{\sum_{j} e^{x_j}}
where we first find the exponential of each element in the vector and divide them by the sum of exponentials calculated.
The softmax function is most commonly used as an activation function for multi-class classification problems, where you have a set of scores and need to find the probability of each outcome. It is used in the output layer of neural network models that predict a multinomial probability distribution.
Implementing Softmax function in Python
Now that we know the formula for calculating softmax over a vector of numbers, let's implement it. We will use the NumPy exp() method to calculate the exponential of our vector and the NumPy sum() method to calculate the denominator.
import numpy as np

def softmax(vec):
    exponential = np.exp(vec)
    probabilities = exponential / np.sum(exponential)
    return probabilities

vector = np.array([1.0, 3.0, 2.0])
probabilities = softmax(vector)
print("Probability Distribution is:")
print(probabilities)
Probability Distribution is:
[0.09003057 0.66524096 0.24472847]
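Note: np.exp can overflow for large inputs, making the simple implementation above return nan. A numerically safer variant (a small sketch using the standard max-subtraction trick) shifts the vector by its maximum before exponentiating, which does not change the result:

def stable_softmax(vec):
    shifted = vec - np.max(vec)       # shift so the largest element is 0
    exponential = np.exp(shifted)     # exp() of values <= 0 cannot overflow
    return exponential / np.sum(exponential)

big = np.array([1000.0, 1001.0, 1002.0])
print(stable_softmax(big))   # [0.09003057 0.24472847 0.66524096]
print(softmax(big))          # overflows and returns nan values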
Using frameworks to calculate softmax
Many frameworks provide methods to calculate softmax over a vector to be used in various mathematical models.
1. TensorFlow
You can use tensorflow.nn.softmax to calculate softmax over a vector as shown.
import tensorflow as tf
import numpy as np

vector = np.array([5.5, -13.2, 0.5])
probabilities = tf.nn.softmax(vector).numpy()
print("Probability Distribution is:")
print(probabilities)
Probability Distribution is:
[9.93307142e-01 7.51236614e-09 6.69285087e-03]
2. Scipy
The SciPy library can be used to calculate softmax using scipy.special.softmax, as shown below.
import scipy.special
import numpy as np

vector = np.array([1.5, -3.5, 2.0])
probabilities = scipy.special.softmax(vector)
print("Probability Distribution is:")
print(probabilities)
Probability Distribution is:
[0.3765827  0.00253739 0.62087991]
3. PyTorch
You can use PyTorch's torch.nn.Softmax(dim) to calculate softmax, specifying the dimension over which you want to calculate it, as shown.
import torch

vector = torch.tensor([1.5, -3.5, 2.0])
probabilities = torch.nn.Softmax(dim=-1)(vector)
print("Probability Distribution is:")
print(probabilities)
Probability Distribution is:
tensor([0.3766, 0.0025, 0.6209])
Conclusion
Congratulations! You have now learned about the softmax function and how to implement it in several ways, and you can use it for your multi-class classification problems in machine learning.