Cross entropy loss in Python

CrossEntropyLoss

This criterion computes the cross entropy loss between input logits and target.

It is useful when training a classification problem with C classes. If provided, the optional argument weight should be a 1D Tensor assigning weight to each of the classes. This is particularly useful when you have an unbalanced training set.
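As a minimal sketch of the weight argument, the snippet below assumes a hypothetical 3-class problem in which class 2 is rare and is therefore given a larger weight. The weight values and tensor sizes are illustrative only. It also checks the result against a weighted mean of the per-sample losses, which is how the default 'mean' reduction is understood to combine them.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical class weights: class 2 is assumed to be under-represented.
class_weights = torch.tensor([1.0, 1.0, 5.0])

logits = torch.randn(4, 3)             # (minibatch, C) unnormalized scores
target = torch.tensor([0, 2, 1, 2])    # class indices in [0, C)

# Weighted loss with the default 'mean' reduction.
weighted = nn.CrossEntropyLoss(weight=class_weights)(logits, target)

# Manual check: weight each unweighted per-sample loss by the weight of its
# target class, then divide by the sum of those weights.
per_sample = nn.CrossEntropyLoss(reduction="none")(logits, target)
w = class_weights[target]
manual = (w * per_sample).sum() / w.sum()

print(weighted.item(), manual.item())
```

Samples whose target is the rare class pull the averaged loss more strongly than the others, which is the intended effect on an unbalanced training set.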

The input is expected to contain the unnormalized logits for each class (which do not need to be positive or sum to 1, in general). input has to be a Tensor of size \((C)\) for unbatched input, \((minibatch, C)\) or \((minibatch, C, d_1, d_2, \ldots, d_K)\) with \(K \geq 1\) for the K-dimensional case. The last is useful for higher-dimensional inputs, such as computing cross entropy loss per pixel for 2D images.

The target that this criterion expects should contain either:

Class indices in the range \([0, C)\), where \(C\) is the number of classes. If ignore_index is specified, this loss also accepts that index, which need not lie in the class range. The unreduced loss (i.e. with reduction set to 'none') for this case can be described as:

\[
\ell(x, y) = L = \{l_1, \ldots, l_N\}^\top, \quad
l_n = -w_{y_n} \log \frac{\exp(x_{n,y_n})}{\sum_{c=1}^{C} \exp(x_{n,c})} \cdot \mathbb{1}\{y_n \neq \text{ignore\_index}\}
\]

where \(x\) is the input, \(y\) is the target, \(w\) is the weight, \(C\) is the number of classes, and \(N\) spans the minibatch dimension as well as \(d_1, \ldots, d_K\) for the K-dimensional case. If reduction is not 'none' (default 'mean'), then

\[
\ell(x, y) =
\begin{cases}
\dfrac{\sum_{n=1}^{N} l_n}{\sum_{n=1}^{N} w_{y_n} \cdot \mathbb{1}\{y_n \neq \text{ignore\_index}\}}, & \text{if reduction} = \text{'mean'}, \\[2ex]
\sum_{n=1}^{N} l_n, & \text{if reduction} = \text{'sum'}.
\end{cases}
\]

Probabilities for each class, which is useful when a single class label per minibatch item is too restrictive. The unreduced loss (i.e. with reduction set to 'none') for this case can be described as:

\[
\ell(x, y) = L = \{l_1, \ldots, l_N\}^\top, \quad
l_n = -\sum_{c=1}^{C} w_c \log \frac{\exp(x_{n,c})}{\sum_{i=1}^{C} \exp(x_{n,i})} \, y_{n,c}
\]
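The two unreduced forms can be checked numerically. The sketch below assumes all class weights equal 1 and no ignore_index. It computes the log-softmax term by hand and compares it with torch.nn.functional.cross_entropy for both a class-index target and an equivalent one-hot probability target. The tensor sizes are arbitrary.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

x = torch.randn(3, 5)                  # logits, shape (N, C)
y_idx = torch.tensor([1, 0, 4])        # class indices, shape (N,)

# log( exp(x_{n,c}) / sum_i exp(x_{n,i}) ) for every n and c
log_p = F.log_softmax(x, dim=1)

# Class-index form: l_n = -log_p[n, y_n]   (w = 1, nothing ignored)
manual_idx = -log_p[torch.arange(x.shape[0]), y_idx]
builtin_idx = F.cross_entropy(x, y_idx, reduction="none")

# Probability form: l_n = -sum_c y_{n,c} * log_p[n, c]
y_prob = F.one_hot(y_idx, num_classes=5).float()
manual_prob = -(y_prob * log_p).sum(dim=1)
builtin_prob = F.cross_entropy(x, y_prob, reduction="none")

print(manual_idx, builtin_idx, manual_prob, builtin_prob, sep="\n")
```

With one-hot probabilities the sum over classes keeps a single term, so both forms give the same per-sample losses.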

The performance of this criterion is generally better when target contains class indices, as this allows for optimized computation. Consider providing target as class probabilities only when a single class label per minibatch item is too restrictive.

weight (Tensor, optional) – a manual rescaling weight given to each class. If given, has to be a Tensor of size C.
size_average (bool, optional) – Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
ignore_index (int, optional) – Specifies a target value that is ignored and does not contribute to the input gradient. When size_average is True, the loss is averaged over non-ignored targets. Note that ignore_index is only applicable when the target contains class indices.
reduce (bool, optional) – Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
reduction (str, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the weighted mean of the output is taken, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'
label_smoothing (float, optional) – A float in [0.0, 1.0]. Specifies the amount of smoothing when computing the loss, where 0.0 means no smoothing. The targets become a mixture of the original ground truth and a uniform distribution as described in Rethinking the Inception Architecture for Computer Vision. Default: 0.0.
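A short sketch of how ignore_index, reduction and label_smoothing interact. It assumes that -100 is used to mark a position to skip (passed explicitly here) and that smoothing mixes the one-hot target with a uniform distribution over the C classes. The smoothing amount 0.1 and the tensor sizes are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
C = 3
logits = torch.randn(4, C)                    # (N, C)
target = torch.tensor([0, -100, 2, 1])        # second item should be skipped

# ignore_index: the skipped item gets zero loss and is left out of the mean.
per_item = nn.CrossEntropyLoss(ignore_index=-100, reduction="none")(logits, target)
total = nn.CrossEntropyLoss(ignore_index=-100, reduction="sum")(logits, target)
mean = nn.CrossEntropyLoss(ignore_index=-100, reduction="mean")(logits, target)

# label_smoothing: compare with an explicit mixture of one-hot and uniform
# targets passed as class probabilities.
eps = 0.1
clean_target = torch.tensor([0, 1, 2, 1])
smoothed = nn.CrossEntropyLoss(label_smoothing=eps)(logits, clean_target)
mixed = (1 - eps) * F.one_hot(clean_target, C).float() + eps / C
explicit = nn.CrossEntropyLoss()(logits, mixed)

print(per_item, total.item(), mean.item(), smoothed.item(), explicit.item())
```

Here the mean is taken over the three non-ignored items, not over all four.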

Input: Shape \((C)\), \((N, C)\) or \((N, C, d_1, d_2, \ldots, d_K)\) with \(K \geq 1\) in the case of K-dimensional loss.
Target: If containing class indices, shape \(()\), \((N)\) or \((N, d_1, d_2, \ldots, d_K)\) with \(K \geq 1\) in the case of K-dimensional loss, where each value should be in \([0, C)\). If containing class probabilities, same shape as the input, and each value should be in \([0, 1]\).
Output: If reduction is 'none', shape \(()\), \((N)\) or \((N, d_1, d_2, \ldots, d_K)\) with \(K \geq 1\) in the case of K-dimensional loss, depending on the shape of the input. Otherwise, scalar.

where:

\[
\begin{aligned}
C ={} & \text{number of classes} \\
N ={} & \text{batch size}
\end{aligned}
\]
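The shape rules can be illustrated with a per-pixel case where K = 2. The sizes below (N, C, H, W) are hypothetical and chosen only to keep the tensors small.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
N, C, H, W = 2, 4, 8, 8                       # hypothetical sizes

logits = torch.randn(N, C, H, W)              # input:  (N, C, d_1, d_2)
target = torch.randint(0, C, (N, H, W))       # target: (N, d_1, d_2), values in [0, C)

# reduction='none' keeps one loss value per pixel.
per_pixel = nn.CrossEntropyLoss(reduction="none")(logits, target)

# The default reduction collapses everything to a scalar.
scalar = nn.CrossEntropyLoss()(logits, target)

# Unbatched case: input of shape (C) and a 0-dimensional target.
unbatched = nn.CrossEntropyLoss()(torch.randn(C), torch.tensor(1))

print(per_pixel.shape, scalar.shape, unbatched.shape)
```

Note that the class dimension always sits at index 1 for batched input, so channel-last tensors would need to be permuted first.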

>>> import torch
>>> import torch.nn as nn
>>>
>>> # Example of target with class indices
>>> loss = nn.CrossEntropyLoss()
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.empty(3, dtype=torch.long).random_(5)
>>> output = loss(input, target)
>>> output.backward()
>>>
>>> # Example of target with class probabilities
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.randn(3, 5).softmax(dim=1)
>>> output = loss(input, target)
>>> output.backward()


sklearn.metrics.log_loss

This is the loss function used in (multinomial) logistic regression and extensions of it such as neural networks, defined as the negative log-likelihood of a logistic model that returns y_pred probabilities for its training data y_true. The log loss is only defined for two or more labels. For a single sample with true label \(y \in \{0, 1\}\) and a probability estimate \(p = \operatorname{Pr}(y = 1)\), the log loss is:

\[
L_{\log}(y, p) = -\left(y \log(p) + (1 - y) \log(1 - p)\right)
\]
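For a single sample, the binary log loss is commonly written as \(-(y \log p + (1 - y) \log(1 - p))\). The sketch below implements that expression in plain Python. The helper name binary_log_loss is hypothetical and is not part of scikit-learn.

```python
import math


def binary_log_loss(y: int, p: float) -> float:
    """Log loss of one sample with true label y in {0, 1} and p = Pr(y = 1).

    Hypothetical helper for illustration; p must lie strictly between 0 and 1.
    """
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


confident_and_right = binary_log_loss(1, 0.9)   # small loss
confident_and_wrong = binary_log_loss(0, 0.9)   # large loss
print(confident_and_right, confident_and_wrong)
```

The loss grows without bound as the predicted probability of the true class approaches 0, which is why implementations typically clip probabilities away from 0 and 1.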

Parameters:

y_true array-like or label indicator matrix

Ground truth (correct) labels for n_samples samples.

y_pred array-like of float, shape = (n_samples, n_classes) or (n_samples,)

Predicted probabilities, as returned by a classifier’s predict_proba method. If y_pred.shape = (n_samples,) the probabilities provided are assumed to be that of the positive class. The labels in y_pred are assumed to be ordered alphabetically, as done by preprocessing.LabelBinarizer .

eps float or "auto", default="auto"

Log loss is undefined for p=0 or p=1, so probabilities are clipped to max(eps, min(1 - eps, p)). The default will depend on the data type of y_pred and is set to np.finfo(y_pred.dtype).eps.

Changed in version 1.2: The default value changed from 1e-15 to "auto", which is equivalent to np.finfo(y_pred.dtype).eps.

Deprecated since version 1.3: eps is deprecated in 1.3 and will be removed in 1.5.

normalize bool, default=True

If true, return the mean loss per sample. Otherwise, return the sum of the per-sample losses.

sample_weight array-like of shape (n_samples,), default=None

Sample weights.

labels array-like, default=None

If not provided, labels will be inferred from y_true. If labels is None and y_pred has shape (n_samples,) the labels are assumed to be binary and are inferred from y_true .
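A small sketch of the normalize, sample_weight and labels parameters. The data is made up: y_true happens to contain only one class, so labels is passed explicitly to tell log_loss which class each column of y_pred belongs to. The deprecated eps argument is deliberately not used.

```python
import math

from sklearn.metrics import log_loss

# Hypothetical batch in which only class 1 occurs.
y_true = [1, 1, 1]
y_pred = [[0.2, 0.8], [0.3, 0.7], [0.4, 0.6]]
labels = [0, 1]                     # column 0 -> class 0, column 1 -> class 1

mean_loss = log_loss(y_true, y_pred, labels=labels)
total_loss = log_loss(y_true, y_pred, labels=labels, normalize=False)
weighted_loss = log_loss(
    y_true, y_pred, labels=labels, sample_weight=[2.0, 1.0, 1.0]
)

# Per-sample losses computed by hand for comparison (natural logarithm).
per_sample = [-math.log(0.8), -math.log(0.7), -math.log(0.6)]
print(mean_loss, total_loss, weighted_loss, per_sample)
```

With normalize left at its default, the weighted result is a weighted average of the per-sample losses, so the first sample counts twice as much as each of the others.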

Returns:

loss float

Log loss, aka logistic loss or cross-entropy loss.

The logarithm used is the natural logarithm (base-e).

C.M. Bishop (2006). Pattern Recognition and Machine Learning. Springer, p. 209.

>>> from sklearn.metrics import log_loss
>>> log_loss(["spam", "ham", "ham", "spam"],
...          [[.1, .9], [.9, .1], [.8, .2], [.35, .65]])
0.21616...
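The value in the example can be reproduced by hand with NumPy. This sketch assumes the columns of y_pred follow the sorted label order ("ham", then "spam") and that none of the probabilities are close enough to 0 or 1 for clipping to matter.

```python
import numpy as np

y_true = np.array(["spam", "ham", "ham", "spam"])
y_pred = np.array([[.1, .9], [.9, .1], [.8, .2], [.35, .65]])

# Sorted unique labels give the column order: ['ham', 'spam'].
classes = np.unique(y_true)

# Column index of the true class for each sample.
col = np.searchsorted(classes, y_true)

# Probability the model assigned to the true class of each sample.
p_true = y_pred[np.arange(len(y_true)), col]

# Mean negative natural log of those probabilities.
manual = -np.log(p_true).mean()
print(manual)
```

The four true-class probabilities are 0.9, 0.9, 0.8 and 0.65, and the mean of their negative natural logs is about 0.21616.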
