Sparse matrix python index

Содержание

Sparse matrices ( scipy.sparse )#
Contents#
Sparse array classes#
Sparse matrix classes#
Functions#
Submodules#
Exceptions#
Usage information#
Matrix vector product#
Example 1#
Example 2#
Further details#
scipy.sparse.csr_matrix#
Sparse data structures in Python
Dictionary of keys (DOK)
List of list (LIL)
Coordinate list (COO)
Compressed Sparse format
Block Sparse Row matrix (BSR) and DIAgonal storages

Sparse matrices ( scipy.sparse )#

This package is switching to an array interface, compatible with NumPy arrays, from the older matrix interface. We recommend that you use the array objects ( bsr_array , coo_array , etc.) for all new work.

When using the array interface, please note that:

x * y no longer performs matrix multiplication, but element-wise multiplication (just like with NumPy arrays). To make code work with both arrays and matrices, use x @ y for matrix multiplication.
Operations such as sum, that used to produce dense matrices, now produce arrays, whose multiplication behavior differs similarly.
Sparse arrays currently must be two-dimensional. This also means that all slicing operations on these objects must produce two-dimensional results, or they will result in an error. This will be addressed in a future version.

The construction utilities ( eye , kron , random , diags , etc.) have not yet been ported, but their results can be wrapped into arrays:

Contents#

Sparse array classes#

bsr_array (arg1[, shape, dtype, copy, blocksize])

Block Sparse Row format sparse array.

A sparse matrix in COOrdinate format.

Compressed Sparse Column matrix

Compressed Sparse Row matrix

Sparse matrix with DIAgonal storage

Dictionary Of Keys based sparse matrix.

Row-based LIst of Lists sparse matrix

This class provides a base class for all sparse arrays.

Sparse matrix classes#

bsr_matrix (arg1[, shape, dtype, copy, blocksize])

Block Sparse Row format sparse matrix.

A sparse matrix in COOrdinate format.

Compressed Sparse Column matrix

Compressed Sparse Row matrix

Sparse matrix with DIAgonal storage

Dictionary Of Keys based sparse matrix.

Row-based LIst of Lists sparse matrix

This class provides a base class for all sparse matrix classes.

Functions#

Sparse matrix with ones on diagonal

Identity matrix in sparse format

kronecker product of sparse matrices A and B

kronecker sum of sparse matrices A and B

diags (diagonals[, offsets, shape, format, dtype])

Construct a sparse matrix from diagonals.

Return a sparse matrix from diagonals.

Build a block diagonal sparse matrix from provided matrices.

Return the lower triangular portion of a matrix in sparse format

Return the upper triangular portion of a matrix in sparse format

Build a sparse matrix from sparse sub-blocks

Stack sparse matrices horizontally (column wise)

Stack sparse matrices vertically (row wise)

Generate a sparse matrix of the given shape and density with uniformly distributed values.

Generate a sparse matrix of the given shape and density with randomly distributed values.

Save and load sparse matrices:

Save a sparse matrix to a file using .npz format.

Load a sparse matrix from a file using .npz format.

Return the indices and values of the nonzero elements of a matrix

Identifying sparse matrices:

Is x of a sparse array type?

Is x of a sparse matrix type?

Is x of a bsr_matrix type?

Submodules#

Compressed sparse graph routines (scipy.sparse.csgraph)

Sparse linear algebra (scipy.sparse.linalg)

Exceptions#

Usage information#

There are seven available sparse matrix types:

csc_matrix: Compressed Sparse Column format
csr_matrix: Compressed Sparse Row format
bsr_matrix: Block Sparse Row format
lil_matrix: List of Lists format
dok_matrix: Dictionary of Keys format
coo_matrix: COOrdinate format (aka IJV, triplet format)
dia_matrix: DIAgonal format

To construct a matrix efficiently, use either dok_matrix or lil_matrix. The lil_matrix class supports basic slicing and fancy indexing with a similar syntax to NumPy arrays. As illustrated below, the COO format may also be used to efficiently construct matrices. Despite their similarity to NumPy arrays, it is strongly discouraged to use NumPy functions directly on these matrices because NumPy may not properly convert them for computations, leading to unexpected (and incorrect) results. If you do want to apply a NumPy function to these matrices, first check if SciPy has its own implementation for the given sparse matrix class, or convert the sparse matrix to a NumPy array (e.g., using the toarray() method of the class) first before applying the method.

To perform manipulations such as multiplication or inversion, first convert the matrix to either CSC or CSR format. The lil_matrix format is row-based, so conversion to CSR is efficient, whereas conversion to CSC is less so.

All conversions among the CSR, CSC, and COO formats are efficient, linear-time operations.

Matrix vector product#

To do a vector product between a sparse matrix and a vector simply use the matrix dot method, as described in its docstring:

>>> import numpy as np >>> from scipy.sparse import csr_matrix >>> A = csr_matrix([[1, 2, 0], [0, 0, 3], [4, 0, 5]]) >>> v = np.array([1, 0, -1]) >>> A.dot(v) array([ 1, -3, -1], dtype=int64)

As of NumPy 1.7, np.dot is not aware of sparse matrices, therefore using it will result on unexpected results or errors. The corresponding dense array should be obtained first instead:

>>> np.dot(A.toarray(), v) array([ 1, -3, -1], dtype=int64)

but then all the performance advantages would be lost.

The CSR format is specially suitable for fast matrix vector products.

Example 1#

Construct a 1000×1000 lil_matrix and add some values to it:

>>> from scipy.sparse import lil_matrix >>> from scipy.sparse.linalg import spsolve >>> from numpy.linalg import solve, norm >>> from numpy.random import rand

>>> A = lil_matrix((1000, 1000)) >>> A[0, :100] = rand(100) >>> A[1, 100:200] = A[0, :100] >>> A.setdiag(rand(1000))

Now convert it to CSR format and solve A x = b for x:

>>> A = A.tocsr() >>> b = rand(1000) >>> x = spsolve(A, b)

Convert it to a dense matrix and solve, and check that the result is the same:

Now we can compute norm of the error with:

>>> err = norm(x-x_) >>> err  1e-10 True

Example 2#

Construct a matrix in COO format:

>>> from scipy import sparse >>> from numpy import array >>> I = array([0,3,1,0]) >>> J = array([0,3,1,2]) >>> V = array([4,5,7,9]) >>> A = sparse.coo_matrix((V,(I,J)),shape=(4,4))

Notice that the indices do not need to be sorted.

Duplicate (i,j) entries are summed when converting to CSR or CSC.

>>> I = array([0,0,1,3,1,0,0]) >>> J = array([0,2,1,3,1,0,0]) >>> V = array([1,1,1,1,1,1,1]) >>> B = sparse.coo_matrix((V,(I,J)),shape=(4,4)).tocsr()

This is useful for constructing finite-element stiffness and mass matrices.

Further details#

CSR column indices are not necessarily sorted. Likewise for CSC row indices. Use the .sorted_indices() and .sort_indices() methods when sorted indices are required (e.g., when passing data to other libraries).

Compressed sparse graph routines ( scipy.sparse.csgraph )

Источник

scipy.sparse.csr_matrix#

to construct an empty matrix with shape (M, N) dtype is optional, defaulting to dtype=’d’.

csr_array((data, (row_ind, col_ind)), [shape=(M, N)])

where data , row_ind and col_ind satisfy the relationship a[row_ind[k], col_ind[k]] = data[k] .

csr_array((data, indices, indptr), [shape=(M, N)])

is the standard CSR representation where the column indices for row i are stored in indices[indptr[i]:indptr[i+1]] and their corresponding values are stored in data[indptr[i]:indptr[i+1]] . If the shape parameter is not supplied, the matrix dimensions are inferred from the index arrays.

Sparse matrices can be used in arithmetic operations: they support addition, subtraction, multiplication, division, and matrix power.

efficient arithmetic operations CSR + CSR, CSR * CSR, etc.
efficient row slicing
fast matrix vector products

slow column slicing operations (consider CSC)
changes to the sparsity structure are expensive (consider LIL or DOK)

Within each row, indices are sorted by column.
There are no duplicate entries.

>>> import numpy as np >>> from scipy.sparse import csr_array >>> csr_array((3, 4), dtype=np.int8).toarray() array([[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]], dtype=int8)

>>> row = np.array([0, 0, 1, 2, 2, 2]) >>> col = np.array([0, 2, 2, 0, 1, 2]) >>> data = np.array([1, 2, 3, 4, 5, 6]) >>> csr_array((data, (row, col)), shape=(3, 3)).toarray() array([[1, 0, 2], [0, 0, 3], [4, 5, 6]])

>>> indptr = np.array([0, 2, 3, 6]) >>> indices = np.array([0, 2, 2, 0, 1, 2]) >>> data = np.array([1, 2, 3, 4, 5, 6]) >>> csr_array((data, indices, indptr), shape=(3, 3)).toarray() array([[1, 0, 2], [0, 0, 3], [4, 5, 6]])

Duplicate entries are summed together:

>>> row = np.array([0, 1, 2, 0]) >>> col = np.array([0, 1, 1, 0]) >>> data = np.array([1, 2, 4, 8]) >>> csr_array((data, (row, col)), shape=(3, 3)).toarray() array([[9, 0, 0], [0, 2, 0], [0, 4, 0]])

As an example of how to construct a CSR matrix incrementally, the following snippet builds a term-document matrix from texts:

>>> docs = [["hello", "world", "hello"], ["goodbye", "cruel", "world"]] >>> indptr = [0] >>> indices = [] >>> data = [] >>> vocabulary = <> >>> for d in docs: . for term in d: . index = vocabulary.setdefault(term, len(vocabulary)) . indices.append(index) . data.append(1) . indptr.append(len(indices)) . >>> csr_array((data, indices, indptr), dtype=int).toarray() array([[2, 1, 0, 0], [0, 1, 1, 1]])

Number of dimensions (this is always 2)

Number of stored values, including explicit zeros.

CSR format data array of the matrix

CSR format index array of the matrix

CSR format index pointer array of the matrix

Determine whether the matrix has sorted indices

Источник

Sparse data structures in Python

Imagine you have a 2-D matrix with hundreds of million elements, where only a few of them contain non-zero values. When storing such a matrix using conventional approach, we would waste a lot of space for zeros.

Sparse data structures allow us to store only non-zero values assuming the rest of them are zeros. This approach saves a lot of memory and computing time. In fact, you can often encounter such matrices when working with NLP or machine learning tasks.

In Python, sparse data structures are implemented in scipy.sparse module, which mostly based on regular numpy arrays.

Let's create a random sparse matrix and compare its size to an identical regular one:

Block Sparse Row matrix (BSR)
Coordinate list matrix (COO)
Compressed Sparse Column matrix (CSC)
Compressed Sparse Row matrix (CSR)
Sparse matrix with DIAgonal storage (DIA)
Dictionary Of Keys based sparse matrix (DOK)
Row-based linked list sparse matrix (LIL)

Dictionary of keys (DOK)

Dictionary of keys ( dok_matrix in scipy) is the easiest way to implement a sparse matrix. As the name suggests, it’s based on a dictionary, in which the keys are tuples representing indices, i.e. tuple(row, column) .

efficient access to individual items (O(1) on average)
fast construction
flexible structure
can be efficiently converted to COO format

very slow iteration in lexicographical order (due to the random order of keys)
slow arithmetics
slow slicing

List of list (LIL)

A row-based format ( lil_matrix in scipy), which uses two numpy arrays with regular Python lists inside them. The rows array stores information about occupied cells, whereas the data array stores corresponding values.

fast incremental construction

slow arithmetics
slow column slicing
memory greedy

Coordinate list (COO)

In scipy, a COO ( coo_matrix ) format uses three arrays, for every non-zero value, there is an entry in all of them.

fast incremental construction
fast item-wise arithmetics
fast conversion to CSR/CSC formats
allows duplicate items

slow access
slow arithmetics

Compressed Sparse format

data is an array which contains all non-zero entries in the row-major order.
indptr points to row starts (i.e., tells where each row begins).
indices is array of column indices (i.e., tells us which cells have non-zero values)

efficient item access and slicing
fast arithmetics
fast vector products

expensive editing

Block Sparse Row matrix (BSR) and DIAgonal storages

The diagonal storage ( dia_matrix is scipy) is used when you need to store diagonal matrices. In scipy, the implementation is not limited to main diagonal only. All diagonals are stored using two arrays, one for data and one for diagonal offsets. The block sparse row format is very similar to CSR, except it stores regular patterns of blocks (squares) which contain mostly non-zero data.

Источник

Читайте также: Java types long int