
Reading and writing files#

This page tackles common applications; for the full collection of I/O routines, see Input and output.

Reading text and CSV files#

With no missing values#

With missing values#

numpy.genfromtxt will either

  • return a masked array, masking out missing values (if usemask=True), or
  • fill in the missing value with the value specified in filling_values (default is np.nan for float, -1 for int).

With non-whitespace delimiters#

>>> with open("csv.txt", "r") as f:
...     print(f.read())
1, 2, 3
4,, 6
7, 8, 9
Masked-array output#
>>> np.genfromtxt("csv.txt", delimiter=",", usemask=True)
masked_array(
  data=[[1.0, 2.0, 3.0],
        [4.0, --, 6.0],
        [7.0, 8.0, 9.0]],
  mask=[[False, False, False],
        [False, True, False],
        [False, False, False]],
  fill_value=1e+20)
Array output#
>>> np.genfromtxt("csv.txt", delimiter=",")
array([[ 1.,  2.,  3.],
       [ 4., nan,  6.],
       [ 7.,  8.,  9.]])
Array output, specified fill-in value#
>>> np.genfromtxt("csv.txt", delimiter=",", dtype=np.int8, filling_values=99)
array([[ 1,  2,  3],
       [ 4, 99,  6],
       [ 7,  8,  9]], dtype=int8)

Whitespace-delimited#

numpy.genfromtxt can also parse whitespace-delimited data files that have missing values if


  • Each field has a fixed width: use the width as the delimiter argument.

# File with width=4. The data does not have to be justified (for example,
# the 2 in row 1), the last column can be less than width (for example, the 6
# in row 2), and no delimiting character is required (for instance 8888 and 9
# in row 3)

>>> with open("fixedwidth.txt", "r") as f:
...     data = f.read()
>>> print(data)
1   2      3
44      6
7   88889
>>> np.genfromtxt("fixedwidth.txt", delimiter=4)
array([[1.000e+00, 2.000e+00, 3.000e+00],
       [4.400e+01,       nan, 6.000e+00],
       [7.000e+00, 8.888e+03, 9.000e+00]])
  • A special value (e.g. "x") indicates a missing field: use it as the missing_values argument.

>>> with open("nan.txt", "r") as f:
...     print(f.read())
1 2 3
44 x 6
7 8888 9
>>> np.genfromtxt("nan.txt", missing_values="x")
array([[1.000e+00, 2.000e+00, 3.000e+00],
       [4.400e+01,       nan, 6.000e+00],
       [7.000e+00, 8.888e+03, 9.000e+00]])
  • You want to skip the rows with errors: set invalid_raise=False.

>>> with open("skip.txt", "r") as f:
...     print(f.read())
1 2 3
44 6
7 888 9
>>> np.genfromtxt("skip.txt", invalid_raise=False)
__main__:1: ConversionWarning: Some errors were detected !
    Line #2 (got 2 columns instead of 3)
array([[  1.,   2.,   3.],
       [  7., 888.,   9.]])
  • The delimiter whitespace character is different from the whitespace that indicates missing data. For instance, if columns are delimited by \t, then missing data is recognized if it consists of one or more spaces.

>>> with open("tabs.txt", "r") as f:
...     data = f.read()
>>> print(data)
1       2       3
44              6
7       888     9
>>> np.genfromtxt("tabs.txt", delimiter="\t", missing_values=" +")
array([[  1.,   2.,   3.],
       [ 44.,  nan,   6.],
       [  7., 888.,   9.]])

Read a file in .npy or .npz format#

Write to a file to be read back by NumPy#

Binary#

For security and portability, set allow_pickle=False unless the dtype contains Python objects, which requires pickling.

Masked arrays can't currently be saved, nor can other arbitrary array subclasses.
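Since np.save refuses masked arrays, one common workaround is to store the data and the mask as two plain arrays in a single .npz archive and reassemble the masked array on load. This is a sketch, not an official API; the file name is illustrative.

```python
import numpy as np
import numpy.ma as ma

# A masked array cannot be passed to np.save directly, so save its
# two plain-ndarray components instead: the data buffer and the mask.
marr = ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])

np.savez("masked_demo.npz", data=marr.data, mask=marr.mask)

# Reconstruct the masked array from the saved pieces.
with np.load("masked_demo.npz") as f:
    restored = ma.masked_array(f["data"], mask=f["mask"])

print(restored)  # [1.0 -- 3.0]
```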

Human-readable#

numpy.save and numpy.savez create binary files. To write a human-readable file, use numpy.savetxt. The array can only be 1- or 2-dimensional, and there's no `savetxtz` for multiple files.
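A minimal savetxt/loadtxt round trip looks like this; the file name, fmt, and header text are illustrative choices, not requirements.

```python
import numpy as np

a = np.array([[1.5, 2.0, 3.25], [4.0, 5.5, 6.0]])

# fmt and delimiter control the text layout; header lines are written
# prefixed with "#", which loadtxt skips by default.
np.savetxt("a.csv", a, fmt="%.2f", delimiter=",", header="c1,c2,c3")

# Every value above fits in two decimal places, so the round trip is exact.
b = np.loadtxt("a.csv", delimiter=",")
```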


Large arrays#

Read an arbitrarily formatted binary file (“binary blob”)#

The .wav file header is a 44-byte block preceding data_size bytes of the actual sound data:

chunk_id         "RIFF"
chunk_size       4-byte unsigned little-endian integer
format           "WAVE"
fmt_id           "fmt "
fmt_size         4-byte unsigned little-endian integer
audio_fmt        2-byte unsigned little-endian integer
num_channels     2-byte unsigned little-endian integer
sample_rate      4-byte unsigned little-endian integer
byte_rate        4-byte unsigned little-endian integer
block_align      2-byte unsigned little-endian integer
bits_per_sample  2-byte unsigned little-endian integer
data_id          "data"
data_size        4-byte unsigned little-endian integer

The .wav file header as a NumPy structured dtype:

wav_header_dtype = np.dtype([
    ("chunk_id", (bytes, 4)),    # flexible-sized scalar type, item size 4
    ("chunk_size", "<u4"),       # little-endian unsigned 32-bit integer
    ("format", "S4"),            # 4-byte string, alternate spelling of (bytes, 4)
    ("fmt_id", "S4"),
    ("fmt_size", "<u4"),
    ("audio_fmt", "<u2"),
    ("num_channels", "<u2"),     # .. more of the same ..
    ("sample_rate", "<u4"),
    ("byte_rate", "<u4"),
    ("block_align", "<u2"),
    ("bits_per_sample", "<u2"),
    ("data_id", "S4"),
    ("data_size", "<u4"),
    # the sound data itself cannot be represented here:
    # it does not have a fixed size
])
header = np.fromfile(f, dtype=wav_header_dtype, count=1)[0]

This .wav example is for illustration; to read a .wav file in real life, use Python’s built-in module wave .
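To make the advice about the wave module concrete, here is a hedged sketch that first synthesizes a small 16-bit PCM file (so the example is self-contained; the file name and tone parameters are arbitrary) and then reads the samples back into a NumPy array.

```python
import wave
import numpy as np

# Synthesize one second of a 440 Hz sine tone as 16-bit little-endian PCM.
rate = 8000
t = np.arange(rate) / rate
tone = (np.sin(2 * np.pi * 440 * t) * 32767).astype("<i2")

with wave.open("tone.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)         # bytes per sample
    w.setframerate(rate)
    w.writeframes(tone.tobytes())

# wave handles the 44-byte header; we only convert the raw frames.
with wave.open("tone.wav", "rb") as w:
    frames = w.readframes(w.getnframes())
    data = np.frombuffer(frames, dtype="<i2")

print(data.shape)  # (8000,)
```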

(Adapted from Pauli Virtanen, Advanced NumPy , licensed under CC BY 4.0.)

Write or read large arrays#

Arrays too large to fit in memory can be treated like ordinary in-memory arrays using memory mapping.

array = numpy.memmap("mydata/myarray.arr", mode="r", dtype=np.int16, shape=(1024, 1024)) 
large_array[some_slice] = np.load("path/to/small_array", mmap_mode="r") 

Memory mapping lacks features like data chunking and compression; more full-featured formats and libraries usable with NumPy include HDF5, Zarr, and NetCDF.

For tradeoffs among memmap, Zarr, and HDF5, see pythonspeed.com.

Write files for reading by other (non-NumPy) tools#

Formats for exchanging data with other tools include HDF5, Zarr, and NetCDF (see Write or read large arrays ).

Write or read a JSON file#

NumPy arrays are not directly JSON serializable.
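One minimal approach, sketched here, is to convert the array to nested lists before encoding and rebuild it on decode; since JSON does not carry NumPy dtypes, the sketch records the dtype alongside the data (the payload keys are illustrative).

```python
import json
import numpy as np

a = np.array([[1, 2], [3, 4]], dtype=np.int16)

# tolist() yields plain Python ints/floats, which json can serialize.
# The dtype string is stored explicitly because JSON cannot preserve it.
payload = json.dumps({"dtype": str(a.dtype), "data": a.tolist()})

decoded = json.loads(payload)
b = np.array(decoded["data"], dtype=decoded["dtype"])
```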

Save/restore using a pickle file#

Avoid when possible; pickles are not secure against erroneous or maliciously constructed data.

Use numpy.save and numpy.load. Set allow_pickle=False, unless the array dtype includes Python objects, in which case pickling is required.

Convert from a pandas DataFrame to a NumPy array#
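Assuming pandas is available, a minimal sketch uses DataFrame.to_numpy; the column names below are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]})

# to_numpy() upcasts mixed column dtypes to a common dtype
# (here the int column is upcast to float64).
arr = df.to_numpy()

# Selecting a single homogeneous column avoids the upcast.
x_only = df["x"].to_numpy()
```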

Save/restore using tofile and fromfile #

numpy.ndarray.tofile and numpy.fromfile lose information on endianness and precision and so are unsuitable for anything but scratch storage.
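The sketch below shows why: the file is a raw byte dump with no header, so the reader must already know the dtype and shape, and reading with the wrong dtype silently misinterprets the bytes rather than raising an error (the file name is illustrative).

```python
import numpy as np

a = np.arange(6, dtype=np.int16).reshape(2, 3)
a.tofile("scratch.bin")   # writes 12 raw bytes, no dtype/shape metadata

# Reading back requires re-specifying both dtype and shape.
good = np.fromfile("scratch.bin", dtype=np.int16).reshape(2, 3)

# A wrong dtype "succeeds": the same 12 bytes become three int32 values.
wrong = np.fromfile("scratch.bin", dtype=np.int32)
```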



numpy.load#

Load arrays or pickled objects from .npy , .npz or pickled files.

Loading files that contain object arrays uses the pickle module, which is not secure against erroneous or maliciously constructed data. Consider passing allow_pickle=False to load data that is known not to contain object arrays for the safer handling of untrusted sources.

file : file-like object, string, or pathlib.Path

The file to read. File-like objects must support the seek() and read() methods and must always be opened in binary mode. Pickled files require that the file-like object support the readline() method as well.

mmap_mode : {None, 'r+', 'r', 'w+', 'c'}, optional

If not None, then memory-map the file, using the given mode (see numpy.memmap for a detailed description of the modes). A memory-mapped array is kept on disk. However, it can be accessed and sliced like any ndarray. Memory mapping is especially useful for accessing small fragments of large files without reading the entire file into memory.

allow_pickle : bool, optional

Allow loading pickled object arrays stored in npy files. Reasons for disallowing pickles include security, as loading pickled data can execute arbitrary code. If pickles are disallowed, loading object arrays will fail. Default: False

Changed in version 1.16.3: Made default False in response to CVE-2019-6446.
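A short sketch of this behavior: an object-dtype array can only be serialized via pickle, so under the default allow_pickle=False, np.load refuses it with a ValueError (the file name is illustrative).

```python
import numpy as np

# Build an object-dtype array; such arrays are pickled by np.save.
obj = np.empty(2, dtype=object)
obj[0] = {"a": 1}
obj[1] = [2, 3]
np.save("obj.npy", obj)

raised = False
try:
    np.load("obj.npy")            # allow_pickle defaults to False
except ValueError:
    raised = True                 # load refused the pickled payload

# Opt in explicitly only when the file comes from a trusted source.
loaded = np.load("obj.npy", allow_pickle=True)
```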

fix_imports : bool, optional

Only useful when loading Python 2 generated pickled files on Python 3, which includes npy/npz files containing object arrays. If fix_imports is True, pickle will try to map the old Python 2 names to the new names used in Python 3.

encoding : str, optional

What encoding to use when reading Python 2 strings. Only useful when loading Python 2 generated pickled files in Python 3, which includes npy/npz files containing object arrays. Values other than ‘latin1’, ‘ASCII’, and ‘bytes’ are not allowed, as they can corrupt numerical data. Default: ‘ASCII’

max_header_size : int, optional

Maximum allowed size of the header. Large headers may not be safe to load securely and thus require explicitly passing a larger value. See ast.literal_eval for details. This option is ignored when allow_pickle is passed. In that case the file is by definition trusted and the limit is unnecessary.

Returns : result array, tuple, dict, etc.

Data stored in the file. For .npz files, the returned instance of NpzFile class must be closed to avoid leaking file descriptors.

Raises:

OSError
If the input file does not exist or cannot be read.

UnpicklingError
If allow_pickle=True, but the file cannot be loaded as a pickle.

ValueError
The file contains an object array, but allow_pickle=False given.

EOFError
When calling np.load multiple times on the same file handle, if all data has already been read.

See also:

memmap
Create a memory-map to an array stored in a file on disk.

lib.format.open_memmap
Create or load a memory-mapped .npy file.

  • If the file contains pickle data, then whatever object is stored in the pickle is returned.
  • If the file is a .npy file, then a single array is returned.
  • If the file is a .npz file, then a dictionary-like object is returned, containing key-value pairs, one for each file in the archive.
  • If the file is a .npz file, the returned value supports the context manager protocol in a similar fashion to the open function:
with load('foo.npz') as data:
    a = data['a']

Store data to disk, and load it again:

>>> np.save('/tmp/123', np.array([[1, 2, 3], [4, 5, 6]]))
>>> np.load('/tmp/123.npy')
array([[1, 2, 3],
       [4, 5, 6]])

Store compressed data to disk, and load it again:

>>> a = np.array([[1, 2, 3], [4, 5, 6]])
>>> b = np.array([1, 2])
>>> np.savez('/tmp/123.npz', a=a, b=b)
>>> data = np.load('/tmp/123.npz')
>>> data['a']
array([[1, 2, 3],
       [4, 5, 6]])
>>> data['b']
array([1, 2])
>>> data.close()

Mem-map the stored array, and then access the second row directly from disk:

>>> X = np.load('/tmp/123.npy', mmap_mode='r')
>>> X[1, :]
memmap([4, 5, 6])

