Python read file in numpy array

numpy.loadtxt

numpy.loadtxt(fname, dtype=<class 'float'>, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, unpack=False, ndmin=0, encoding='bytes', max_rows=None, *, quotechar=None, like=None)

Load data from a text file.

Parameters:

fname file, str, pathlib.Path, list of str, generator

File, filename, list, or generator to read. If the filename extension is .gz or .bz2 , the file is first decompressed. Note that generators must return bytes or strings. The strings in a list or produced by a generator are treated as lines.

dtype data-type, optional

Data-type of the resulting array; default: float. If this is a structured data-type, the resulting array will be 1-dimensional, and each row will be interpreted as an element of the array. In this case, the number of columns used must match the number of fields in the data-type.

comments str or sequence of str or None, optional

The characters or list of characters used to indicate the start of a comment. None implies no comments. For backwards compatibility, byte strings will be decoded as ‘latin1’. The default is ‘#’.

delimiter str, optional

The character used to separate the values. For backwards compatibility, byte strings will be decoded as ‘latin1’. The default is whitespace.

Changed in version 1.23.0: Only single character delimiters are supported. Newline characters cannot be used as the delimiter.

converters dict or callable, optional

Converter functions to customize value parsing. If converters is callable, the function is applied to all columns; otherwise it must be a dict that maps column number to a parser function. See examples for further details. Default: None.

Changed in version 1.23.0: The ability to pass a single callable to be applied to all columns was added.

skiprows int, optional

Skip the first skiprows lines, including comments; default: 0.

usecols int or sequence, optional

Which columns to read, with 0 being the first. For example, usecols = (1,4,5) will extract the 2nd, 5th and 6th columns. The default, None, results in all columns being read.

Changed in version 1.11.0: When a single column has to be read, an integer can be used instead of a tuple. E.g., usecols=3 reads the fourth column the same way as usecols=(3,) would.

unpack bool, optional

If True, the returned array is transposed, so that arguments may be unpacked using x, y, z = loadtxt(...). When used with a structured data-type, arrays are returned for each field. Default is False.

ndmin int, optional

The returned array will have at least ndmin dimensions. Otherwise mono-dimensional axes will be squeezed. Legal values: 0 (default), 1 or 2.

encoding str, optional

Encoding used to decode the input file. Does not apply to input streams. The special value 'bytes' enables backward compatibility workarounds that ensure you receive byte arrays as results if possible and passes 'latin1' encoded strings to converters. Override this value to receive unicode arrays and pass strings as input to converters. If set to None the system default is used. The default value is 'bytes'.

max_rows int, optional

Read max_rows rows of content after skiprows lines. The default is to read all the rows. Note that rows containing no data, such as empty lines and comment lines, are not counted towards max_rows, while such lines are counted in skiprows.

Changed in version 1.23.0: Lines containing no data, including comment lines (e.g., lines starting with ‘#’ or as specified via comments) are not counted towards max_rows.

quotechar unicode character or None, optional

The character used to denote the start and end of a quoted item. Occurrences of the delimiter or comment characters are ignored within a quoted item. The default value is quotechar=None, which means quoting support is disabled.

If two consecutive instances of quotechar are found within a quoted field, the first is treated as an escape character. See examples.

like array_like, optional

Reference object to allow the creation of arrays which are not NumPy arrays. If an array-like passed in as like supports the __array_function__ protocol, the result will be defined by it. In this case, it ensures the creation of an array object compatible with that passed in via this argument.

Returns:

out ndarray

Data read from the text file.

See also:

genfromtxt

Load data with missing values handled as specified.

Notes

This function aims to be a fast reader for simply formatted files. The genfromtxt function provides more sophisticated handling of, e.g., lines with missing values.

Each row in the input text file must have the same number of values for all values to be read. If the rows do not all have the same number of values, a subset of up to n columns (where n is the smallest number of values present in any row) can be read by specifying the columns via usecols.

The strings produced by the Python float.hex method can be used as input for floats.

Examples

>>> import numpy as np
>>> from io import StringIO  # StringIO behaves like a file object
>>> c = StringIO("0 1\n2 3")
>>> np.loadtxt(c)
array([[0., 1.],
       [2., 3.]])
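The fname description says that a filename ending in .gz or .bz2 is decompressed before parsing. A minimal sketch of that behaviour follows; the temporary file and its contents are made up for illustration.

```python
import gzip
import os
import tempfile

import numpy as np

# Create a small gzip-compressed text file (hypothetical sample data).
fd, path = tempfile.mkstemp(suffix=".gz")
os.close(fd)
try:
    with gzip.open(path, "wt") as f:
        f.write("0 1\n2 3\n")

    # The .gz extension is what tells loadtxt to decompress first.
    data = np.loadtxt(path)
finally:
    os.remove(path)

print(data)
```

Only the extension is passed as a hint here, so a compressed file saved under a different suffix would presumably need to be opened by hand and passed as a file object instead.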

>>> d = StringIO("M 21 72\nF 35 58")
>>> np.loadtxt(d, dtype={'names': ('gender', 'age', 'weight'),
...                      'formats': ('S1', 'i4', 'f4')})
array([(b'M', 21, 72.), (b'F', 35, 58.)],
      dtype=[('gender', 'S1'), ('age', '<i4'), ('weight', '<f4')])

>>> c = StringIO("1,0,2\n3,0,4")
>>> x, y = np.loadtxt(c, delimiter=',', usecols=(0, 2), unpack=True)
>>> x
array([1., 3.])
>>> y
array([2., 4.])
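The ndmin parameter controls whether single-row or single-value input is squeezed to fewer dimensions. The sketch below compares the resulting shapes on made-up input; it may help when downstream code expects a 2-D table regardless of how many rows the file holds.

```python
from io import StringIO

import numpy as np

# A single row of data: without ndmin the result is squeezed to 1-D.
row = np.loadtxt(StringIO("1 2 3"))

# ndmin=2 keeps the row axis, so the result is always (rows, columns).
table = np.loadtxt(StringIO("1 2 3"), ndmin=2)

# A single value is squeezed to a 0-D array unless ndmin says otherwise.
scalar = np.loadtxt(StringIO("7"))
vector = np.loadtxt(StringIO("7"), ndmin=1)

print(row.shape, table.shape, scalar.shape, vector.shape)
```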

The converters argument is used to specify functions to preprocess the text prior to parsing. converters can be a dictionary that maps preprocessing functions to each column:

>>> s = StringIO("1.618, 2.296\n3.141, 4.669\n")
>>> conv = {
...     0: lambda x: np.floor(float(x)),  # conversion fn for column 0
...     1: lambda x: np.ceil(float(x)),  # conversion fn for column 1
... }
>>> np.loadtxt(s, delimiter=",", converters=conv)
array([[1., 3.],
       [3., 5.]])

converters can be a callable instead of a dictionary, in which case it is applied to all columns:

>>> s = StringIO("0xDE 0xAD\n0xC0 0xDE")
>>> import functools
>>> conv = functools.partial(int, base=16)
>>> np.loadtxt(s, converters=conv)
array([[222., 173.],
       [192., 222.]])

This example shows how converters can be used to convert a field with a trailing minus sign into a negative number.

>>> s = StringIO('10.01 31.25-\n19.22 64.31\n17.57- 63.94')
>>> def conv(fld):
...     return -float(fld[:-1]) if fld.endswith(b'-') else float(fld)
...
>>> np.loadtxt(s, converters=conv)
array([[ 10.01, -31.25],
       [ 19.22,  64.31],
       [-17.57,  63.94]])

Using a callable as the converter can be particularly useful for handling values with different formatting, e.g. floats with underscores:

>>> s = StringIO("1 2.7 100_000")
>>> np.loadtxt(s, converters=float)
array([1.e+00, 2.7e+00, 1.e+05])

This idea can be extended to automatically handle values specified in many different formats:

>>> def conv(val):
...     try:
...         return float(val)
...     except ValueError:
...         return float.fromhex(val)
>>> s = StringIO("1, 2.5, 3_000, 0b4, 0x1.4000000000000p+2")
>>> np.loadtxt(s, delimiter=",", converters=conv, encoding=None)
array([1.0e+00, 2.5e+00, 3.0e+03, 1.8e+02, 5.0e+00])

Note that with the default encoding="bytes", the inputs to the converter function are latin-1 encoded byte strings. To deactivate the implicit encoding prior to conversion, use encoding=None:

>>> s = StringIO('10.01 31.25-\n19.22 64.31\n17.57- 63.94')
>>> conv = lambda x: -float(x[:-1]) if x.endswith('-') else float(x)
>>> np.loadtxt(s, converters=conv, encoding=None)
array([[ 10.01, -31.25],
       [ 19.22,  64.31],
       [-17.57,  63.94]])

Support for quoted fields is enabled with the quotechar parameter. Comment and delimiter characters are ignored when they appear within a quoted item delineated by quotechar:

>>> s = StringIO('"alpha, #42", 10.0\n"beta, #64", 2.0\n')
>>> dtype = np.dtype([("label", "U12"), ("value", float)])
>>> np.loadtxt(s, dtype=dtype, delimiter=",", quotechar='"')
array([('alpha, #42', 10.), ('beta, #64',  2.)],
      dtype=[('label', '<U12'), ('value', '<f8')])

Quoted fields can be separated by multiple whitespace characters:

>>> s = StringIO('"alpha, #42"       10.0\n"beta, #64" 2.0\n')
>>> dtype = np.dtype([("label", "U12"), ("value", float)])
>>> np.loadtxt(s, dtype=dtype, delimiter=None, quotechar='"')
array([('alpha, #42', 10.), ('beta, #64',  2.)],
      dtype=[('label', '<U12'), ('value', '<f8')])

Two consecutive quote characters within a quoted field are treated as a single escaped character:

>>> s = StringIO('"Hello, my name is ""Monty""!"')
>>> np.loadtxt(s, dtype="U", delimiter=",", quotechar='"')
array('Hello, my name is "Monty"!', dtype='<U26')

Read a subset of columns when the rows do not all contain the same number of values:

>>> d = StringIO("1 2\n2 4\n3 9 12\n4 16 20")
>>> np.loadtxt(d, usecols=(0, 1))
array([[ 1.,  2.],
       [ 2.,  4.],
       [ 3.,  9.],
       [ 4., 16.]])
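The skiprows and max_rows descriptions count lines differently: skiprows counts every line, comments included, while max_rows counts only rows that contain data. The sketch below illustrates this on made-up input, assuming NumPy 1.23 or later, where comment lines stopped counting towards max_rows.

```python
from io import StringIO

import numpy as np

# Hypothetical input: two header lines, then data with a comment in between.
text = (
    "# produced by a hypothetical logger\n"  # line 1: comment
    "x y\n"                                  # line 2: column titles
    "1 2\n"
    "# sensor reset\n"                       # comment between data rows
    "3 4\n"
    "5 6\n"
)

# skiprows counts raw lines (the comment included), so 2 skips both headers.
# max_rows counts only data rows, so the comment between rows is not counted.
data = np.loadtxt(StringIO(text), skiprows=2, max_rows=2)
print(data)
```

On older NumPy versions the comment line in the middle would likely be counted towards max_rows, so fewer data rows would come back.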


numpy.fromfile

Construct an array from data in a text or binary file.

A highly efficient way of reading binary data with a known data-type, as well as parsing simply formatted text files. Data written using the tofile method can be read using this function.

Parameters:

file file or str or Path

Open file object or filename.

Changed in version 1.17.0: pathlib.Path objects are now accepted.

dtype data-type

Data type of the returned array. For binary files, it is used to determine the size and byte-order of the items in the file. Most builtin numeric types are supported and extension types may be supported.

New in version 1.18.0: Complex dtypes.

count int

Number of items to read. -1 means all items (i.e., the complete file).

sep str

Separator between items if file is a text file. An empty ("") separator means the file should be treated as binary. Spaces (" ") in the separator match zero or more whitespace characters. A separator consisting only of spaces must match at least one whitespace character.

offset int

The offset (in bytes) from the file’s current position. Defaults to 0. Only permitted for binary files.

like array_like, optional

Reference object to allow the creation of arrays which are not NumPy arrays. If an array-like passed in as like supports the __array_function__ protocol, the result will be defined by it. In this case, it ensures the creation of an array object compatible with that passed in via this argument.

See also:

loadtxt

More flexible way of loading data from a text file.

Notes

Do not rely on the combination of tofile and fromfile for data storage, as the binary files generated are not platform independent. In particular, no byte-order or data-type information is saved. Data can be stored in the platform independent .npy format using save and load instead.
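Because no byte-order or data-type information is stored in the file, the reader has to supply the same dtype that the writer used. The sketch below shows what happens when the byte order is guessed wrong; the data and the temporary file are made up for illustration.

```python
import os
import tempfile

import numpy as np

# Explicit little-endian 32-bit integers, independent of the host byte order.
values = np.arange(4, dtype="<i4")

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    values.tofile(path)  # raw bytes only: no dtype, shape, or byte-order header

    same = np.fromfile(path, dtype="<i4")     # matches how the bytes were written
    swapped = np.fromfile(path, dtype=">i4")  # wrong byte order: silently misread
finally:
    os.remove(path)

print(same)     # [0 1 2 3]
print(swapped)  # [0 16777216 33554432 50331648]
```

No error is raised in the second read, which is one reason the .npy format is the safer choice for storage.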

Examples

Construct an ndarray:

>>> dt = np.dtype([('time', [('min', np.int64), ('sec', np.int64)]),
...                ('temp', float)])
>>> x = np.zeros((1,), dtype=dt)
>>> x['time']['min'] = 10; x['temp'] = 98.25
>>> x
array([((10, 0), 98.25)],
      dtype=[('time', [('min', '<i8'), ('sec', '<i8')]), ('temp', '<f8')])

Save the raw data to disk:

>>> import tempfile
>>> fname = tempfile.mkstemp()[1]
>>> x.tofile(fname)

Read the raw data from disk:

>>> np.fromfile(fname, dtype=dt)
array([((10, 0), 98.25)],
      dtype=[('time', [('min', '<i8'), ('sec', '<i8')]), ('temp', '<f8')])
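The count and offset parameters allow reading only part of a binary file. The following is a minimal sketch assuming a hypothetical layout of an 8-byte header followed by little-endian float64 values; neither the header nor the values come from the original documentation.

```python
import os
import tempfile

import numpy as np

payload = np.array([1.5, 2.5, 3.5, 4.5], dtype="<f8")

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open(path, "wb") as f:
        f.write(b"HDR\x00\x00\x00\x00\x00")  # hypothetical 8-byte header
        f.write(payload.tobytes())           # raw float64 items

    # Skip the header (offset is in bytes), then read only two items.
    first_two = np.fromfile(path, dtype="<f8", count=2, offset=8)

    # count=-1 (the default) reads everything after the offset.
    everything = np.fromfile(path, dtype="<f8", offset=8)
finally:
    os.remove(path)

print(first_two)
print(everything)
```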

The recommended way to store and load data:

>>> np.save(fname, x)
>>> np.load(fname + '.npy')
array([((10, 0), 98.25)],
      dtype=[('time', [('min', '<i8'), ('sec', '<i8')]), ('temp', '<f8')])
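A non-empty sep switches fromfile to text mode, as described for the sep parameter. The sketch below reads a comma-separated and a whitespace-separated file; the file contents are made up, and the result is always a flat 1-D array because fromfile does not track rows.

```python
import os
import tempfile

import numpy as np

fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    # Comma-separated items.
    with open(path, "w") as f:
        f.write("1.0, 2.5, 3.0, 4.25")
    comma_values = np.fromfile(path, dtype=float, sep=",")

    # A separator of a single space matches any run of whitespace,
    # including the newline between the two lines.
    with open(path, "w") as f:
        f.write("10 20\n30 40")
    space_values = np.fromfile(path, dtype=float, sep=" ")
finally:
    os.remove(path)

print(comma_values)
print(space_values)
```

For files with rows and columns, loadtxt is likely the better fit, since it preserves the 2-D structure.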
