Python numpy string to float

Converting numpy string array to float: Bizarre?

So, this should be a really straightforward thing but for whatever reason, nothing I’m doing to convert an array of strings to an array of floats is working. I have a two column array, like so:

Name  Value
Bob   4.56
Sam   5.22
Amy   1.22

for row in myarray[1:,]:
    row[1]=float(row[1])

for row in myarray[1:,]:
    row[1]=row[1].astype(1)

myarray[1:,1] = map(float, myarray[1:,1])

Aside: if you’re working with mixed-type data, it might be worth your time to look at pandas. It already incorporates a lot of the tools you’d otherwise need to reimplement in order to get pure numpy to do the sort of ops people usually perform on names-and-numbers datasets.

2 Answers

A NumPy array must have a single dtype unless it is a structured array. Since your array contains some strings, all of its elements must be strings.
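A minimal sketch of why the in-place attempts cannot work. The array below is a hypothetical reconstruction of the question's data, not the asker's actual variable. Assigning a float into a string array stores it back as a string, while astype() on the column returns a separate float array.

```python
import numpy as np

# Hypothetical reconstruction of the two-column array from the question.
myarray = np.array([["Name", "Value"],
                    ["Bob", "4.56"],
                    ["Sam", "5.22"],
                    ["Amy", "1.22"]])

# Assigning a float into a string array stores its string form,
# so the dtype of the array never changes.
myarray[1, 1] = float(myarray[1, 1])
print(myarray.dtype.kind)   # 'U': still a unicode string array

# Converting the column produces a new, separate float array.
values = myarray[1:, 1].astype(float)
print(values.dtype)
print(values)
```

The names stay in the original string array and the numbers live in values, which is often enough when only the numeric column is needed.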

If you wish to have a structured (compound) dtype, you may do so:

import numpy as np
a = np.array([('Bob','4.56'), ('Sam','5.22'),('Amy', '1.22')],
             dtype = [('name','S3'),('val',float)])

Note that a is now a 1-D structured array, where each element is a record whose fields are described by dtype.

You can access the values using their field name:

In [21]: a = np.array([('Bob','4.56'), ('Sam','5.22'),('Amy', '1.22')],
   ....:              dtype = [('name','S3'),('val',float)])

In [22]: a
Out[22]:
array([('Bob', 4.56), ('Sam', 5.22), ('Amy', 1.22)],
      dtype=[('name', 'S3'), ('val', '<f8')])
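Field access on the structured array can be sketched as follows. The variable names vals and names are illustrative only. Indexing with a field name returns that column with its own dtype.

```python
import numpy as np

a = np.array([('Bob', '4.56'), ('Sam', '5.22'), ('Amy', '1.22')],
             dtype=[('name', 'S3'), ('val', float)])

vals = a['val']     # float64 values taken from the 'val' field
names = a['name']   # byte strings, because the field dtype is 'S3'

print(vals.dtype)
print(vals.sum())
print(names[0])     # a bytes object on Python 3
```

Because vals is a genuine float array, arithmetic and reductions work on it directly, with no per-row conversion loop.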


numpy.fromstring

A new 1-D array initialized from text data in a string.

Parameters:

string str

A string containing the data.

dtype data-type, optional

The data type of the array; default: float. For binary input data, the data must be in exactly this format. Most builtin numeric types are supported and extension types may be supported.

New in version 1.18.0: Complex dtypes.

count int, optional

Read this number of dtype elements from the data. If this is negative (the default), the count will be determined from the length of the data.

sep str, optional

The string separating numbers in the data; extra whitespace between elements is also ignored.

Deprecated since version 1.14: Passing sep='', the default, is deprecated since it will trigger the deprecated binary mode of this function. This mode interprets string as binary bytes, rather than ASCII text with decimal numbers, an operation which is better spelt frombuffer(string, dtype, count). If string contains unicode text, the binary mode of fromstring will first encode it into bytes using utf-8, which will not produce sane results.

like array_like, optional

Reference object to allow the creation of arrays which are not NumPy arrays. If an array-like passed in as like supports the __array_function__ protocol, the result will be defined by it. In this case, it ensures the creation of an array object compatible with that passed in via this argument.

Raises: ValueError

If the string is not the correct size to satisfy the requested dtype and count.

>>> np.fromstring('1 2', dtype=int, sep=' ')
array([1, 2])
>>> np.fromstring('1, 2', dtype=int, sep=',')
array([1, 2])
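The two modes described in the deprecation note can be contrasted with a short sketch. The sample values are made up for illustration. Text is parsed with a non-empty sep, and raw bytes go through frombuffer.

```python
import numpy as np

# Text mode: a non-empty sep makes fromstring parse decimal numbers.
text_vals = np.fromstring("4.56 5.22 1.22", dtype=float, sep=" ")
print(text_vals)

# Binary data: use frombuffer on a bytes object rather than sep=''.
raw = np.array([1.5, 2.5], dtype=np.float64).tobytes()
binary_vals = np.frombuffer(raw, dtype=np.float64)
print(binary_vals)
```

The array returned by frombuffer is backed by the bytes object and is read-only; call .copy() on it if it needs to be modified.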


Convert String to Float in NumPy


  1. Convert String to Float in NumPy Using the astype() Method
  2. Convert String to Float in NumPy Using the asarray() Method
  3. Convert String to Float in NumPy Using the asfarray() Method

Conversions between numerical data and string data can be awkward to handle, but Python makes them fairly painless because it ships with built-in functions designed for exactly this kind of conversion.

Python also has external libraries that perform these conversions faster on large amounts of data. NumPy is one such library.

NumPy is a Python library for working with multidimensional arrays and matrices and for the mathematical and logical computations performed on them. This article shows how to convert a NumPy string array to a NumPy float array using NumPy itself.

Convert String to Float in NumPy Using the astype() Method

astype() is a built-in method of ndarray objects. It returns a copy of the array cast to the specified data type.

The syntax of the astype() method is below.

astype(dtype, order, casting, subok, copy) 
  • dtype - The data type to which the NumPy array will be cast.
  • order - This is an optional parameter. It controls the memory layout of the resulting array.
  • casting - This is an optional parameter. It controls what kind of casting or conversion is allowed. By default, its value is unsafe.
  • subok - This is an optional boolean parameter. It decides whether sub-classes of ndarray are passed through or the result is forced to be a base-class ndarray.
  • copy - This is an optional boolean parameter. By default a newly allocated array is always returned. If it is set to False and the dtype, order, and subok requirements are already satisfied, the input array itself is returned instead of a copy.

You can use this method to get the conversion done. Refer to the following code snippet.

import numpy as np

stringArray = np.array(["1.000", "1.235", "0.000125", "2", "55", "-12.35", "0", "-0.00025"])
floatArray = stringArray.astype(float)
print(stringArray)
print(floatArray)

Output:

['1.000' '1.235' '0.000125' '2' '55' '-12.35' '0' '-0.00025']
[ 1.000e+00 1.235e+00 1.250e-04 2.000e+00 5.500e+01 -1.235e+01  0.000e+00 -2.500e-04]
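The copy parameter and the failure case for malformed input can be sketched as below. The variable names and the sample string "abc" are illustrative assumptions, not part of the original article.

```python
import numpy as np

floats = np.array([1.0, 2.5])

# The dtype already matches, so copy=False returns the input array itself.
same = floats.astype(float, copy=False)
print(same is floats)

# The default (copy=True) always allocates a new array.
fresh = floats.astype(float)
print(fresh is floats)

# A string that is not a valid number makes the cast fail with ValueError.
failed = False
try:
    np.array(["1.5", "abc"]).astype(float)
except ValueError as exc:
    failed = True
    print("conversion failed:", exc)
```

When the input may contain non-numeric strings, catching ValueError like this (or cleaning the data first) avoids an unhandled exception in the middle of a pipeline.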

To learn more about this method, refer to the official NumPy documentation for ndarray.astype.

Convert String to Float in NumPy Using the asarray() Method

asarray() is a NumPy function that converts its input to a NumPy array of a specified type. The input can be a Python list, a tuple, a tuple of lists, a list of tuples, a list of lists, a tuple of tuples, or a NumPy array itself.
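A minimal sketch of the asarray() approach, assuming a plain Python list of numeric strings as input (the sample values are made up):

```python
import numpy as np

string_list = ["1.000", "1.235", "-12.35"]

# Passing dtype=float asks asarray to convert each string while building the array.
float_array = np.asarray(string_list, dtype=float)
print(float_array)

# If the input is already an ndarray of the requested type, no copy is made.
existing = np.array([1.0, 2.0])
print(np.asarray(existing) is existing)
```

That no-copy behaviour is the main practical difference from np.array(), which copies by default.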


How can I change NumPy array elements from string to int or float?

I have a data set stored in a NumPy array as shown below, but all of the data in it is stored as strings. How can I change the strings to int or float and store them back?

[['1' '0' '3' ..., '7.25' '' 'S']
 ['2' '1' '1' ..., '71.2833' 'C85' 'C']
 ['3' '1' '3' ..., '7.925' '' 'S']
 ...,
 ['889' '0' '3' ..., '23.45' '' 'S']
 ['890' '1' '1' ..., '30' 'C148' 'C']
 ['891' '0' '3' ..., '7.75' '' 'Q']]
 data[0. 0] = data[0. 0].astype(int) 

3 Answers

You could set the data type (dtype) at array initialization. For example, if each row is composed of one 32-bit integer and one 4-byte string, you could specify the dtype 'i4, S4'.

data = np.array([(1, 'a'), (2, 'b')], dtype='i4, S4') 

You could read more about dtypes here.

@PadraicCunningham You are specifying that the data type (dtype) for each row is a 4-byte integer and a 4-byte string.

I am not asking for myself; I already posted a link to recarray in the comments. Some explanation for the OP of how to get the original data object into an array with the first column as an integer would be good.

@PadraicCunningham: In fact, it sounded like a strange question coming from someone as skilled as you 😉 I will add the details to the answer.

I can make an array that contains strings by starting with lists of strings; note the S4 dtype:

In [690]: data=np.array([['1','0','7.23','two'],['2','3','1.32','four']])
In [691]: data
Out[691]:
array([['1', '0', '7.23', 'two'],
       ['2', '3', '1.32', 'four']],
      dtype='|S4')

It's more likely that such an array is created by reading a csv file.

I can also view it as an array of single-byte strings. The shape and dtype have changed, but the data buffer is the same (the same 32 bytes):

In [692]: data.view('S1')
Out[692]:
array([['1', '', '', '', '0', '', '', '', '7', '.', '2', '3', 't', 'w', 'o', ''],
       ['2', '', '', '', '3', '', '', '', '1', '.', '3', '2', 'f', 'o', 'u', 'r']],
      dtype='|S1')

In fact, I can change an individual byte, turning the two in the original array into twos:

In [693]: data.view('S1')[0,-1]='s'
In [694]: data
Out[694]:
array([['1', '0', '7.23', 'twos'],
       ['2', '3', '1.32', 'four']],
      dtype='|S4')

But if I try to change an element of data to an integer, it is converted to a string to match the S4 dtype:

In [695]: data[1,0]=4
In [696]: data
Out[696]:
array([['1', '0', '7.23', 'twos'],
       ['4', '3', '1.32', 'four']],
      dtype='|S4')

The same would happen if the number came from int(data[1,0]) or some variation on that.

But I can trick it into seeing the integer as a string of bytes (represented as \x04):

In [704]: data[1,0]=np.array(4).view('S4')
In [705]: data
Out[705]:
array([['1', '0', '7.23', 'twos'],
       ['\x04', '3', '1.32', 'four']],
      dtype='|S4')

Arrays can share data buffers. The data attribute is a pointer to a block of memory. It's the array's dtype that controls how that block is interpreted. For example, I can make another array of ints and redirect its data attribute:

In [714]: d2=np.zeros((2,4),dtype=int)
In [715]: d2
Out[715]:
array([[0, 0, 0, 0],
       [0, 0, 0, 0]])
In [716]: d2.data=data.data  # change the data pointer
In [717]: d2
Out[717]:
array([[        49,         48,  858926647, 1936684916],
       [         4,         51,  842214961, 1920298854]])

Now d2[1,0] is the integer 4. But the other items are not recognizable, because they are strings viewed as integers. That's not the same as passing them through the int() function.

I don't recommend changing the data pointer like this as a regular practice. It would be easy to mess things up. I had to take care to ensure that d2.nbytes was 32, the same as for data.

Because the buffer is shared, a change to d2 also appears in data (but displayed according to a different dtype):

In [718]: d2[0,0]=3
In [719]: data
Out[719]:
array([['\x03', '0', '7.23', 'twos'],
       ['\x04', '3', '1.32', 'four']],
      dtype='|S4')

A view with a structured (compound) dtype does something similar:

In [723]: data.view('i4,i4,f,|S4')
Out[723]:
array([[(3, 48, 4.148588672592268e-08, 'twos')],
       [(4, 51, 1.042967401332362e-08, 'four')]],
      dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<f4'), ('f3', 'S4')])

Notice the 48 and 51 that also appear in d2. The next float column is unrecognizable.

That gives an idea of what can and cannot be done 'in-place'.
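The same byte-level reinterpretation can be had without touching the data pointer by using view(). The sketch below assumes a little-endian 32-bit integer view and passes dtype='S4' explicitly, since on Python 3 a list of str would otherwise produce a unicode array.

```python
import numpy as np

# dtype='S4' makes every element a 4-byte string, 32 bytes in total.
data = np.array([['1', '0', '7.23', 'twos'],
                 ['2', '3', '1.32', 'four']], dtype='S4')

# Reinterpret the same 32 bytes as little-endian 32-bit integers.
as_ints = data.view('<i4')
print(as_ints.shape)
print(np.shares_memory(data, as_ints))
print(as_ints[0, 0])   # the byte value of the character '1'
```

The view still only reinterprets bytes and does not parse the text, so '7.23' shows up as a large integer rather than a number close to 7.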

But to get an array that contains numbers and strings in a meaningful way, it is better to construct a new structured array. Perhaps the cleanest way to do that is with an intermediary list of tuples.

In [759]: dl=[tuple(i) for i in data.tolist()]
In [760]: dl
Out[760]: [('1', '0', '7.23', 'two'), ('2', '3', '1.32', 'four')]
In [761]: np.array(dl,dtype='i4,i4,f,|S4')
Out[761]:
array([(1, 0, 7.230000019073486, 'two'), (2, 3, 1.3200000524520874, 'four')],
      dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<f4'), ('f3', 'S4')])

All these fields take up 4 bytes, so the nbytes is the same. But the individual values have passed through converters. I have given 'np.array' the freedom to convert values as is consistent for the input and the new dtype. That's a lot easier than trying to perform some sort of convoluted in-place conversion.

A list of tuples with a mix of numbers and strings would also have worked:
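The original example for this case is not present in the text, so the following is only a sketch of what such a call could look like, using made-up rows that mirror the earlier data:

```python
import numpy as np

# Tuples that already mix Python ints, floats and strings.
rows = [(1, 0, 7.23, 'two'), (2, 3, 1.32, 'four')]

arr = np.array(rows, dtype='i4,i4,f,S4')
print(arr.dtype.names)   # default field names f0..f3
print(arr['f0'])         # integer field
print(arr['f3'])         # byte-string field
```
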

Structured arrays are displayed as a list of tuples. And in the structured array docs, values are always input as a list of tuples.

recarray can also be used, but essentially that is just an array subclass that lets you access fields as attributes.

If the original array was generated from a csv file, it would have been better to use np.genfromtxt (or loadtxt) with appropriate options. It can generate the appropriate list(s) of tuples, and return a structured array directly.
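A minimal sketch of the genfromtxt route. The CSV text is a hypothetical stand-in for the asker's file, and the dtype string is an assumption chosen to match the earlier examples.

```python
import io
import numpy as np

# Hypothetical CSV content; a real call would pass a filename instead.
csv_text = "1,0,7.23,two\n2,3,1.32,four\n"

arr = np.genfromtxt(io.StringIO(csv_text), delimiter=",",
                    dtype="i4,i4,f8,S4")
print(arr.dtype.names)   # default field names f0..f3
print(arr["f0"])         # integers parsed from the first column
print(arr["f2"])         # floats parsed from the third column
```

Each column is converted as the file is read, so no later string-to-number pass over the array is needed.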
