
Copying a list into a numpy array converts floats into strings

I have a list with 4 columns, 3 of which are NumPy floats and 1 of which is a str. When I run this:

print("cluster", clusters[0], type(clusters[0][0]))
numpyClusters = np.copy(clusters)
print("numpyCluster", numpyClusters[0], type(numpyClusters[0][0]))
finalizedClusters = kMeansAlgorithm(dataSet, centeroids, clusters, oldClusters, k)

it prints

cluster [0.12072145106500658, 1, 1.0337254896570043, 'winter'] <class 'numpy.float64'>
numpyCluster ['0.12072145106500658' '1' '1.0337254896570043' 'winter'] <class 'numpy.str_'>

Later on I need the first 3 columns to be floats. Can you copy a list into a numpy array so that all the values keep their original types?

As stated by @couka, a numpy array can only hold one type of data. You can cheat a little, though, because numpy allows the object dtype. For instance:

>>> import numpy as np

>>> test = ["a", 1, 2, 3]
>>> test_array = np.array(test)
>>> test_array
array(['a', '1', '2', '3'], dtype='<U1')
>>> test_array * 2
numpy.core._exceptions.UFuncTypeError: ufunc 'multiply' did not contain a loop with signature matching types (dtype('<U3'), dtype('<U3')) -> dtype('<U3')
>>> test_array_object = np.array(test, dtype="O")
>>> test_array_object
array(['a', 1, 2, 3], dtype=object)
>>> test_array_object * 2
array(['aa', 2, 4, 6], dtype=object)

When the dtype of the array is object and an operation is performed, numpy simply applies the corresponding operation to each of its elements. For instance, when multiplying an array by an integer, numpy will call the __mul__ method of each object in the array.

This method has several drawbacks though. If you only want to apply the operation on the integers, you have to do some indexing:

>>> test_array_object[1:] *= 2
>>> test_array_object
array(['a', 2, 4, 6], dtype=object)

More importantly, numpy can't apply the optimizations it normally uses for integer arrays to an object array, so operations on it run more slowly.
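If you do go the object route with rows like the ones in the question, one way to get the speed back is to slice out the numeric columns and convert them to a float array. This is only a minimal sketch: the rows below are made-up sample data, and the names rows, mixed, numeric and labels are mine, not from the question.

```python
import numpy as np

# Hypothetical rows shaped like the question's clusters: three numbers and a label.
rows = [
    [0.12, 1, 1.03, "winter"],
    [0.50, 2, 0.75, "summer"],
]

# dtype=object keeps each element as the original Python object.
mixed = np.array(rows, dtype=object)

# Slice out the numeric columns and convert them to a real float array.
numeric = mixed[:, :3].astype(np.float64)
labels = mixed[:, 3]

print(numeric.dtype)      # float64
print(type(mixed[0, 3]))  # <class 'str'>
```

The numeric array is an ordinary float64 array, so computations on it use numpy's fast paths again, while labels still holds the strings.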

I think the simplest approach is to use a dataclass to hold your data and define the operations you want to apply to it, like this:

from dataclasses import dataclass

import numpy as np

@dataclass
class NamedArray:
    name: str
    array: np.ndarray

which you would then use like this:

>>> test = ["a", 1., 2., 3.]
>>> named_test = NamedArray(test[0], np.array(test[1:]))
>>> named_test
NamedArray(name='a', array=array([1., 2., 3.]))
>>> named_test.array
array([1., 2., 3.])
>>> named_test.array.dtype
dtype('float64')

You are then free to do your computations on named_test.array, keeping numpy's optimizations for the operations you perform, without having to care about the str held in named_test.name.
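The same idea can be stretched to a whole list of rows like the one in the question. The sketch below is just one possible layout: LabeledClusters and from_rows are hypothetical names, the sample rows are made up, and it assumes the label is always the last item of each row.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class LabeledClusters:
    # Hypothetical container: one label per row, numeric columns as floats.
    labels: list
    values: np.ndarray

    @classmethod
    def from_rows(cls, rows):
        # Assumes the label is the last item of every row.
        labels = [row[-1] for row in rows]
        values = np.array([row[:-1] for row in rows], dtype=np.float64)
        return cls(labels, values)


rows = [
    [0.12, 1, 1.03, "winter"],
    [0.50, 2, 0.75, "summer"],
]
clusters = LabeledClusters.from_rows(rows)

print(clusters.values.dtype)         # float64
print(clusters.values.mean(axis=0))  # column means over the numeric data
print(clusters.labels)               # ['winter', 'summer']
```

Row i of clusters.values still corresponds to clusters.labels[i], so the label can be looked up by index whenever it is needed.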

From: https://numpy.org/doc/stable/reference/generated/numpy.array.html (Emphasis mine)

 numpy.array(object, dtype=None, *, copy=True, order='K', subok=False, ndmin=0, like=None)

[...]

dtype: data-type, optional The desired data-type for the array. If not given, then the type will be determined as the minimum type required to hold the objects in the sequence.

Since you can represent any float as a string, but not vice-versa, numpy decides to use strings for everything when initialized with a mixed list of floats and strings.
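A quick way to see this inference at work is to compare the dtype numpy picks for a float-only list and for a mixed one. This is a small sketch with made-up values; it looks only at dtype.kind because the exact string width in the dtype can differ between numpy versions.

```python
import numpy as np

floats_only = np.array([0.12, 1, 1.03])
mixed = np.array([0.12, 1, 1.03, "winter"])
# np.copy on a list goes through the same dtype inference as np.array.
copied = np.copy([0.12, 1, 1.03, "winter"])

print(floats_only.dtype)  # float64
print(mixed.dtype.kind)   # U (unicode string)
print(copied.dtype.kind)  # U

# The numbers survive as text, so they can be converted back explicitly.
restored = mixed[:3].astype(np.float64)
print(restored.dtype)     # float64
```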
