How can I add two different types of data, string and int, to a numpy ndarray?

Question

I used pandas.read_csv to read a CSV file. The file has two columns: one contains strings, the other integers.

import pandas

data = pandas.read_csv('data.csv')

Then I printed the types of the elements in the resulting numpy ndarray.

# data.values returns the same ndarray as the older data.get_values(),
# which was removed in pandas 1.0
print(type(data.values[0, 0]))
print(type(data.values[0, 1]))

Result:

<class 'str'>
<class 'int'>

This suggests that a single numpy ndarray can hold two different data types.

However, when I try to put values of two different types into one numpy ndarray myself:

import numpy

arr = numpy.ndarray((1, 2))  # dtype defaults to float64
arr[0][0] = 1
arr[0][1] = 'str'

The console showed this error:

ValueError: could not convert string to float: 'str'

Can anyone tell me how to do this the way pandas does?

Answer 1

You can create numpy ndarrays with arbitrary C-style data types for each field. The trick is to define the data type first and then pass it as the dtype of the array. The one annoyance is that, because these are C-style types, they have to be defined explicitly, which for strings includes setting the maximum number of characters each field can hold.

For example:

>>> import numpy as np
>>> person_dt = np.dtype([('Name', 'S25'), ('Age', np.uint8)])
>>> person_dt
dtype([('Name', 'S25'), ('Age', 'u1')])
>>> persons = np.array([('alice', 35), ('bob', 39)], dtype=person_dt)
>>> persons
array([(b'alice', 35), (b'bob', 39)],
      dtype=[('Name', 'S25'), ('Age', 'u1')])

Here I'm creating a numpy dtype. Each separate portion of an array element is a field; I'm naming the fields Name and Age and assigning a type to each one. The Name field is a byte string of at most 25 bytes (stored in a fixed 25-byte slot and padded with \0 bytes, much like a C char array), and Age is an unsigned 8-bit integer, since our ages will of course fit in the range 0 to 255. Note that the b before each string just indicates that the value is a byte string.
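If the byte strings are a nuisance, one option is to declare the field as fixed-width Unicode instead. The following is a minimal sketch rather than part of the original example: the names person_dt_u and persons_u are made up for illustration, and 'U25' is assumed to be an acceptable width for the names being stored.

```python
import numpy as np

# Hypothetical variant of person_dt: 'U25' stores up to 25 Unicode
# characters, so values come back as str instead of bytes.
person_dt_u = np.dtype([('Name', 'U25'), ('Age', np.uint8)])
persons_u = np.array([('alice', 35), ('bob', 39)], dtype=person_dt_u)

# Field names and the per-record size in bytes are available on the dtype.
field_names = person_dt_u.names
record_size = person_dt_u.itemsize

# Values longer than the declared width are silently truncated.
long_name = np.array([('x' * 30, 40)], dtype=person_dt_u)['Name'][0]

print(field_names, record_size)
print(persons_u['Name'][1], len(long_name))
```

The trade-off is space: NumPy stores each Unicode character in 4 bytes, so a 'U25' field takes 100 bytes per record where 'S25' takes 25.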

Then I simply create the array with the new dtype and pass in the values.

What's cool about this is you can grab the values by which field they belong to. For example, you can grab all the ages by grabbing the Age field, and it will have the type I assigned the ages to:

>>> persons['Age']
array([35, 39], dtype=uint8)

So you can go further and index into these resulting arrays:

>>> persons['Name'][1]
b'bob'
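Indexing works the other way around too: you can pick a record first and then a field, or use one field to filter the whole array. This is a small sketch that rebuilds the persons array from above; the variable names first, older and older_names are only for illustration.

```python
import numpy as np

person_dt = np.dtype([('Name', 'S25'), ('Age', np.uint8)])
persons = np.array([('alice', 35), ('bob', 39)], dtype=person_dt)

# Pick one record, then one of its fields.
first = persons[0]
first_name = first['Name'].decode('ascii')  # bytes -> str

# Use a boolean mask built from one field to select whole records.
older = persons[persons['Age'] > 36]
older_names = [name.decode('ascii') for name in older['Name']]

print(first_name)
print(older_names)
```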

And you can still create and assign like you would normally:

>>> new_persons = np.zeros(5, dtype=person_dt)
>>> new_persons
array([(b'', 0), (b'', 0), (b'', 0), (b'', 0), (b'', 0)],
      dtype=[('Name', 'S25'), ('Age', 'u1')])
>>> new_persons[0] = ('alice', 25)
>>> new_persons[1] = ('bob', 26)
>>> new_persons['Name'][2:5]
array([b'', b'', b''],
      dtype='|S25')
>>> new_persons['Name'][2:5] = 'carol', 'david', 'eve'
>>> new_persons['Age'][2:5] = 27, 28, 29
>>> new_persons
array([(b'alice', 25), (b'bob', 26), (b'carol', 27), (b'david', 28), (b'eve', 29)],
      dtype=[('Name', 'S25'), ('Age', 'u1')])

I attended a talk a little while ago all about creating and managing numpy dtypes, and it was great. The Jupyter notebook for the talk is online and you can access it here, which might shed a bit more light on all the different ways you can use them.
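For completeness, the array that pandas returned in the question is most likely an array of dtype object, which stores references to ordinary Python objects rather than fixed-size C values. The sketch below shows that approach under that assumption; it is not what the structured dtype above does.

```python
import numpy as np

# numpy.ndarray((1, 2)) defaults to float64, which is why assigning
# 'str' failed in the question.
default_dtype = np.ndarray((1, 2)).dtype

# An object array can hold any Python object in each cell.
arr = np.empty((1, 2), dtype=object)
arr[0, 0] = 1
arr[0, 1] = 'str'

print(default_dtype)
print(type(arr[0, 0]), type(arr[0, 1]))
```

This is the quickest way to mix types, but each cell is just a Python object, so you lose the compact memory layout and most of the speed of typed arrays. The structured dtype is usually the better fit when the columns have known types.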
