How to insert column of different type to numpy array?

I would like to combine two numpy arrays, one of type np.datetime64 and one of type int, as columns of a single array.

This leads to an error. What do I have to do to correct this?

It works without error if I append either array to itself (i.e. np.append(c,c,axis=1) or np.append(a,a,axis=1)).

numpy version: 1.14.3

import numpy as np
a = np.array([['2018-04-01T15:30:00'],
              ['2018-04-01T15:31:00'],
              ['2018-04-01T15:32:00'],
              ['2018-04-01T15:33:00'],
              ['2018-04-01T15:34:00']], dtype='datetime64[s]')
c = np.array([0,1,2,3,4]).reshape(-1,1)
c
Out[2]: 
array([[0],
       [1],
       [2],
       [3],
       [4]])
d = np.append(c,a,axis=1)
Traceback (most recent call last):
  File "/home/user/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2963, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-3-10548a83d1a2>", line 1, in <module>
    d = np.append(c,a,axis=1)
  File "/home/user/anaconda3/lib/python3.6/site-packages/numpy/lib/function_base.py", line 5166, in append
    return concatenate((arr, values), axis=axis)
TypeError: invalid type promotion

Probably easiest - work with a Pandas DataFrame instead of an array

Truthfully, while Numpy arrays can be made to work with heterogeneous columns, they may not be what most users actually need in this case. For many use cases, you may be better off using a Pandas DataFrame. Here's how to convert your two columns to a DataFrame called df:

import numpy as np
import pandas as pd

a = np.array([['2018-04-01T15:30:00'],
              ['2018-04-01T15:31:00'],
              ['2018-04-01T15:32:00'],
              ['2018-04-01T15:33:00'],
              ['2018-04-01T15:34:00']], dtype='datetime64[s]')
c = np.array([0,1,2,3,4]).reshape(-1,1)


df = pd.DataFrame(dict(date=a.ravel(), val=c.ravel()))
print(df)
# output:
#                      date  val
#     0 2018-04-01 15:30:00    0
#     1 2018-04-01 15:31:00    1
#     2 2018-04-01 15:32:00    2
#     3 2018-04-01 15:33:00    3
#     4 2018-04-01 15:34:00    4

You can then work with each of your columns like so:

print(df['date'])
# output:
#     0   2018-04-01 15:30:00
#     1   2018-04-01 15:31:00
#     2   2018-04-01 15:32:00
#     3   2018-04-01 15:33:00
#     4   2018-04-01 15:34:00
#     Name: date, dtype: datetime64[ns]

DataFrame objects provide a ton of methods that make it pretty easy to analyze this kind of data. See the Pandas docs (or other QAs on this site) for more info about DataFrame objects.
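
As a small illustration of the kind of analysis this makes easy, here is a minimal sketch that rebuilds the same df and then filters by date, sums the integer column, and extracts the minute of each timestamp. The variable names late, total and minutes are made up for this example:

```python
import numpy as np
import pandas as pd

a = np.array([['2018-04-01T15:30:00'],
              ['2018-04-01T15:31:00'],
              ['2018-04-01T15:32:00'],
              ['2018-04-01T15:33:00'],
              ['2018-04-01T15:34:00']], dtype='datetime64[s]')
c = np.array([0, 1, 2, 3, 4]).reshape(-1, 1)

df = pd.DataFrame(dict(date=a.ravel(), val=c.ravel()))

# keep only the rows at or after 15:32 (illustrative cutoff)
late = df[df['date'] >= pd.Timestamp('2018-04-01T15:32:00')]

# ordinary numeric operations work on the integer column
total = df['val'].sum()

# the .dt accessor exposes datetime components of the date column
minutes = df['date'].dt.minute

print(late)
print(total)
print(minutes.tolist())
```

Each column keeps its own dtype, so the date comparison and the integer sum both run without any casting.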

Numpy only solution - structured arrays

Generally, you should avoid arrays of dtype=object if you can. They cause performance issues with many of the basic Numpy operations (such as arithmetic, e.g. arr0 + arr1), and they may behave in ways you don't expect.

A better Numpy-only solution is a structured array. These arrays have a compound dtype, with one part per field (for the sake of this discussion, "field" is equivalent to "column", though you can do more interesting things with fields). Given your a and c arrays, here's how you can create a structured array:

# create the compound dtype
dtype = np.dtype(dict(names=['date', 'val'], formats=[arr.dtype for arr in (a, c)]))

# create an empty structured array
struct = np.empty(a.shape[0], dtype=dtype)

# populate the structured array with the data from your column arrays
struct['date'], struct['val'] = a.T, c.T

print(repr(struct))
# output:
#     array([('2018-04-01T15:30:00', 0), ('2018-04-01T15:31:00', 1),
#            ('2018-04-01T15:32:00', 2), ('2018-04-01T15:33:00', 3),
#            ('2018-04-01T15:34:00', 4)],
#           dtype=[('date', '<M8[s]'), ('val', '<i8')])

You can then access the specific columns by indexing them with their name (just like you could with the DataFrame):

print(struct['date'])
# output:
#     ['2018-04-01T15:30:00' '2018-04-01T15:31:00' '2018-04-01T15:32:00'
#      '2018-04-01T15:33:00' '2018-04-01T15:34:00']
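
Besides field access by name, a structured array can also be indexed by row or with a boolean mask built from one of its fields. The following is a minimal sketch of that; it builds the same struct with a slightly different (but equivalent) dtype spelling, and first_row and late_rows are names invented for this example:

```python
import numpy as np

a = np.array([['2018-04-01T15:30:00'],
              ['2018-04-01T15:31:00'],
              ['2018-04-01T15:32:00'],
              ['2018-04-01T15:33:00'],
              ['2018-04-01T15:34:00']], dtype='datetime64[s]')
c = np.array([0, 1, 2, 3, 4]).reshape(-1, 1)

# compound dtype written as a list of (name, dtype) pairs
dtype = np.dtype([('date', a.dtype), ('val', c.dtype)])
struct = np.empty(a.shape[0], dtype=dtype)
struct['date'] = a.ravel()
struct['val'] = c.ravel()

# integer indexing returns one record (a "row") with both fields
first_row = struct[0]

# a boolean mask computed from one field selects whole records
late_rows = struct[struct['val'] >= 3]

print(first_row)
print(late_rows)
```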

Structured array pitfalls

Structured arrays do not support every operation that plain arrays do. You can't, for example, add two structured arrays:

# doesn't work
struct0 + struct1

but you can add the fields of two structured arrays:

# works great
struct0['val'] + struct1['val']

In general, the fields behave just like standard Numpy arrays.
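
The snippets above leave struct0 and struct1 undefined, so here is a self-contained sketch of the same pitfall. The two arrays and their values are made up for illustration, and the sketch assumes that the failed addition surfaces as a TypeError (or a subclass of it), which is what current NumPy releases raise when a ufunc has no loop for a structured dtype:

```python
import numpy as np

dtype = np.dtype([('date', 'datetime64[s]'), ('val', 'int64')])

# two small structured arrays with illustrative data
struct0 = np.zeros(3, dtype=dtype)
struct0['date'] = np.array(['2018-04-01T15:30:00',
                            '2018-04-01T15:31:00',
                            '2018-04-01T15:32:00'], dtype='datetime64[s]')
struct0['val'] = [0, 1, 2]

struct1 = struct0.copy()
struct1['val'] = [10, 20, 30]

# adding the structured arrays as a whole is not supported
try:
    struct0 + struct1
    whole_array_add_ok = True
except TypeError:
    whole_array_add_ok = False

# adding a single numeric field from each array works like any ndarray
val_sum = struct0['val'] + struct1['val']

print(whole_array_add_ok)
print(val_sum)
```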

Taking the other answers into account, converting the first array to dtype object is at least a workaround:

import numpy as np
a = np.array([['2018-04-01T15:30:00'],
       ['2018-04-01T15:31:00'],
       ['2018-04-01T15:32:00'],
       ['2018-04-01T15:33:00'],
       ['2018-04-01T15:34:00']], dtype='datetime64[s]')
a = a.astype("object")
c = np.array([0,1,2,3,4]).reshape(-1,1)
d = np.append(a,c,axis=1)
d

Output:

array([[datetime.datetime(2018, 4, 1, 15, 30), 0],
       [datetime.datetime(2018, 4, 1, 15, 31), 1],
       [datetime.datetime(2018, 4, 1, 15, 32), 2],
       [datetime.datetime(2018, 4, 1, 15, 33), 3],
       [datetime.datetime(2018, 4, 1, 15, 34), 4]], dtype=object)
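
One consequence of this workaround is that every element of d is a generic Python object, so each column has to be cast back before it can be used for typed operations. A minimal sketch of that round trip, with dates and vals as names chosen for this example:

```python
import numpy as np

a = np.array([['2018-04-01T15:30:00'],
              ['2018-04-01T15:31:00'],
              ['2018-04-01T15:32:00'],
              ['2018-04-01T15:33:00'],
              ['2018-04-01T15:34:00']], dtype='datetime64[s]')
c = np.array([0, 1, 2, 3, 4]).reshape(-1, 1)

# the workaround: cast to object first, then append column-wise
d = np.append(a.astype("object"), c, axis=1)

# both columns are now dtype=object; cast them back to recover typed arrays
dates = d[:, 0].astype('datetime64[s]')
vals = d[:, 1].astype(np.int64)

print(d.dtype)
print(dates.dtype, vals.dtype)
```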
