numpy array to ndarray

Question

I have an exported pandas dataframe that is now a numpy.array object.

subset = array[:4,:]
array([[  2.        ,  12.        ,  33.33333333,   2.        ,
         33.33333333,  12.        ],
       [  2.        ,   2.        ,  33.33333333,   2.        ,
         33.33333333,   2.        ],
       [  2.8       ,   8.        ,  45.83333333,   2.75      ,
         46.66666667,  13.        ],
       [  3.11320755,  75.        ,  56.        ,   3.24      ,
         52.83018868,  33.        ]])
print subset.dtype
dtype('float64')

I want to convert the column values to specific types and set column names as well, which means I need to convert it to an ndarray.

Here are my dtypes:

[('PERCENT_A_NEW', '<f8'), ('JoinField', '<i4'), ('NULL_COUNT_B', '<f8'), 
('PERCENT_COMP_B', '<f8'), ('RANKING_A', '<f8'), ('RANKING_B', '<f8'),
('NULL_COUNT_B', '<f8')]

When I go to convert the array, I get:

 ValueError: new type not compatible with array.

How do I cast each column to a specific type so that I can convert the array to an ndarray?

Thanks

Answer 1

You already have an ndarray. What you are seeking is a structured array, one with this compound dtype. First see if pandas can do it for you. If that fails, we might be able to do something with tolist and a list comprehension.
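
As a rough illustration of the "let pandas do it" route, here is a minimal sketch. It assumes the original DataFrame is still available; the small frame below is made-up stand-in data using three of the column names from the question. It relies on DataFrame.astype to set the integer column and DataFrame.to_records to export a record array whose field names come from the column labels.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the original DataFrame (values are illustrative).
df = pd.DataFrame({
    "PERCENT_A_NEW": [2.0, 2.8],
    "JoinField": [12.0, 8.0],
    "RANKING_B": [12.0, 13.0],
})

# Cast only the column that needs a different type; the rest stay float64.
df = df.astype({"JoinField": "int32"})

# index=False keeps the DataFrame index out of the resulting fields.
records = df.to_records(index=False)

print(records.dtype.names)
print(records["JoinField"])
```

The result is a numpy record array, so columns can be read by name (records["JoinField"]) just like the structured array built further down.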

In [84]: dt=[('PERCENT_A_NEW', '<f8'), ('JoinField', '<i4'), ('NULL_COUNT_B', '<f8'),
    ...: ('PERCENT_COMP_B', '<f8'), ('RANKING_A', '<f8'), ('RANKING_B', '<f8'),
    ...: ('NULL_COUNT_B', '<f8')]
In [85]: subset=np.array([[  2.        ,  12.        ,  33.33333333,   2.        ,
    ...:          33.33333333,  12.        ],
    ...:        [  2.        ,   2.        ,  33.33333333,   2.        ,
    ...:          33.33333333,   2.        ],
    ...:        [  2.8       ,   8.        ,  45.83333333,   2.75      ,
    ...:          46.66666667,  13.        ],
    ...:        [  3.11320755,  75.        ,  56.        ,   3.24      ,
    ...:          52.83018868,  33.        ]])
In [86]: subset
Out[86]: 
array([[  2.        ,  12.        ,  33.33333333,   2.        ,
         33.33333333,  12.        ],
       [  2.        ,   2.        ,  33.33333333,   2.        ,
         33.33333333,   2.        ],
       [  2.8       ,   8.        ,  45.83333333,   2.75      ,
         46.66666667,  13.        ],
       [  3.11320755,  75.        ,  56.        ,   3.24      ,
         52.83018868,  33.        ]])

Now make an array with dt. Input for a structured array has to be a list of tuples, so I'm using tolist and a list comprehension:

In [87]: np.array([tuple(row) for row in subset.tolist()],dtype=dt)
....
ValueError: field 'NULL_COUNT_B' occurs more than once
In [88]: subset.shape
Out[88]: (4, 6)
In [89]: dt
Out[89]: 
[('PERCENT_A_NEW', '<f8'),
 ('JoinField', '<i4'),
 ('NULL_COUNT_B', '<f8'),
 ('PERCENT_COMP_B', '<f8'),
 ('RANKING_A', '<f8'),
 ('RANKING_B', '<f8'),
 ('NULL_COUNT_B', '<f8')]

The dtype list names seven fields, with NULL_COUNT_B appearing twice, while the array has only six columns. Dropping the duplicate entry gives one field per column:

In [90]: dt=[('PERCENT_A_NEW', '<f8'), ('JoinField', '<i4'), ('NULL_COUNT_B', '<f8'),
    ...: ('PERCENT_COMP_B', '<f8'), ('RANKING_A', '<f8'), ('RANKING_B', '<f8')]
In [91]: np.array([tuple(row) for row in subset.tolist()],dtype=dt)
Out[91]: 
array([(2.0, 12, 33.33333333, 2.0, 33.33333333, 12.0),
       (2.0, 2, 33.33333333, 2.0, 33.33333333, 2.0),
       (2.8, 8, 45.83333333, 2.75, 46.66666667, 13.0),
       (3.11320755, 75, 56.0, 3.24, 52.83018868, 33.0)], 
      dtype=[('PERCENT_A_NEW', '<f8'), ('JoinField', '<i4'), ('NULL_COUNT_B', '<f8'), ('PERCENT_COMP_B', '<f8'), ('RANKING_A', '<f8'), ('RANKING_B', '<f8')])
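
An alternative to the list-of-tuples route, not used in the session above, is numpy.lib.recfunctions.unstructured_to_structured, available in NumPy 1.16 and later. The sketch below assumes such a NumPy version and uses two rows of the question's data. It checks that the number of fields matches the number of columns before converting, since a mismatch was part of the original problem.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Two rows taken from the question's array, six columns each.
subset = np.array([
    [2.0, 12.0, 33.33333333, 2.0, 33.33333333, 12.0],
    [2.8, 8.0, 45.83333333, 2.75, 46.66666667, 13.0],
])

# One uniquely named field per column (duplicate NULL_COUNT_B removed).
dt = np.dtype([
    ("PERCENT_A_NEW", "<f8"), ("JoinField", "<i4"), ("NULL_COUNT_B", "<f8"),
    ("PERCENT_COMP_B", "<f8"), ("RANKING_A", "<f8"), ("RANKING_B", "<f8"),
])

# Guard against a field/column count mismatch before converting.
if subset.shape[1] != len(dt.names):
    raise ValueError("dtype needs exactly one field per column")

# The last axis of the 2-D array becomes the fields of a 1-D structured array.
structured = rfn.unstructured_to_structured(subset, dtype=dt)

print(structured.shape)
print(structured["JoinField"])
```

Either way the result is one-dimensional, with one record per original row, and each column is reached by field name rather than by a second index.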

solution1
2 ACCEPTED 2016-11-17 17:09:15
