
Reproduce a pandas-style reindex in NumPy

I have a NumPy array:

[[1,521,3],
 [2,543,2],
 [3,555,3],
 [4,575,2]]

As a pandas DataFrame it looks like this:

Seconds   Price   Type
      1     521      3
      2     543      2
      3     555      3
      4     575      2
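The question does not show how that DataFrame is built from the array. Here is a minimal sketch, assuming the column names Seconds, Price and Type taken from the table above:

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 521, 3],
                [2, 543, 2],
                [3, 555, 3],
                [4, 575, 2]])

# Column names assumed from the table in the question
df = pd.DataFrame(arr, columns=['Seconds', 'Price', 'Type'])
print(df)
```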

Then I set a (Type, Seconds) MultiIndex on it:

types = df['Type'].unique()   # distinct types, in order of first appearance
df.set_index(['Type', 'Seconds'], inplace=True)

Output:

                 Price
Type   Seconds   
   3         1     521
   3         3     555
   2         2     543
   2         4     575

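Then I reindexed each type so that every second is present, with Price 0 where it was missing:

import pandas as pd
frames = []
for i in types:
    df1 = df.xs(i, level=0).reindex([1, 2, 3, 4], fill_value=0).reset_index()
    df1['Type'] = i  # restore the Type label that xs dropped
    frames.append(df1.set_index(['Type', 'Seconds']))
df = pd.concat(frames)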
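The same result can be had without a loop by reindexing against a full MultiIndex. This is an alternative sketch, assuming the wanted seconds are 1 to 4 as in the question; the names `seconds`, `full_index` and `out` are chosen here for illustration:

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 521, 3], [2, 543, 2], [3, 555, 3], [4, 575, 2]])
df = pd.DataFrame(arr, columns=['Seconds', 'Price', 'Type'])

types = df['Type'].unique()   # [3, 2], in order of first appearance
seconds = [1, 2, 3, 4]        # assumed full range of seconds

# Every (Type, Seconds) combination, with Type varying slowest
full_index = pd.MultiIndex.from_product([types, seconds],
                                        names=['Type', 'Seconds'])

# Combinations absent from the original frame get Price = 0
out = df.set_index(['Type', 'Seconds']).reindex(full_index, fill_value=0)
print(out)
```

This relies on each (Type, Seconds) pair appearing at most once; `reindex` refuses to work on an index with duplicate entries.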

Output:

                 Price
Type   Seconds   
   3         1     521
   3         2       0
   3         3     555
   3         4       0
   2         1       0
   2         2     543
   2         3       0
   2         4     575

This is easy in pandas. How can I do the same in NumPy? The result should look like df.values, with Type, Seconds and Price as columns (roughly what df.reset_index().values gives).

Here is one method you could use.

import numpy as np
ar = np.array([[1,521,3], [2,543,2], [3,555,3], [4,575,2]])

ar
Out[50]: 
array([[  1, 521,   3],
       [  2, 543,   2],
       [  3, 555,   3],
       [  4, 575,   2]])

First, build the full (Type, Seconds) index you want:

u0 = np.unique(ar[:, 0])   # all Seconds values: [1 2 3 4]
u2 = np.unique(ar[:, 2])   # all Type values: [2 3]
rowcount = u0.shape[0]*u2.shape[0]
# Cartesian product Type x Seconds, with Type varying slowest
rows = np.stack([np.repeat(u2, u0.shape[0]),
                 np.tile(u0, u2.shape[0])],
                1)

rows
Out[51]: 
array([[2, 1],
       [2, 2],
       [2, 3],
       [2, 4],
       [3, 1],
       [3, 2],
       [3, 3],
       [3, 4]])
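The repeat/tile construction is a Cartesian product of the Type and Seconds values. As an alternative (not the method above), the same rows can be sketched with np.meshgrid; `rows_mesh` is a name chosen here:

```python
import numpy as np

ar = np.array([[1, 521, 3], [2, 543, 2], [3, 555, 3], [4, 575, 2]])
u0 = np.unique(ar[:, 0])   # Seconds: [1 2 3 4]
u2 = np.unique(ar[:, 2])   # Type:    [2 3]

# indexing='ij' makes the first argument (Type) vary slowest,
# the same order that np.repeat(u2, ...) / np.tile(u0, ...) produce
t, s = np.meshgrid(u2, u0, indexing='ij')
rows_mesh = np.column_stack([t.ravel(), s.ravel()])
print(rows_mesh)
```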

Next, find the (Type, Seconds) pairs that your array does not already contain:

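# Stack the existing (Type, Seconds) pairs on top of the full grid. np.unique
# returns the first index of each distinct row, so indices >= ar.shape[0]
# point at grid rows that never appear in ar (assumes no duplicate pairs in ar)
row_index = np.sort(np.unique(np.concatenate([ar[:, [2, 0]], rows]),
                              return_index=True, axis=0)[1])
missing = rows[row_index[ar.shape[0]:]-ar.shape[0]]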

missing
Out[52]: 
array([[2, 1],
       [2, 3],
       [3, 2],
       [3, 4]])
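If the np.unique index trick is hard to follow, a more direct membership test by broadcasting finds the same missing rows. This is a swapped-in technique, not the one above, and it builds a len(rows) × len(ar) × 2 temporary, so it is only sensible for small arrays; `existing` and `present` are names chosen here:

```python
import numpy as np

ar = np.array([[1, 521, 3], [2, 543, 2], [3, 555, 3], [4, 575, 2]])
u0 = np.unique(ar[:, 0])
u2 = np.unique(ar[:, 2])
rows = np.stack([np.repeat(u2, u0.shape[0]), np.tile(u0, u2.shape[0])], 1)

existing = ar[:, [2, 0]]   # (Type, Seconds) pairs already present

# Compare every candidate row with every existing pair:
# shape (len(rows), len(existing), 2) -> all() over the pair -> any() over existing
present = (rows[:, None, :] == existing[None, :, :]).all(axis=2).any(axis=1)
missing = rows[~present]
print(missing)
```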

Then combine the existing rows with the missing ones, leaving Price at 0 for the new rows:

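reindexed = np.zeros((rowcount, ar.shape[1]), int)
# Existing rows: reorder columns from [Seconds, Price, Type] to [Type, Seconds, Price]
reindexed[:ar.shape[0], [1, 2, 0]] = ar
# Missing rows: fill in Type and Seconds; Price stays 0
reindexed[ar.shape[0]:, [0, 1]] = missing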

reindexed
Out[53]: 
array([[  3,   1, 521],
       [  2,   2, 543],
       [  3,   3, 555],
       [  2,   4, 575],
       [  2,   1,   0],
       [  2,   3,   0],
       [  3,   2,   0],
       [  3,   4,   0]])

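Sort by Type and then by Seconds, if desired:

# np.lexsort uses the last key as the primary one: Type first, then Seconds
reindexed[np.lexsort([reindexed[:, 1], reindexed[:, 0]])]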
Out[49]: 
array([[  2,   1,   0],
       [  2,   2, 543],
       [  2,   3,   0],
       [  2,   4, 575],
       [  3,   1, 521],
       [  3,   2,   0],
       [  3,   3, 555],
       [  3,   4,   0]])

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

© 2020-2024 STACKOOM.COM