I have a DataFrame as below:
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 2, 2, 2],
                   'b': [10, 20, 30, 20, 40, 60],
                   'c': [80, 80, 80, 120, 120, 120]})
I want to get this 3D array:
array([[[  1,  10,  80],
        [  2,  20, 120]],

       [[  1,  20,  80],
        [  2,  40, 120]],

       [[  1,  30,  80],
        [  2,  60, 120]]], dtype=int64)
I tried this:
values = df.values
values.reshape(3, 2, 3)
but this gives an incorrect array. How can I get the expected array?
Get the array data, then reshape it so that the first axis is split into two axes, the first of them having length 2 (one block per value of a). This gives us a 3D array. Then swap those two axes:
df.values.reshape(2,-1,df.shape[1]).swapaxes(0,1)
Sample run -
In [711]: df
Out[711]:
   a   b    c
0  1  10   80
1  1  20   80
2  1  30   80
3  2  20  120
4  2  40  120
5  2  60  120
In [713]: df.values.reshape(2,-1,df.shape[1]).swapaxes(0,1)
Out[713]:
array([[[  1,  10,  80],
        [  2,  20, 120]],

       [[  1,  20,  80],
        [  2,  40, 120]],

       [[  1,  30,  80],
        [  2,  60, 120]]])
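To see why the plain reshape from the question goes wrong, here is a minimal sketch comparing the two calls on the sample data. It uses a plain NumPy array named `values` as a stand-in for `df.values`; that substitution is an assumption made only to keep the example self-contained.

```python
import numpy as np

# Stand-in for df.values from the question (assumption: same rows, same order).
values = np.array([[1, 10,  80],
                   [1, 20,  80],
                   [1, 30,  80],
                   [2, 20, 120],
                   [2, 40, 120],
                   [2, 60, 120]])

# reshape(3, 2, 3) pairs up consecutive rows: (0, 1), (2, 3), (4, 5).
naive = values.reshape(3, 2, 3)

# reshape(2, -1, 3) keeps each group of 'a' together: block 0 is rows 0-2,
# block 1 is rows 3-5. Swapping the first two axes then pairs row i of
# block 0 with row i of block 1.
paired = values.reshape(2, -1, values.shape[1]).swapaxes(0, 1)

print(naive[0])   # rows 0 and 1 of values (both from the a == 1 group)
print(paired[0])  # rows 0 and 3 of values (one from each group)
```

The `-1` lets NumPy infer the number of rows per group, so the same line should work for any two equally sized groups that are already sorted by `a`.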
This gives us a view into the array data without making a copy, so the operation takes a small, roughly constant amount of time regardless of the array size.
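The view claim can be checked directly. The sketch below uses hypothetical data in `values` (standing in for `df.values`) and relies only on `np.shares_memory` and the array flags.

```python
import numpy as np

# Hypothetical data standing in for df.values: 6 rows, 3 columns.
values = np.arange(18).reshape(6, 3)
out = values.reshape(2, -1, values.shape[1]).swapaxes(0, 1)

print(np.shares_memory(values, out))  # True: out reuses the buffer of values
print(out.flags['C_CONTIGUOUS'])      # False: only shape and strides changed

# A write to the source array is visible through the view.
values[0, 0] = -1
print(out[0, 0, 0])                   # -1
```

Because the result shares memory with its source, call `.copy()` on it if you need an array that can be modified independently.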
Runtime test
Case #1 :
In [730]: df = pd.DataFrame(np.random.randint(0,9,(2000,100)))
# @cᴏʟᴅsᴘᴇᴇᴅ's soln
In [731]: %timeit np.stack(np.split(df.values, 2), axis=1)
10000 loops, best of 3: 109 µs per loop
In [732]: %timeit df.values.reshape(2,-1,df.shape[1]).swapaxes(0,1)
100000 loops, best of 3: 8.55 µs per loop
Case #2 :
In [733]: df = pd.DataFrame(np.random.randint(0,9,(2000,2000)))
# @cᴏʟᴅsᴘᴇᴇᴅ's soln
In [734]: %timeit np.stack(np.split(df.values, 2), axis=1)
100 loops, best of 3: 4.3 ms per loop
In [735]: %timeit df.values.reshape(2,-1,df.shape[1]).swapaxes(0,1)
100000 loops, best of 3: 8.37 µs per loop
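For anyone who wants to rerun the comparison outside IPython, here is a rough sketch using the standard `timeit` module. The helper names `with_stack` and `with_reshape` are made up for this example, and the array is built directly with NumPy rather than going through `df.values`, so absolute numbers will differ from the ones above.

```python
import timeit
import numpy as np

# Hypothetical stand-in for df.values in Case #1 (2000 x 100 random integers).
values = np.random.randint(0, 9, (2000, 100))

def with_stack(a):
    # Splits into 2 blocks and stacks them, which copies the data.
    return np.stack(np.split(a, 2), axis=1)

def with_reshape(a):
    # Only changes shape and strides, so no data is copied.
    return a.reshape(2, -1, a.shape[1]).swapaxes(0, 1)

# Both approaches must agree before their timings are worth comparing.
same = np.array_equal(with_stack(values), with_reshape(values))
print(same)

for fn in (with_stack, with_reshape):
    seconds = timeit.timeit(lambda: fn(values), number=1000)
    print(f"{fn.__name__}: {seconds / 1000 * 1e6:.2f} µs per call")
```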
Try np.split + np.stack:
np.stack(np.split(df.values, 2), axis=1)
array([[[  1,  10,  80],
        [  2,  20, 120]],

       [[  1,  20,  80],
        [  2,  40, 120]],

       [[  1,  30,  80],
        [  2,  60, 120]]])
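A short sketch of what the two calls do step by step, again with a plain NumPy array `values` assumed in place of `df.values`. Unlike a reshape, `np.stack` builds a new array, and `np.split` with an integer requires the rows to divide evenly.

```python
import numpy as np

# Stand-in for df.values from the question (assumption: same rows, same order).
values = np.array([[1, 10,  80],
                   [1, 20,  80],
                   [1, 30,  80],
                   [2, 20, 120],
                   [2, 40, 120],
                   [2, 60, 120]])

halves = np.split(values, 2)        # list of two (3, 3) arrays, one per group
stacked = np.stack(halves, axis=1)  # new axis 1 indexes the group

print(stacked.shape)                      # (3, 2, 3)
print(np.shares_memory(values, stacked))  # False: the result is a copy

# An integer split needs equally sized pieces; 5 rows cannot be halved.
try:
    np.split(values[:5], 2)
    split_failed = False
except ValueError:
    split_failed = True
print(split_failed)                       # True
```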