
Restructuring a 2-D numpy array into a 3-D numpy array according to values in a column of a dataframe

I have a 2-D numpy array like this:

matrix([[1., 0., 0., ..., 1., 0., 0.],
        [1., 0., 0., ..., 0., 1., 1.],
        [1., 0., 0., ..., 1., 0., 0.],
        [1., 1., 0., ..., 1., 0., 0.],
        [1., 1., 0., ..., 1., 0., 0.],
        [1., 1., 0., ..., 1., 0., 0.]])

I want to transform it into a 3-D numpy array based on the values of a column of a dataframe. Let's say the column is like this:

df = pd.DataFrame({"Case":[1,1,2,2,3,4]})

The final 3-D array should look like this:

matrix([
    [[1., 0., 0., ..., 1., 0., 0.],
     [1., 0., 0., ..., 0., 1., 1.]],
    [[1., 0., 0., ..., 1., 0., 0.],
     [1., 1., 0., ..., 1., 0., 0.]],
    [[1., 1., 0., ..., 1., 0., 0.]],
    [[1., 1., 0., ..., 1., 0., 0.]]
])

The first two rows of the initial 2-D array become one 2-D array of the final 3-D array, because the first and second values in the dataframe column are both 1. Similarly, the next two rows form another 2-D array of two rows, because the next two values in the column are both 2, so they belong together. The values 3 and 4 each occur in only one row, so the last two 2-D arrays of the 3-D array contain only one row each.

So, basically, rows of the initial 2-D matrix whose corresponding values in the dataframe column are equal belong together: they are combined into a 2-D matrix, which becomes one element of the final 3-D matrix.
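For the example column, this rule maps each case value to the row positions that belong together. A minimal sketch using pandas `groupby(...).indices`, with a hypothetical 6x4 stand-in for the elided matrix (the real values are not shown in the question):

```python
import numpy as np
import pandas as pd

# Hypothetical 6x4 stand-in for the elided matrix in the question
M = np.array([
    [1., 0., 0., 1.],
    [1., 0., 1., 1.],
    [1., 0., 0., 0.],
    [1., 1., 0., 0.],
    [1., 1., 1., 0.],
    [1., 1., 0., 1.],
])
df = pd.DataFrame({"Case": [1, 1, 2, 2, 3, 4]})

# Map each case value to the integer positions of its rows:
# case 1 -> rows [0, 1], case 2 -> rows [2, 3], case 3 -> [4], case 4 -> [5]
positions = df.groupby("Case").indices

# Select those rows: one 2-D array per case
blocks = {case: M[pos] for case, pos in positions.items()}
for case, block in blocks.items():
    print(case, block.shape)
```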

How do I do this?

NumPy arrays must be rectangular, so a 3-D array whose 2-D blocks have different numbers of rows is not possible (short of an object-dtype array). Instead, you can build a list of 2-D arrays, one per case:

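import numpy as np
import pandas as pd

# Build the 2-D array from nested lists with np.array.
# "..." stands for the columns elided in the question.
M = np.array(
[[1., 0., 0., ..., 1., 0., 0.],
 [1., 0., 0., ..., 0., 1., 1.],
 [1., 0., 0., ..., 1., 0., 0.],
 [1., 1., 0., ..., 1., 0., 0.],
 [1., 1., 0., ..., 1., 0., 0.],
 [1., 1., 0., ..., 1., 0., 0.]]
)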

df = pd.DataFrame({"Case":[1,1,2,2,3,4]})

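# One 2-D array per distinct case, in order of first appearance.
# The boolean mask selects the rows whose 'Case' value equals `case`.
M_per_case = [
    M[(df['Case'] == case).to_numpy()]
    for case in df['Case'].unique()
]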
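Because the blocks have different row counts, `M_per_case` cannot be stacked directly into one 3-D array. If a single array is really required, one option (an assumption, not part of the answer above) is to pad the shorter blocks with NaN. A minimal sketch; `pad_to_3d` is a hypothetical helper and the small blocks are stand-in data:

```python
import numpy as np

def pad_to_3d(blocks, fill=np.nan):
    """Stack 2-D arrays with the same number of columns but different
    numbers of rows into one 3-D array, padding short blocks with `fill`."""
    n_rows = max(b.shape[0] for b in blocks)
    n_cols = blocks[0].shape[1]
    out = np.full((len(blocks), n_rows, n_cols), fill, dtype=float)
    for i, b in enumerate(blocks):
        out[i, :b.shape[0], :] = b  # copy the real rows; the rest stays `fill`
    return out

# Hypothetical stand-in for M_per_case (4 cases, 3 columns)
blocks = [
    np.array([[1., 0., 0.], [1., 0., 1.]]),
    np.array([[1., 0., 0.], [1., 1., 0.]]),
    np.array([[1., 1., 0.]]),
    np.array([[1., 1., 0.]]),
]
padded = pad_to_3d(blocks)
print(padded.shape)  # (4, 2, 3)
```

Padding rows can be found afterwards with `np.isnan(padded).all(axis=2)`, provided the real data never contains rows that are entirely NaN.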
