
Concatenating array values in a DataFrame with np.concatenate (Python)

Any help to get me started is appreciated. I have a large data frame whose cells hold arrays, and I am trying to concatenate those arrays. Below is a small example. My approach works, but it is extremely slow on large data (many rows and columns).

This is the code.

import numpy as np
import pandas as pd
# Define a dictionary of columns; each cell holds a length-2 array
data = {
    'Val': [np.zeros(2)],
    'Val1': [np.ones(2)],
    'Val2': [np.ones(2)],
}

# Convert the dictionary into a DataFrame
df = pd.DataFrame(data, columns=['Val', 'Val1', 'Val2'])

# Start from a dummy block so np.concatenate has something to append to;
# it must be created once, outside the loop, or earlier rows are lost
prev = np.zeros((1, 1, 2))
for index, row in df.iterrows():
    for r in row:
        prev = np.concatenate((prev, [[r.tolist()]]))
# Drop the dummy block and reshape to (rows, columns, 2)
X = np.delete(prev, 0, axis=0).reshape(df.shape[0], df.shape[1], 2)

I have searched, and some posts suggest using pandas apply, but I don't know how to do it. Could anyone share a better way so that I can get started? Thank you.

Edit: Updated based on the suggestion by @StupidWolf. Thank you!

def format(df):
    result = []
    for idx in df.index:
        print(idx)
        X = np.vstack(df.to_numpy()[idx])
        print(X)
        result.append(X)
    return result

result = format(df)
# Fails here: result is a plain Python list, which has no .shape attribute
nsamples, nx, ny = result.shape
print(nsamples, nx, ny)

When I print X, I get the expected output and shape, but once I append it to the list I can no longer see the shape. The shape matters because I need to change the dimensions. Thank you.

If the aim is to stack the columns for every row entry, you can do this:

df = pd.DataFrame({'Val0': [np.random.randint(0, 5, 2) for i in range(3)],
                   'Val1': [np.random.randint(0, 5, 2) for i in range(3)],
                   'Val2': [np.random.randint(0, 5, 2) for i in range(3)]})

Dataframe looks like this:

     Val0    Val1    Val2
0  [2, 1]  [0, 1]  [1, 0]
1  [0, 3]  [0, 0]  [3, 4]
2  [2, 2]  [3, 3]  [2, 0]

You can stack every row like this:

[np.vstack(i) for i in df.to_numpy()]

To see the shape, convert this list to a NumPy array:

res = np.array([np.vstack(i) for i in df.to_numpy()])
res.shape
(3, 3, 2)
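Since the question is about speed, here is a minimal sketch of a variant that skips the per-row np.vstack call. It assumes every cell holds a 1-D array of the same length; the frame below and the names res_fast and res_loop are made up for illustration. Whether it is actually faster on your data is something to time yourself.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame: every cell holds a length-2 integer array.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'Val0': [rng.integers(0, 5, 2) for _ in range(3)],
    'Val1': [rng.integers(0, 5, 2) for _ in range(3)],
    'Val2': [rng.integers(0, 5, 2) for _ in range(3)],
})

# Nested lists of equal-length arrays convert directly to a 3-D array,
# so no np.vstack call per row is needed.
res_fast = np.array(df.to_numpy().tolist())

# Row-by-row version from the list comprehension, for comparison.
res_loop = np.array([np.vstack(row) for row in df.to_numpy()])

print(res_fast.shape)
print(np.array_equal(res_fast, res_loop))
```

If the cells have different lengths, this conversion will not produce a regular 3-D array, so check the shape before relying on it.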

The list comprehension on its own gives you a list with one stacked array per row:

[array([[2, 1],
        [0, 1],
        [1, 0]]),
 array([[0, 3],
        [0, 0],
        [3, 4]]),
 array([[2, 2],
        [3, 3],
        [2, 0]])]

Or if you are familiar with pandas:

res = df.apply(lambda i:np.vstack(i),axis=1)

The result is a Series, and every element has the shape you need:

res[0]
 
array([[2, 1],
       [0, 1],
       [1, 0]])

res[1]
 
array([[0, 3],
       [0, 0],
       [3, 4]])
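If you need a single 3-D array (and its shape) from the Series, one option is to stack its elements with np.stack. This is a sketch that assumes all elements have the same shape; it rebuilds the frame with the values shown in the table above, and the name arr is only illustrative.

```python
import numpy as np
import pandas as pd

# Same values as the example DataFrame printed earlier.
df = pd.DataFrame({
    'Val0': [np.array([2, 1]), np.array([0, 3]), np.array([2, 2])],
    'Val1': [np.array([0, 1]), np.array([0, 0]), np.array([3, 3])],
    'Val2': [np.array([1, 0]), np.array([3, 4]), np.array([2, 0])],
})

# One (3, 2) array per row, collected in a Series.
res = df.apply(lambda row: np.vstack(row), axis=1)

# Stack the per-row arrays along a new leading axis.
arr = np.stack(res.to_list())
print(arr.shape)  # (3, 3, 2)
```

From there, arr.reshape(...) can be used to change the dimensions.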

