
Extend array intervals of arrays by the same array

I have to set up an input tensor for machine learning purposes which looks like:

tensor=array[array[object_1],array[object_2],...,array[object_n]]
np.shape(tensor)=(a,n,6)

Every object array is 1-dimensional with, let's say, 6 entries, which are the variables that describe the object. I want to extend these 6 entries with 4 more entries. The variables for this extended information are saved in an array:

np.shape(extra_information)=(a,m,4) #m<n

m is less than n because each array in extra_information applies to an interval of objects. I could use for loops to do this, but it has to scale well for large values of a. I thought about bringing extra_information into the shape np.shape(extra_information)=(a,n,4) and then using something like np.dstack([tensor,extra_information]), but I am not sure whether this is the most elegant solution, or how to get extra_information into that shape while scaling well. The intervals are saved in an array, too:

 np.shape(extra_information_intervall)=(a,m,2) #m<n

Edit:

I now have the following solution, which works but is surely inefficient:

    import numpy as np

    def extend(track,jet,padding_size):
        # jet has shape (event,jets,var); k[var-2] and k[var-1] are used
        # below as the start and end of the track interval of jet k
        event,jets,var=np.shape(jet)
        jet_to_track_data=[]
        for i in jet:
            event_jet_to_track=[]
            lasttrack=0
            for k in i:
                if k[4]!=0:
                    # non-padded jet: repeat its var-2 features once per track
                    x=np.array([k[l] for l in range(var-2)])
                    shape=(int(k[var-1]-k[var-2]),var-2)
                    value=np.broadcast_to(x,shape)
                    event_jet_to_track.append(value)
                    lasttrack=k[var-1]
                else:
                    # zero-padded jet: fill the remaining tracks with zeros and stop
                    x=np.array((var-2)*[0])
                    shape=(int(padding_size-lasttrack),var-2)
                    value=np.broadcast_to(x,shape)
                    event_jet_to_track.append(value)
                    break
            jet_to_track_data.append(event_jet_to_track)
        # one (padding_size,var-2) block per event, then stack over events
        jet_to_track_data=[np.vstack(x) for x in jet_to_track_data]
        jet_to_track_data=np.stack(jet_to_track_data)
        extended=np.concatenate([track,jet_to_track_data],axis=2)
        return extended

To clarify the problem further, here is an example in 2-d:

tensor=[[1,2,3],[3,4,5],[6,7,8],....,[...]]
extra_information=[[a],[b],[c],....[...]]
extra_information_intervall=[[0,2],[3,4],...,[0,0]]

Due to zero padding, the shape of extra_information is (a,n,4), but it only contains information up to the m-th entry and is filled with zeros between m and n. This is true for extra_information_intervall, too. The goal is to merge this information into tensor like:

tensor=[[1,2,3,a],[3,4,5,a],[6,7,8,b],....,[...,0]]

IIUC, you have an (a,n,6) tensor which comprises n objects, some of which are interval objects. Each of these interval objects has 4 more features apart from the 6 features already available. The tensor that holds these 4 features is (a,m,4), where m<n and m = the number of interval objects among the n objects.

Assuming that these intervals start from the 0th object AND that they repeat based on a given number of repetitions (the interval lengths), I can say that the structure is the following -

ORIGINAL            ADDITIONAL
Obj0 [......]  -->  Obj0 [....]
Obj1 [......]       
Obj2 [......]       
Obj3 [......]  -->  Obj3 [....]
Obj4 [......]        
Obj5 [......]        
Obj6 [......]  -->  Obj6 [....] 
Obj7 [......]
Obj8 [......]

Assuming that you want to simply copy the additional information to the next few objects until the subsequent interval is reached, you basically will fill the gaps between the m intervals such that now you have n objects with additional info.

You can do this with np.repeat. You can calculate how many times to repeat each of the m objects from your intervals list and store the counts in s. A detailed explanation of this is in the last paragraph.

import numpy as np

a = np.random.random((2,10,6))
b = np.random.random((2,5,4))

s = np.array([2,3,2,2,1])  #Number of elements = number of objects m
                           #Sum of elements = number of objects n

new_b = np.repeat(b, s, axis=1)
#new_b shape = (2,10,4)

out = np.dstack((a, new_b))
out.shape
(2, 10, 10)

This would do the following (illustrated here with every interval object repeated 3 times) -

ORIGINAL            ADDITIONAL
Obj0 [......]  -->  Obj0 [....]
Obj1 [......]  -->  Obj0 [....]       
Obj2 [......]  -->  Obj0 [....]       
Obj3 [......]  -->  Obj3 [....]
Obj4 [......]  -->  Obj3 [....]        
Obj5 [......]  -->  Obj3 [....]        
Obj6 [......]  -->  Obj6 [....] 
Obj7 [......]  -->  Obj6 [....] 
Obj8 [......]  -->  Obj6 [....] 
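
To make the copying visible, here is a minimal sketch with small integer values instead of random ones. The array b and its values are made up for illustration; every interval object is repeated 3 times, as in the diagram.

```python
import numpy as np

# Hypothetical toy data: 1 event, 3 interval objects, 2 features each.
b = np.array([[[10, 11],
               [20, 21],
               [30, 31]]])          # shape (1, 3, 2)

# Repeat every interval object 3 times along the object axis (axis=1).
new_b = np.repeat(b, 3, axis=1)     # shape (1, 9, 2)

# First feature of each of the 9 resulting rows:
print(new_b[0, :, 0])               # [10 10 10 20 20 20 30 30 30]
```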

EDIT: I have updated my answer based on your inputs. Since the intervals can be of variable length, you can still simply use np.repeat, but pass it an array s that tells it how many times to repeat each element. This can be calculated from your list of intervals. The length of s needs to equal m, so that it specifies how many times each of the m objects is repeated. AND the sum of s needs to equal n, since the output array after repeating needs to have the same number of objects as the original array.

s = np.array([2,3,2,2,1])  #Number of elements = number of objects m
                           #Sum of elements = number of objects n

#First object from m objects will repeat 2 times, 
#second will repeat 3 times, etc...

#Total number of objects created after 
#repetitions = 10 = objects in original tensor.
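
One possible way to get s from the intervals array is sketched below. It assumes the intervals are half-open [start, end) index ranges and that zero-padded rows are [0, 0]; the question does not state this, so adjust if your bounds are inclusive. For a single event, s is just end minus start. Because the interval lengths differ from event to event, a single s cannot be passed to np.repeat for the whole batch, so the hypothetical helper expand_extra uses a membership mask and np.take_along_axis instead. The toy values are made up.

```python
import numpy as np

# Hypothetical toy data: a=2 events, m=3 interval slots, f=1 feature, n=4 objects.
extra = np.array([[[1.], [2.], [0.]],
                  [[5.], [0.], [0.]]])              # (a, m, f)
intervals = np.array([[[0, 2], [2, 3], [0, 0]],
                      [[0, 1], [0, 0], [0, 0]]])    # (a, m, 2), assumed [start, end)

# Single event: the repeat counts are the interval lengths (padded rows give 0).
s0 = intervals[0, :, 1] - intervals[0, :, 0]        # lengths 2, 1, 0
rep0 = np.repeat(extra[0], s0, axis=0)              # (3, f); not yet padded to n

def expand_extra(extra, intervals, n):
    """Spread per-interval features over n objects for all events at once.

    Objects that fall into no interval get zeros.
    """
    obj = np.arange(n)[None, :, None]               # (1, n, 1)
    start = intervals[:, None, :, 0]                # (a, 1, m)
    end = intervals[:, None, :, 1]                  # (a, 1, m)
    inside = (obj >= start) & (obj < end)           # (a, n, m) membership mask
    covered = inside.any(axis=2)                    # (a, n)
    idx = inside.argmax(axis=2)                     # (a, n) first matching interval
    out = np.take_along_axis(extra, idx[:, :, None], axis=1)   # (a, n, f)
    return np.where(covered[:, :, None], out, 0)

new_extra = expand_extra(extra, intervals, n=4)     # (2, 4, 1)
# Event 0 rows hold 1, 1, 2, 0 and event 1 rows hold 5, 0, 0, 0.
```

The result can then be attached with np.concatenate([tensor, new_extra], axis=2). Note that the mask has a*n*m entries, so for very large inputs it may be worth processing the events in chunks.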


 