Extend array intervals of arrays by the same array

I have to set up an input tensor for machine learning purposes, which looks like:

tensor=array[array[object_1],array[object_2],...,array[object_n]]
np.shape(tensor)=(a,n,6)

Every object array is 1-dimensional with, let's say, 6 entries, which are the variables that describe the object. I want to extend these 6 entries with 4 more entries. The variables for this extended information are saved in an array:

np.shape(extra_information)=(a,m,4) #m<n

m is less than n because each array in extra_information applies to an interval of objects. I can use for loops to do this, but it has to scale well for large values of a. I thought about bringing extra_information into the shape np.shape(extra_information)=(a,m=n,4) and then using something like np.dstack([tensor,extra_information],axis=1), but I am not sure whether this is the most elegant solution, or how to get extra_information into the shape I want while scaling well. The intervals are saved in an array, too:

 np.shape(extra_information_intervall)=(a,m,2) #m<n

Edit:

I now have the following solution, which works but is surely inefficient:

    import numpy as np

    def extend(track,jet,padding_size):
        # k[var-2] and k[var-1] are used as the start and end track index
        # of the interval that jet k covers
        event,jets,var=np.shape(jet)
        jet_to_track_data=[]
        for i in jet:
            event_jet_to_track=[]
            lasttrack=0
            for k in i:
                if k[4]!=0:
                    # real jet: repeat its first var-2 variables once per track
                    x=np.array([k[l] for l in range(var-2)])
                    shape=(int(k[var-1]-k[var-2]),var-2)
                    value=np.broadcast_to(x,shape)
                    event_jet_to_track.append(value)
                    lasttrack=k[var-1]
                else:
                    # zero-padded jet: fill the remaining tracks with zeros, then stop
                    x=np.array((var-2)*[0])
                    shape=(int(padding_size-lasttrack),var-2)
                    value=np.broadcast_to(x,shape)
                    event_jet_to_track.append(value)
                    break
            jet_to_track_data.append(event_jet_to_track)
        jet_to_track_data=[np.vstack(x) for x in jet_to_track_data]
        jet_to_track_data=np.stack(jet_to_track_data)
        extended=np.concatenate([track,jet_to_track_data],axis=2)
        return extended

To clarify the problem further, here is an example in 2-D:

tensor=[[1,2,3],[3,4,5],[6,7,8],....,[...]]
extra_information=[[a],[b],[c],....[...]]
extra_information_intervall=[[0,2],[3,4],...,[0,0]]

Due to zero padding, the shape of extra_information is (a,n,4), but it only contains information up to the m-th entry and is filled with zeros between m and n. This is true for extra_information_intervall, too. Now the goal is to merge this information into tensor like:

tensor=[[1,2,3,a],[3,4,5,a],[6,7,8,b],....,[...,0]]
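
For concreteness, the target layout can be reproduced on toy data. The sketch below is only an illustration under assumptions that are not stated above: the intervals are taken to be half-open `[start, end)` ranges that follow each other without gaps (matching the `end - start` length used in the loop above), and the numbers 10 and 20 stand in for `a` and `b`. The names `lengths`, `expanded` and `merged` are made up for this example.

```python
import numpy as np

# Toy data: n = 5 objects with 3 features each; the last row is zero padding.
tensor = np.array([[1, 2, 3],
                   [3, 4, 5],
                   [6, 7, 8],
                   [9, 10, 11],
                   [0, 0, 0]])

# Extra information, zero-padded to n rows; only the first m = 2 rows are real.
extra_information = np.array([[10], [20], [0], [0], [0]])

# Assumed half-open [start, end) intervals, zero-padded in the same way.
extra_information_intervall = np.array([[0, 2], [2, 4], [0, 0], [0, 0], [0, 0]])

n = tensor.shape[0]

# Number of objects covered by each interval (0 for the padded rows).
lengths = extra_information_intervall[:, 1] - extra_information_intervall[:, 0]

# Repeat each extra row once per covered object; padded rows repeat 0 times.
expanded = np.repeat(extra_information, lengths, axis=0)

# Objects beyond the last interval get zeros.
pad = np.zeros((n - expanded.shape[0], extra_information.shape[1]),
               dtype=expanded.dtype)
expanded = np.vstack([expanded, pad])

merged = np.hstack([tensor, expanded])
print(merged)
```

The first two rows end in 10, the next two in 20, and the padded row ends in 0.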

IIUC, you have an (a,n,6) tensor that comprises n objects, some of which are interval objects. These interval objects each have 4 more features apart from the 6 features already available. The tensor that holds these 4 features is (a,m,4), where m<n and m is the number of interval objects among the n objects.

Assuming that these intervals start from the 0th object and repeat based on a given number of repetitions (the interval lengths), the structure is as follows -

ORIGINAL            ADDITIONAL
Obj0 [......]  -->  Obj0 [....]
Obj1 [......]       
Obj2 [......]       
Obj3 [......]  -->  Obj3 [....]
Obj4 [......]        
Obj5 [......]        
Obj6 [......]  -->  Obj6 [....] 
Obj7 [......]
Obj8 [......]

Assuming that you simply want to copy the additional information to the next few objects until the subsequent interval is reached, you basically fill the gaps between the m intervals, so that you end up with n objects with additional info.

You can do this with np.repeat. You can calculate how many times to repeat each of the m objects from your list of intervals and store it in s. A detailed explanation of this is in the last paragraph.

import numpy as np

a = np.random.random((2,10,6))
b = np.random.random((2,5,4))

s = np.array([2,3,2,2,1])  #Number of elements = number of objects m
                           #Sum of elements = number of objects n

new_b = np.repeat(b, s, axis=1)
#new_b shape = (2,10,4)

out = np.dstack((a, new_b))
out.shape
(2, 10, 10)

This would do the following (shown schematically, with every interval covering 3 objects) -

ORIGINAL            ADDITIONAL
Obj0 [......]  -->  Obj0 [....]
Obj1 [......]  -->  Obj0 [....]       
Obj2 [......]  -->  Obj0 [....]       
Obj3 [......]  -->  Obj3 [....]
Obj4 [......]  -->  Obj3 [....]        
Obj5 [......]  -->  Obj3 [....]        
Obj6 [......]  -->  Obj6 [....] 
Obj7 [......]  -->  Obj6 [....] 
Obj8 [......]  -->  Obj6 [....] 
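
The repeat counts do not have to be typed by hand. As a minimal sketch, assuming a single interval list shared by every event and stored as half-open `[start, end)` rows (the array `intervals` below is hypothetical, not from the question), `s` is just the difference of the two columns:

```python
import numpy as np

# Hypothetical interval list shared by all events: half-open [start, end) rows.
intervals = np.array([[0, 2], [2, 5], [5, 7], [7, 9], [9, 10]])

# Repeat count per interval = number of objects it covers.
s = intervals[:, 1] - intervals[:, 0]

rng = np.random.default_rng(0)
a = rng.random((2, 10, 6))   # original tensor, n = 10 objects
b = rng.random((2, 5, 4))    # extra information, m = 5 intervals

new_b = np.repeat(b, s, axis=1)   # shape (2, 10, 4)
out = np.dstack((a, new_b))       # shape (2, 10, 10)

# Objects 2, 3 and 4 lie in the second interval, so they all carry b[:, 1].
for obj in (2, 3, 4):
    assert np.array_equal(out[:, obj, 6:], b[:, 1])
```

Because `np.repeat` takes one 1-D array of counts for the whole axis, this form only applies when all events use the same interval lengths.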

EDIT: I have updated my answer based on your inputs. Since the intervals can be of variable length, you can still simply use np.repeat, but pass a parameter s that tells it how many times to repeat each element. This can be calculated from your list of intervals. The length of the s array needs to equal m, so that it gives the repeat count for each of the m objects. Also, the sum of s needs to equal n, since the output array after repeating needs to have the same number of objects as the original array.

s = np.array([2,3,2,2,1])  #Number of elements = number of objects m
                           #Sum of elements = number of objects n

#First object from m objects will repeat 2 times, 
#second will repeat 3 times, etc...

#Total number of objects created after 
#repetitions = 10 = objects in original tensor.
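
If the interval lengths differ from event to event, a single 1-D `s` no longer fits, because `np.repeat` applies the same counts to every event. One possible loop-free alternative, which is a different technique from the `np.repeat` call above, is to build an index that maps each object to its interval and gather with `np.take_along_axis`. The sketch below assumes half-open `[start, end)` intervals that are contiguous and start at object 0, with zero-padded intervals of length 0; the function name `expand_extra` and the toy arrays are hypothetical.

```python
import numpy as np

def expand_extra(extra, intervals, n):
    """Expand (a, m, f) extras to (a, n, f) using per-event intervals.

    Assumes contiguous half-open [start, end) intervals starting at object 0;
    zero-padded intervals must have length 0.
    """
    a, m, f = extra.shape
    lengths = intervals[..., 1] - intervals[..., 0]   # (a, m)
    ends = np.cumsum(lengths, axis=1)                 # (a, m)
    positions = np.arange(n)                          # (n,)
    # idx[e, j] = number of intervals ending at or before object j,
    # which is the index of the interval that contains object j.
    idx = (ends[:, None, :] <= positions[None, :, None]).sum(axis=2)  # (a, n)
    # Index m points at an all-zero row used for objects past the last interval.
    padded = np.concatenate(
        [extra, np.zeros((a, 1, f), dtype=extra.dtype)], axis=1)
    return np.take_along_axis(padded, idx[:, :, None], axis=1)

rng = np.random.default_rng(0)
tensor = rng.random((2, 6, 6))
extra = rng.random((2, 3, 4))
extra[0, 2] = 0                      # event 0 has only two real intervals
intervals = np.array([[[0, 2], [2, 5], [0, 0]],
                      [[0, 1], [1, 3], [3, 6]]])

expanded = expand_extra(extra, intervals, n=6)
extended = np.concatenate([tensor, expanded], axis=2)
print(extended.shape)
```

The comparison creates a temporary boolean array of shape (a, n, m), so memory grows with all three sizes; for very large inputs it may be worth processing the events in chunks.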
