
Creating dataframe by adjusting the index taken from list of tuples

I am trying to create a dataframe from the following list of tuples. The first item in each tuple is the ID, the second is a list of values, and the third is the lag. The lag defines how many index positions the values must be shifted, in either direction, relative to the first tuple.

mytup = [(111, [1,2,3,4,5], 0), (222, [33,44,55,66], 2), (333, [0,11,22,33], -1)]


ID  values           lag
111 1,2,3,4,5        0
222 33,44,55,66      2
333 0,11,22,33      -1

The resulting dataframe is given below. The first row is the header. The lag is always relative to the first column (111).

111 222 333
nan nan 0
1   nan 11
2   nan 22
3   33  33
4   44  nan
5   55  nan
nan 66  nan

I want to populate the dataframe starting with the first tuple. Then I take the remaining tuples one at a time and add each to the dataframe, introducing nan where needed.

This is the dataframe after the second tuple has been added:

111    222
1   nan
2   nan
3   33
4   44
5   55
nan 66

The third tuple has a negative lag of -1, so I want the previous dataframe to move down one position, which gives the final dataframe, reproduced here:

111 222 333
nan nan 0
1   nan 11
2   nan 22
3   33  33
4   44  nan
5   55  nan
nan 66  nan
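The incremental procedure described above (start with the first tuple, then add the others one at a time) could be sketched roughly as follows. This is only an illustration: the helper name to_series is made up here, and it assumes that labelling each value with its offset from the first tuple and outer-joining on those labels is an acceptable way to express the "shift".

```python
import pandas as pd

mytup = [(111, [1, 2, 3, 4, 5], 0),
         (222, [33, 44, 55, 66], 2),
         (333, [0, 11, 22, 33], -1)]

def to_series(name, values, lag):
    # Hypothetical helper: label each value with its offset from the first tuple.
    return pd.Series(values, index=range(lag, lag + len(values)), name=name)

# Start with the first tuple, then add the remaining tuples one at a time.
frame = to_series(*mytup[0]).to_frame()
for tup in mytup[1:]:
    # An outer join keeps every offset seen so far; missing cells become NaN.
    frame = frame.join(to_series(*tup), how="outer")

# Sort explicitly so that negative offsets end up at the top.
frame = frame.sort_index()
print(frame)
```

The index of the printed frame holds the offsets (-1 to 5), so the "move down one position" step for a negative lag happens implicitly when a smaller label is added.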

Edit:

As @cphlewis pointed out, the output will, in general, depend on the order in which columns are added. In my case the lag is always relative to the first (original) vector. Therefore, the final result remains the same irrespective of the order.
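The order-independence claim can be checked with a small sketch. The helper build is hypothetical, and it assumes the lag is interpreted as an index offset from the first tuple; the only thing compared is whether two insertion orders give the same values.

```python
import pandas as pd

mytup = [(111, [1, 2, 3, 4, 5], 0),
         (222, [33, 44, 55, 66], 2),
         (333, [0, 11, 22, 33], -1)]

def build(tuples):
    # Hypothetical helper: one named Series per tuple, indexed by its offset.
    columns = [pd.Series(values, index=range(lag, lag + len(values)), name=name)
               for name, values, lag in tuples]
    return pd.concat(columns, axis=1).sort_index()

forward = build(mytup)
# Keep the reference tuple first and swap the order of the other two.
swapped = build([mytup[0], mytup[2], mytup[1]])

# Compare the two frames after putting the columns in the same order.
print(forward.equals(swapped[forward.columns]))
```

Because every lag refers to the same reference column, each value gets the same label no matter when its column is added, so only the column order differs.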

Using this (from the pandas.Series docstring):

Operations between Series (+, -, /, *, **) align values based on their associated index values-- they need not be the same length. The result index will be the sorted union of the two indexes.
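A minimal illustration of the alignment described in that docstring, using two made-up Series whose indexes only partly overlap:

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=[0, 1, 2])
b = pd.Series([10, 20, 30], index=[1, 2, 3])

# Values are matched by index label, not by position.
# Labels 0 and 3 exist in only one Series, so they come out as NaN.
total = a + b
print(total)
```

The same label-based matching is what lets Series with shifted indexes line up as columns of one dataframe.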

import pandas as pd
from numpy import arange
#mytup = [(111, [1,2,3,4,5], 0), (222, [33,44,55,66], 2), (333, [0,11,22,33], -1)]
mytup = [(111, [1,2,3,4,5], 0),
         (222, [33,44,55,66], 2),
         (444, [1,2,3,4,5], 0),
         (333, [0,11,22,33], -1),
         ('a', [5,6,7], -2)]

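def SfromTuple(row):
    # Index each value by its offset from the first tuple: shift, shift + 1, ...
    name, data, shift = row
    return pd.Series(data, index = arange(shift, len(data) + shift))

# concat aligns the Series on the union of their indexes and fills gaps with NaN.
# sort_index() keeps the rows in offset order however concat orders that union.
reindexed = pd.concat([SfromTuple(row) for row in mytup], axis=1).sort_index()
reindexed.columns = [x[0] for x in mytup]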
print(reindexed)

Result from the original mytup:

    111  222  333
-1  NaN  NaN    0
 0    1  NaN   11
 1    2  NaN   22
 2    3   33   33
 3    4   44  NaN
 4    5   55  NaN
 5  NaN   66  NaN

Result from the longer mytup added above:

    111  222  444  333    a
-2  NaN  NaN  NaN  NaN    5
-1  NaN  NaN  NaN    0    6
 0    1  NaN    1   11    7
 1    2  NaN    2   22  NaN
 2    3   33    3   33  NaN
 3    4   44    4  NaN  NaN
 4    5   55    5  NaN  NaN
 5  NaN   66  NaN  NaN  NaN
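The results above keep the offsets (-1, 0, 1, ...) as the row index, whereas the table in the question shows no index at all. If only the row order matters, one possible follow-up, sketched here as an assumption about what is wanted, is to drop the offsets with reset_index:

```python
import pandas as pd
from numpy import arange

mytup = [(111, [1, 2, 3, 4, 5], 0),
         (222, [33, 44, 55, 66], 2),
         (333, [0, 11, 22, 33], -1)]

# Same construction as in the answer: index each value by its offset.
series = [pd.Series(data, index=arange(shift, len(data) + shift))
          for _, data, shift in mytup]
reindexed = pd.concat(series, axis=1).sort_index()
reindexed.columns = [name for name, _, _ in mytup]

# Replace the offset labels (-1 .. 5) with plain positions (0 .. 6).
positional = reindexed.reset_index(drop=True)
print(positional)
```

Keeping the offset index is probably more useful if further columns will be added later, since new lags can then still be aligned against it.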
