Creating dataframe by adjusting the index taken from list of tuples
I am trying to create a dataframe from the following list of tuples. The first item in each tuple is the ID, the second (values) is a list of values, and the third is the lag. The lag defines how many index positions the values must be shifted, in either direction, relative to the first tuple when building the dataframe.
mytup = [(111, [1,2,3,4,5], 0), (222, [33,44,55,66], 2), (333, [0,11,22,33], -1)]
ID values lag
111 1,2,3,4,5 0
222 33,44,55,66 2
333 0,11,22,33 -1
The resulting dataframe is given below. The first row is the header. The lag is always relative to the first column (111).
111 222 333
nan nan 0
1 nan 11
2 nan 22
3 33 33
4 44 nan
5 55 nan
nan 66 nan
I want to populate the dataframe starting with the first tuple. Then I take the remaining tuples one at a time and add each to the dataframe, introducing nan wherever a column has no value.
After the second tuple has been processed, the dataframe looks like this:
111 222
1 nan
2 nan
3 33
4 44
5 55
nan 66
The third tuple has a negative lag of -1, so the previous dataframe must move down one position to make room for it. That gives the final dataframe, reproduced again here:
111 222 333
nan nan 0
1 nan 11
2 nan 22
3 33 33
4 44 nan
5 55 nan
nan 66 nan
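The step-by-step procedure described above can be sketched with an outer join per tuple. This is only one possible reading of the question, not a confirmed solution: the helper name lagged_series is hypothetical, and the sketch assumes that the lag can be used directly as the starting index label of each column.

```python
import numpy as np
import pandas as pd

mytup = [(111, [1, 2, 3, 4, 5], 0),
         (222, [33, 44, 55, 66], 2),
         (333, [0, 11, 22, 33], -1)]

def lagged_series(name, values, lag):
    # Hypothetical helper: label the values lag, lag+1, ... so that
    # the lag becomes the position relative to the first tuple.
    return pd.Series(values, index=np.arange(lag, lag + len(values)), name=name)

# Start with the first tuple, then outer-join the rest one at a time.
df = lagged_series(*mytup[0]).to_frame()
for row in mytup[1:]:
    # how="outer" keeps the union of index labels and fills gaps with NaN.
    df = df.join(lagged_series(*row), how="outer")

print(df)
```

Each join extends the index as far as needed in either direction, which corresponds to the "move down one position" step when a negative lag arrives.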
Edit:
As @cphlewis pointed out, the output could depend on the order in which columns are added. In my case the lag is always relative to the first (original) vector, so the final result stays the same irrespective of the order.
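A quick way to check the order-independence claim is to build the frame from the tuples in two different orders and compare. This is a sketch under the assumption that each lag is an absolute offset from the first vector; the function name build is hypothetical, and the explicit sort_index() is there so the comparison does not rely on how concat orders the row labels.

```python
import numpy as np
import pandas as pd

mytup = [(111, [1, 2, 3, 4, 5], 0),
         (222, [33, 44, 55, 66], 2),
         (333, [0, 11, 22, 33], -1)]

def build(rows):
    # Hypothetical helper: one Series per tuple, indexed from its lag,
    # aligned side by side on the union of all index labels.
    series = [pd.Series(values, index=np.arange(lag, lag + len(values)), name=name)
              for name, values, lag in rows]
    return pd.concat(series, axis=1).sort_index()

forward = build(mytup)
backward = build(mytup[::-1])

# Only the column order differs; reorder the columns and compare.
same = forward.equals(backward[forward.columns])
print(same)
```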
Using this (from the pandas.Series docstring):
Operations between Series (+, -, /, *, **) align values based on their associated index values-- they need not be the same length. The result index will be the sorted union of the two indexes.
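As a small illustration of that alignment rule (the Series a and b are made-up values, not taken from the question), adding two Series with partly overlapping indexes gives a result over the union of the labels, with NaN where only one side has a value:

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=[0, 1, 2])
b = pd.Series([10, 20, 30], index=[1, 2, 3])

# Values are matched by index label, not by position.
total = a + b
print(total)
```

Labels 1 and 2 exist in both Series and are summed; labels 0 and 3 exist in only one, so they come out as NaN.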
import pandas as pd
from numpy import arange

# Original three tuples from the question:
# mytup = [(111, [1,2,3,4,5], 0), (222, [33,44,55,66], 2), (333, [0,11,22,33], -1)]
# Longer example with two extra columns:
mytup = [(111, [1,2,3,4,5], 0),
         (222, [33,44,55,66], 2),
         (444, [1,2,3,4,5], 0),
         (333, [0,11,22,33], -1),
         ('a', [5,6,7], -2)]

def SfromTuple(row):
    # Build a Series whose index starts at the lag (shift).
    name, data, shift = row
    return pd.Series(data, index=arange(shift, len(data) + shift))

# Concatenating along axis=1 aligns the Series on their index labels.
reindexed = pd.concat([SfromTuple(row) for row in mytup], axis=1)
reindexed.columns = [x[0] for x in mytup]
print(reindexed)
Result from the original mytup:
     111  222  333
-1   NaN  NaN    0
 0     1  NaN   11
 1     2  NaN   22
 2     3   33   33
 3     4   44  NaN
 4     5   55  NaN
 5   NaN   66  NaN
Result from the longer mytup added above:
     111  222  444  333    a
-2   NaN  NaN  NaN  NaN    5
-1   NaN  NaN  NaN    0    6
 0     1  NaN    1   11    7
 1     2  NaN    2   22  NaN
 2     3   33    3   33  NaN
 3     4   44    4  NaN  NaN
 4     5   55    5  NaN  NaN
 5   NaN   66  NaN  NaN  NaN
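The expected output in the question shows no index column, while the frames above carry the lag-based labels. If those labels are not wanted, one option (an assumption about the desired final form, not something the answer states) is to sort by the lag index and then drop it with reset_index:

```python
import numpy as np
import pandas as pd

mytup = [(111, [1, 2, 3, 4, 5], 0),
         (222, [33, 44, 55, 66], 2),
         (333, [0, 11, 22, 33], -1)]

# Rebuild the aligned frame so this snippet runs on its own.
series = [pd.Series(values, index=np.arange(lag, lag + len(values)), name=name)
          for name, values, lag in mytup]
reindexed = pd.concat(series, axis=1).sort_index()

# Replace the lag-based labels with plain positions 0, 1, 2, ...
positional = reindexed.reset_index(drop=True)
print(positional)
```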