
Python pandas: constructing a DataFrame by looping over columns

I am trying to build a new pandas DataFrame from data in an existing DataFrame, where each new row also takes into account the previously calculated row of the new DataFrame.

As an example, here are two DataFrames of the same size.

import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.random.randint(0, 10, size=(5, 4)), columns=['1', '2', '3', '4'])
df2 = pd.DataFrame(np.zeros(df1.shape), index=df1.index, columns=df1.columns)

Then I created a list that serves as the starting row of my second DataFrame, df2:

L = [2,5,6,7]

df2.loc[0] = L

Then, for the remaining rows of df2, I want to take the values from the previous row of df2 and add the values from the current row of df1.

for i in df2.loc[1:]:
   df2.ix[i] = df2.ix[i-1] + df1

As an example, my DataFrames should look like this:

>>> df1
   1  2  3  4
0  4  6  0  6
1  7  0  7  9
2  9  1  9  9
3  5  2  3  6
4  0  3  2  9
>>> df2
    1   2   3   4
0   2   5   6   7
1   9   5  13  16
2  18   6  22  25
3  23   8  25  31
4  23  11  27  40

I know there is something wrong with how I index inside the for loop, but I cannot figure out how the expression should be written. I would be very thankful for any help with this.

This is a simple cumsum: overwrite the first row of a copy of df1 with the starting values, then take the cumulative sum down the rows.

df2 = df1.copy()
df2.loc[0] = [2,5,6,7]
desired_df = df2.cumsum()
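
To check that the cumsum really matches the row-by-row recurrence described in the question, here is a minimal sketch using the example values of df1 shown above rather than random data. The names looped and seeded are introduced only for this illustration. The original loop fails for two reasons: iterating over a DataFrame yields column labels rather than row positions, and df1 on the right-hand side adds the whole frame instead of one row. The sketch uses .iloc because .ix has been removed from pandas since version 1.0.

```python
import pandas as pd

# Example data copied from the question (instead of random integers).
df1 = pd.DataFrame(
    [[4, 6, 0, 6],
     [7, 0, 7, 9],
     [9, 1, 9, 9],
     [5, 2, 3, 6],
     [0, 3, 2, 9]],
    columns=['1', '2', '3', '4'],
)
L = [2, 5, 6, 7]  # starting row for the result

# Explicit loop: each row is the previous result row plus the current row of df1.
looped = pd.DataFrame(0, index=df1.index, columns=df1.columns)
looped.iloc[0] = L
for i in range(1, len(df1)):
    looped.iloc[i] = looped.iloc[i - 1] + df1.iloc[i]

# Vectorised version: seed the first row, then take the cumulative sum.
seeded = df1.copy()
seeded.iloc[0] = L
desired_df = seeded.cumsum()

# Both approaches should give the same values.
assert (looped.to_numpy() == desired_df.to_numpy()).all()
print(desired_df)
```

The loop is shown only for comparison; on larger frames the cumsum version avoids the Python-level iteration.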
