Creating multi-row pandas.DataFrames in a loop and appending them to a list

I'm confused by the following behaviour. I have a loop that creates simulated data based on a pandas.DataFrame. Each iteration should produce a new pandas.DataFrame with new columns (x2 in the example below).

import pandas as pd
import random
mydf = pd.DataFrame({"x":[0]*2})

def addrand(x):
    return(x+random.normalvariate(0,1))

mysimulation = []
mycontrol = []
for i in range(0,5):
    mydf["x2"] = mydf["x"].apply(addrand)
    mydf["i"] = i
    mycontrol.append(i)
    mysimulation.append(mydf)
    
pd.concat(mysimulation)
#>    x        x2  i
#> 0  0  1.023330  4
#> 1  0 -0.428686  4
#> 0  0  1.023330  4
#> 1  0 -0.428686  4
#> 0  0  1.023330  4
#> 1  0 -0.428686  4
#> 0  0  1.023330  4
#> 1  0 -0.428686  4
#> 0  0  1.023330  4
#> 1  0 -0.428686  4

Created on 2020-09-08 by the reprexpy package

What confuses me: the resulting list of pandas.DataFrames holds the expected number of rows (2 x 5 = 10), but the entries are simply 5 copies of the last iteration. This is clearly visible in the i column: it should hold the numbers 0 to 4, but contains only the number 4. The list mycontrol, on the other hand, behaves as expected and holds the numbers 0 to 4.

Why does this happen? And how can I resolve it?

  • Running the code below shows that mydf is updated in each iteration and then appended to mysimulation.
  • However, each iteration updates mydf in place, and every entry in mysimulation is a reference to that same mydf object, not a copy. All entries therefore show the values from the last iteration.
  • The issue can be resolved by appending a copy instead: mysimulation.append(mydf.copy())
import random
import pandas as pd

random.seed(365)

# same starting frame as in the question
mydf = pd.DataFrame({"x": [0] * 2})

def addrand(x):
    return x + random.normalvariate(0, 1)


mysimulation = []
mycontrol = []
display(mydf)  # display works in a jupyter notebook, otherwise use print
print('\n')
for i in range(0,5):
    print(i)

    mydf["x2"] = mydf["x"].apply(addrand)
    mydf["i"] = i
    display(mydf)

    mycontrol.append(i)
    mysimulation.append(mydf)
    display(mysimulation)
    print('\n')
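The aliasing can also be confirmed without inspecting the printed frames, by comparing object identities. The following is a minimal sketch that reuses the names from the question; the lists aliased and copied and the identity checks are illustrative additions, not part of the original code.

```python
import random
import pandas as pd

random.seed(365)
mydf = pd.DataFrame({"x": [0] * 2})

def addrand(x):
    return x + random.normalvariate(0, 1)

aliased = []  # holds references to the one and only mydf object
copied = []   # holds an independent snapshot per iteration

for i in range(5):
    mydf["x2"] = mydf["x"].apply(addrand)  # in-place update of mydf
    mydf["i"] = i
    aliased.append(mydf)
    copied.append(mydf.copy())

# Every entry of `aliased` is the very same object as mydf.
same_object = all(df is mydf for df in aliased)
# Every entry of `copied` is a separate object.
distinct_ids = len({id(df) for df in copied})

aliased_i = pd.concat(aliased)["i"].unique().tolist()
copied_i = pd.concat(copied)["i"].unique().tolist()

print(same_object)   # True
print(distinct_ids)  # 5
print(aliased_i)     # [4]
print(copied_i)      # [0, 1, 2, 3, 4]
```

The plain list mycontrol does not show the problem because each i is a separate int object that is never modified after it is appended.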


With the update

  • Change mysimulation.append(mydf) to mysimulation.append(mydf.copy())
pd.concat(mysimulation)

[out]:
   x       x2  i
0  0  0.63265  0
1  0 -0.85868  0
0  0 -0.43199  1
1  0 -1.49446  1
0  0  0.23422  2
1  0 -0.74176  2
0  0  0.20195  3
1  0  1.61356  3
0  0  0.72138  4
1  0 -0.62529  4
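
As an alternative to copying, the loop can avoid mutating the original frame altogether. The sketch below assumes DataFrame.assign is acceptable here; it returns a new DataFrame on each call, so nothing in the list is shared. The names base, frames and result are hypothetical and do not appear in the original post.

```python
import random
import pandas as pd

random.seed(365)
base = pd.DataFrame({"x": [0] * 2})

def addrand(x):
    return x + random.normalvariate(0, 1)

# assign() returns a new DataFrame each time; base itself is never modified.
frames = [
    base.assign(x2=base["x"].apply(addrand), i=i)
    for i in range(5)
]

# ignore_index=True gives the combined frame a fresh 0..9 index.
result = pd.concat(frames, ignore_index=True)
print(result)
```

This keeps base untouched with its single x column, which may be preferable when the same starting data is reused by later simulations.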
