Creating multirow pandas.DataFrames in a loop and appending them to a list
I'm very confused by the following behaviour: I have a loop that creates simulated data based on a pandas.DataFrame. The output of each iteration is a new pandas.DataFrame with new columns (x2 in the example below).
import pandas as pd
import random

mydf = pd.DataFrame({"x":[0]*2})

def addrand(x):
    return(x+random.normalvariate(0,1))

mysimulation = []
mycontrol = []
for i in range(0,5):
    mydf["x2"] = mydf["x"].apply(addrand)
    mydf["i"] = i
    mycontrol.append(i)
    mysimulation.append(mydf)

pd.concat(mysimulation)
#>    x        x2  i
#> 0  0  1.023330  4
#> 1  0 -0.428686  4
#> 0  0  1.023330  4
#> 1  0 -0.428686  4
#> 0  0  1.023330  4
#> 1  0 -0.428686  4
#> 0  0  1.023330  4
#> 1  0 -0.428686  4
#> 0  0  1.023330  4
#> 1  0 -0.428686  4
Created on 2020-09-08 by the reprexpy package
What confuses me is this: while the resulting list of pandas.DataFrames holds the expected number of rows (2 x 5 = 10), the DataFrames are simply 5 copies of the last iteration. This is clearly visible in the i column. It should hold the numbers 0 to 4, but it contains only the number 4. On the other hand, the list mycontrol behaves as expected and holds the numbers 0 to 4.
Why does this happen? And how can I resolve this?
mydf is updated in place with each iteration and then added to mysimulation.
The issue is that mydf, and each mydf inside of mysimulation, is just a pointer (a reference to the same object), not a copy, so every entry in the list reflects the most recent update.
This can be resolved by using .copy(), like mysimulation.append(mydf.copy()).
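To illustrate why mycontrol behaves differently from mysimulation, here is a minimal sketch that uses a plain dict as a stand-in for mydf. The names record, shared, copied and control are hypothetical and chosen only for this illustration; the point is that appending a mutable object stores a reference to it, while appending the integer i stores that iteration's value.

```python
# Plain-Python analogue of the loop in the question (no pandas needed).
record = {"i": None}   # stands in for mydf: one mutable object

shared = []    # mirrors mysimulation.append(mydf)
copied = []    # mirrors mysimulation.append(mydf.copy())
control = []   # mirrors mycontrol.append(i)

for i in range(5):
    record["i"] = i                # mutates the same object in place
    shared.append(record)          # stores another reference to that object
    copied.append(record.copy())   # stores an independent snapshot
    control.append(i)              # stores the current integer

print([r["i"] for r in shared])          # every entry shows the last value
print([r["i"] for r in copied])          # each entry keeps its own value
print(all(r is record for r in shared))  # all entries are the same object
```

The same reasoning should apply to the DataFrame case: assigning to mydf["x2"] and mydf["i"] modifies the one existing object, so all five list entries change together.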
import random
import pandas as pd

random.seed(365)

mydf = pd.DataFrame({"x":[0]*2})  # same starting DataFrame as in the question

def addrand(x):
    return(x+random.normalvariate(0,1))

mysimulation = []
mycontrol = []

display(mydf)  # display works in a jupyter notebook, otherwise use print
print('\n')
for i in range(0,5):
    print(i)
    mydf["x2"] = mydf["x"].apply(addrand)
    mydf["i"] = i
    display(mydf)  # mydf after this iteration's update
    mycontrol.append(i)
    mysimulation.append(mydf)
    display(mysimulation)  # every entry already shows the latest values
    print('\n')
Change mysimulation.append(mydf) to mysimulation.append(mydf.copy()), then run:
pd.concat(mysimulation)
[out]:
   x       x2  i
0  0  0.63265  0
1  0 -0.85868  0
0  0 -0.43199  1
1  0 -1.49446  1
0  0  0.23422  2
1  0 -0.74176  2
0  0  0.20195  3
1  0  1.61356  3
0  0  0.72138  4
1  0 -0.62529  4
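As an alternative to calling .copy() explicitly, one possible sketch is to build a fresh DataFrame on each iteration with DataFrame.assign, which returns a new object and leaves mydf untouched. This is not part of the original answer; the names frames and result, and the use of ignore_index=True, are choices made for this illustration.

```python
import random

import pandas as pd

random.seed(365)


def addrand(x):
    # Add standard-normal noise to a single value.
    return x + random.normalvariate(0, 1)


mydf = pd.DataFrame({"x": [0] * 2})

frames = []
for i in range(5):
    # assign() returns a new DataFrame, so mydf itself is never modified
    # and each list entry is an independent object.
    frames.append(mydf.assign(x2=mydf["x"].apply(addrand), i=i))

# ignore_index=True renumbers the rows 0..9 instead of repeating 0, 1.
result = pd.concat(frames, ignore_index=True)
print(result)
```

Because mydf keeps only its original x column here, the loop has no hidden state carried between iterations, which can make this kind of simulation easier to reason about.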