Filling empty DataFrame with numpy structured array
I created an empty DataFrame by doing the following:
In [581]: df=pd.DataFrame(np.empty(8,dtype=([('f0', '<i8'), ('f1', '<f8'),('f2', '<i8'), ('f3', '<f8'),('f4', '<f8'),('f5', '<f8'), ('f6', '<f8'),('f7', '<f8')])))
In [582]: df
Out[582]:
f0 f1 f2 f3 f4 \
0 3714580581 2.448187e-316 3928263553 2.447690e-316 0.000000e+00
1 0 0.000000e+00 0 0.000000e+00 0.000000e+00
2 0 0.000000e+00 0 0.000000e+00 3.284339e-315
3 0 0.000000e+00 0 0.000000e+00 0.000000e+00
4 0 0.000000e+00 298532785 4.341609e-315 0.000000e+00
5 0 0.000000e+00 1178683509 2.448189e-316 0.000000e+00
6 0 0.000000e+00 0 0.000000e+00 7.659812e-315
7 0 0.000000e+00 4211786525 2.448192e-316 0.000000e+00
f5 f6 f7
0 0.000000e+00 0.000000e+00 0.000000e+00
1 0.000000e+00 0.000000e+00 0.000000e+00
2 2.447692e-316 9.702437e-315 2.448246e-316
3 0.000000e+00 0.000000e+00 0.000000e+00
4 0.000000e+00 0.000000e+00 0.000000e+00
5 0.000000e+00 0.000000e+00 0.000000e+00
6 4.341599e-315 0.000000e+00 0.000000e+00
7 0.000000e+00 0.000000e+00 0.000000e+00
Now I am trying to change the data of the first 4 rows using a numpy structured array:
In [583]: x=np.ones(4,dtype=([('f0', '<i8'), ('f1', '<f8'),('f2', '<i8'), ('f3', '<f8'),('f4', '<f8'),('f5', '<f8'), ('f6', '<f8'),('f7', '<f8')]))
In [584]: x
Out[584]:
array([(1L, 1.0, 1L, 1.0, 1.0, 1.0, 1.0, 1.0),
(1L, 1.0, 1L, 1.0, 1.0, 1.0, 1.0, 1.0),
(1L, 1.0, 1L, 1.0, 1.0, 1.0, 1.0, 1.0),
(1L, 1.0, 1L, 1.0, 1.0, 1.0, 1.0, 1.0)],
dtype=[('f0', '<i8'), ('f1', '<f8'), ('f2', '<i8'), ('f3', '<f8'), ('f4', '<f8'), ('f5', '<f8'), ('f6', '<f8'), ('f7', '<f8')])
In [585]: df[0:4]=x
ValueError: Must have equal len keys and value when setting with an iterable
Is there a different way to accomplish this?
This partially works if I fill the DataFrame with a view of the structured array:
In [587]: df[0:4]=x.view(np.float64).reshape(x.shape + (-1,))
In [588]: df
Out[588]:
f0 f1 f2 f3 f4 f5 f6 f7
0 0 1.0 0 1.000000e+00 1.000000e+00 1.000000e+00 1.0 1.0
1 0 1.0 0 1.000000e+00 1.000000e+00 1.000000e+00 1.0 1.0
2 0 1.0 0 1.000000e+00 1.000000e+00 1.000000e+00 1.0 1.0
3 0 1.0 0 1.000000e+00 1.000000e+00 1.000000e+00 1.0 1.0
4 0 0.0 298532785 4.341609e-315 0.000000e+00 0.000000e+00 0.0 0.0
5 0 0.0 1178683509 2.448189e-316 0.000000e+00 0.000000e+00 0.0 0.0
6 0 0.0 0 0.000000e+00 7.659812e-315 4.341599e-315 0.0 0.0
7 0 0.0 4211786525 2.448192e-316 0.000000e+00 0.000000e+00 0.0 0.0
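But as you can see, the f0 and f2 columns are now 0: the view reinterprets the bytes of the integer 1 as a float64 instead of converting its value, and the resulting tiny float is truncated to 0 when it is written back into the integer columns.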
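A minimal sketch of why the view yields zeros in the integer columns. It uses a shortened two-field dtype (an assumption made for brevity, not the eight-field dtype from the question) and compares reinterpreting the bytes with `view` against converting each field with `astype`:

```python
import numpy as np

# Shortened dtype for illustration: one integer field, one float field.
dt = [('f0', '<i8'), ('f1', '<f8')]
x = np.ones(2, dtype=dt)

# view() reinterprets the raw bytes; it does not convert values.
# The bit pattern of the int64 value 1 is a tiny denormal when read as float64.
reinterpreted = x.view(np.float64).reshape(x.shape + (-1,))
tiny = reinterpreted[0, 0]          # far smaller than 1.0
as_int_again = int(tiny)            # truncates to 0

# Converting field by field keeps the numeric values.
converted = np.column_stack(
    [x[name].astype(np.float64) for name in x.dtype.names]
)
print(tiny, as_int_again, converted[0, 0])
```

The float field is unaffected by the view because its bytes already encode a float64, which matches the output in the question where only f0 and f2 are wrong.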
The obvious solution is to give pandas a pandas DataFrame:
df[0:4] = pd.DataFrame(x)
This is comparatively expensive in terms of performance, but in your example it is probably not noticeable.
I would suggest you use the .iloc indexer, as it is more explicit.
df.iloc[0:4] = pd.DataFrame(x)
Of course, the performance cost comes from instantiating a new object (the pandas DataFrame), so the .iloc version has the same performance flaw.
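If building an intermediate DataFrame is a concern, one possible alternative is to assign the structured array field by field. This is only a sketch, not something proposed in the answer above: it uses a shortened three-field dtype and a zero-filled frame for illustration, and whether it is actually faster has not been measured here.

```python
import numpy as np
import pandas as pd

# Shortened dtype for illustration; the question uses eight fields.
dt = [('f0', '<i8'), ('f1', '<f8'), ('f2', '<i8')]
df = pd.DataFrame(np.zeros(8, dtype=dt))
x = np.ones(4, dtype=dt)

# Copy each field of the structured array into the matching column,
# touching only the first len(x) rows.
for name in x.dtype.names:
    df.iloc[0:len(x), df.columns.get_loc(name)] = x[name]

print(df)
```

Because each field is written into a column of the same type, the integer columns keep their values instead of being reinterpreted as floats.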