Pandas rows containing numpy ndarrays of various shapes

I'm creating a Pandas DataFrame in which each particular (index, column) location can be a numpy ndarray of arbitrary shape, or even a simple number.

This works:

import numpy as np, pandas as pd
x = pd.DataFrame([[np.random.rand(100, 100, 20, 2), 3], [2, 2], [3, 3], [4, 4]],
                              index=['A1', 'B2', 'C3', 'D4'], columns=['data', 'data2'])
print(x)

but takes 50 seconds to create on my computer! Why?

np.random.rand(100, 100, 20, 2) alone is very fast (under 1 second to create).

How can I speed up the creation of Pandas datasets containing ndarrays of various shapes?

It's not actually the creation that is the issue; it's the print statement. 1000 loops of the creation take 2.8 seconds on my computer, but one iteration of the print takes about 26 seconds.
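One way to check this split yourself is to time construction and string formatting separately. The snippet below is only a sketch: build_frame and timed are hypothetical helper names, and the array shape is deliberately reduced from the question's (100, 100, 20, 2) so that it finishes quickly. Absolute timings will depend on your machine and pandas version.

```python
import time

import numpy as np
import pandas as pd


def build_frame(shape=(10, 10, 5, 2)):
    """Build a frame like the one in the question (assumed smaller shape)."""
    return pd.DataFrame(
        [[np.random.rand(*shape), 3], [2, 2], [3, 3], [4, 4]],
        index=["A1", "B2", "C3", "D4"],
        columns=["data", "data2"],
    )


def timed(func, *args):
    """Call func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


# Step 1: construction only, nothing is formatted yet.
frame, build_seconds = timed(build_frame)

# Step 2: print(frame) has to build this string first, so time str() alone.
text, format_seconds = timed(str, frame)

print(f"construction: {build_seconds:.4f} s")
print(f"formatting:   {format_seconds:.4f} s")
```

Passing shape=(100, 100, 20, 2) to build_frame reproduces the original setup, if you are willing to wait for the formatting step.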

Interestingly, print(x['data2']), print(x['data']['A1']) and print(x['data']['B2']) are all basically instantaneous. So it seems print has trouble figuring out how to display items of vastly different sizes. Perhaps a bug?
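If the goal is just to inspect the frame without paying for the full display, one possible workaround (not a fix for the underlying behaviour) is to print a summary in which every ndarray cell is replaced by a short label. This is a sketch under assumptions: describe_cell and summarize are hypothetical helpers, and the small array shape is chosen only to keep the example quick.

```python
import numpy as np
import pandas as pd


def describe_cell(value):
    """Return a short label for ndarray cells, str(value) for anything else."""
    if isinstance(value, np.ndarray):
        return f"ndarray{value.shape} {value.dtype}"
    return str(value)


def summarize(frame):
    """Build a new frame of short strings; the original frame is not modified."""
    return pd.DataFrame(
        {name: frame[name].map(describe_cell) for name in frame.columns}
    )


x = pd.DataFrame(
    [[np.random.rand(10, 10, 5, 2), 3], [2, 2], [3, 3], [4, 4]],
    index=["A1", "B2", "C3", "D4"],
    columns=["data", "data2"],
)

summary = summarize(x)
print(summary)  # only short strings are formatted here
```

The arrays themselves stay untouched in x, so x['data']['A1'] still returns the full ndarray when you need it.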
