Pandas rows containing numpy ndarrays of various shapes

Question

I'm creating a Pandas DataFrame in which each particular (index, column) location can be a numpy ndarray of arbitrary shape, or even a simple number.

This works:

import numpy as np, pandas as pd
x = pd.DataFrame([[np.random.rand(100, 100, 20, 2), 3], [2, 2], [3, 3], [4, 4]],
                              index=['A1', 'B2', 'C3', 'D4'], columns=['data', 'data2'])
print(x)

but takes 50 seconds to create on my computer! Why?

np.random.rand(100, 100, 20, 2) alone is super fast (< 1 second to create).

How can I speed up the creation of Pandas DataFrames containing ndarrays of various shapes?

Answer 1

It's not actually the creation that is the issue, it's the print statement. 1000 loops of the creation take 2.8 seconds on my computer, but one iteration of the print takes about 26 seconds.
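One way to check this yourself is to time the construction and the string formatting separately. The following is only a rough sketch: the helper name time_creation_and_repr is made up for illustration, and it uses a much smaller array than the one in the question (an assumption, so that it finishes quickly). Absolute timings will differ by machine and pandas version.

```python
import time

import numpy as np
import pandas as pd


def time_creation_and_repr(shape=(20, 20, 5, 2)):
    """Hypothetical helper: time DataFrame creation and repr() separately."""
    big = np.random.rand(*shape)

    t0 = time.perf_counter()
    df = pd.DataFrame(
        [[big, 3], [2, 2], [3, 3], [4, 4]],
        index=['A1', 'B2', 'C3', 'D4'],
        columns=['data', 'data2'],
    )
    t1 = time.perf_counter()

    # print(df) has to build this string first, so repr() is the part being timed.
    text = repr(df)
    t2 = time.perf_counter()

    return df, text, t1 - t0, t2 - t1


df, text, create_seconds, repr_seconds = time_creation_and_repr()
print(f"creation: {create_seconds:.4f} s, repr: {repr_seconds:.4f} s")
```

Passing the shape from the question, (100, 100, 20, 2), should reproduce the slow formatting step if your pandas version is affected.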

Interestingly, print(x['data2']), print(x['data']['A1']) and print(x['data']['B2']) are all basically instantaneous. So it seems print is having an issue figuring out how to display items of vastly different size. Perhaps a bug?
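If you only need to look at what the frame contains, a possible workaround is to print a lightweight summary instead of the frame itself. This is a sketch rather than a fix for the underlying behaviour: summarize_cell is a hypothetical helper, and replacing each ndarray with a short shape label is my own choice of what to display.

```python
import numpy as np
import pandas as pd

x = pd.DataFrame(
    [[np.random.rand(100, 100, 20, 2), 3], [2, 2], [3, 3], [4, 4]],
    index=['A1', 'B2', 'C3', 'D4'],
    columns=['data', 'data2'],
)


def summarize_cell(value):
    """Hypothetical helper: replace an ndarray with a short label giving its shape."""
    if isinstance(value, np.ndarray):
        return f"ndarray{value.shape}"
    return value


# Apply the helper to every cell, column by column; x itself is left unchanged.
summary = x.apply(lambda col: col.map(summarize_cell))
print(summary)
```

The original arrays stay available in x, for example via x.loc['A1', 'data'], so only the display is affected.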

Answer 1 (2, accepted, 2022-06-23 23:48:23)