Numpy array as an element of a Pandas DataFrame

Question

How do I put a numpy array into an element (single cell) of a Pandas DataFrame? For instance,

Driver  Make  Model  Coordinates
Bob     Ford  Focus  [[1, 0, 1],[1, 2, 3], [2, 0, 2]]
Sally   Ford  Echo   [[0, 0, 1],[0, 2, 0]]

I've tried to store the array on each row, but the documentation doesn't seem to support this.

Context:

I am hoping to use df.to_json() to export the data to a json file, from which the data can later be read into a DataFrame where each row is one of the individuals. Should I be thinking about doing this differently?

Answer 1

Yes, you can. Use .at[] or .iat[] to avoid broadcasting behavior when attempting to put an iterable into a single cell. This also applies to list and set.
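As a minimal sketch of the list and set case, the snippet below stores each in a single cell with .at[]. The column name "Payload" is made up for illustration, and the column is seeded with None so that it has object dtype, which is an assumption of this sketch rather than something stated above.

```python
import numpy as np
import pandas as pd

# "Payload" is a hypothetical column; None values give it object dtype,
# so each cell can hold an arbitrary Python object.
df = pd.DataFrame({"Driver": ["Bob", "Sally", "Ann"], "Payload": [None, None, None]})

# .at[] addresses exactly one cell, so the iterable is stored as-is
# instead of being spread across rows or columns.
df.at[0, "Payload"] = [1, 2, 3]
df.at[1, "Payload"] = {"a", "b"}
df.at[2, "Payload"] = np.array([[0, 0, 1], [0, 2, 0]])

print(df["Payload"].map(type))
```

Each cell keeps its original Python type, so reading it back with df.at[row, "Payload"] returns the list, set, or array itself.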

The bad thing: It may be quite challenging to do such an assignment in an optimized way that does not involve iterating through rows. That said, this is still doable for reasonably-sized arrays. If you really have to store millions of such arrays, a fundamental redesign may be required, e.g. restructure your code, or use MongoDB or other storage tools instead.

import pandas as pd
import numpy as np

# preallocate the output dataframe
df = pd.DataFrame(
    data=np.zeros((2,4), dtype=object),
    columns=["Driver", "Make", "Model", "Coordinates"]
)

# element-wise assignment
df.at[0, "Coordinates"] = np.array([[1, 0, 1],[1, 2, 3], [2, 0, 2]])
df.at[1, "Coordinates"] = np.array([[0, 0, 1],[0, 2, 0]])
# other elements were omitted

Result

print(df)
  Driver Make Model                        Coordinates
0      0    0     0  [[1, 0, 1], [1, 2, 3], [2, 0, 2]]
1      0    0     0             [[0, 0, 1], [0, 2, 0]]

print(df.at[0, "Coordinates"])
[[1 0 1]
 [1 2 3]
 [2 0 2]]

print(type(df.at[0, "Coordinates"]))
<class 'numpy.ndarray'>
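For the JSON export mentioned in the question, one possible round trip is sketched below. It is not part of the original answer: converting each array to nested lists before calling to_json(), the orient="records" layout, and the in-memory string used in place of a file are all choices made for this illustration.

```python
import io

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Driver": ["Bob", "Sally"],
        "Make": ["Ford", "Ford"],
        "Model": ["Focus", "Echo"],
        "Coordinates": [None, None],  # object-dtype placeholder column
    }
)
df.at[0, "Coordinates"] = np.array([[1, 0, 1], [1, 2, 3], [2, 0, 2]])
df.at[1, "Coordinates"] = np.array([[0, 0, 1], [0, 2, 0]])

# Assumption: turn each array into plain nested lists so the JSON
# layout is explicit and does not rely on how arrays get serialized.
export = df.copy()
export["Coordinates"] = export["Coordinates"].apply(lambda a: a.tolist())
json_text = export.to_json(orient="records")  # one JSON object per driver

# Read it back; the cells arrive as nested lists, so rebuild the arrays.
restored = pd.read_json(io.StringIO(json_text), orient="records")
restored["Coordinates"] = restored["Coordinates"].apply(np.array)

print(restored.at[0, "Coordinates"])
```

Writing to disk instead would mean passing a file path to to_json() and read_json(); the conversion steps stay the same.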
