Python: Creating a structured NumPy array from two columns of a DataFrame

Question

How do you create a structured array from two columns in a DataFrame? I tried this:

import numpy as np
import pandas as pd
df = pd.DataFrame(data=[[1,2],[10,20]], columns=['a','b'])
df

    a   b
0   1   2
1   10  20

x = np.array([([val for val in list(df['a'])],
               [val for val in list(df['b'])])])

But this gives me this:

array([[[ 1, 10],
        [ 2, 20]]])

But I wanted this:

[(1,2),(10,20)]

Thanks!

Answer 1

There are a couple of methods. Note that you may lose some performance and functionality relative to regular NumPy arrays.

record array

You can use pd.DataFrame.to_records with index=False. Technically, this is a record array, but for many purposes this will be sufficient.

res1 = df.to_records(index=False)

print(res1)

rec.array([(1, 2), (10, 20)], 
          dtype=[('a', '<i8'), ('b', '<i8')])
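
When the DataFrame has more columns than the two of interest, one option is to select them before calling to_records. The following is a minimal sketch; the extra column c and the variable name rec are assumptions added for illustration and are not part of the original question.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one extra column ("c") that should be left out.
df = pd.DataFrame({"a": [1, 10], "b": [2, 20], "c": [3, 30]})

# Select the two columns first, then convert; index=False omits the index field.
rec = df[["a", "b"]].to_records(index=False)

print(rec.dtype.names)
print(rec.tolist())
```

The field names of the result follow the column order of the selection, so reordering the list of column names reorders the fields.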

structured array

Manually, you can construct a structured array by converting each row to a tuple and then passing a list of (name, dtype) tuples as the dtype parameter.

s = df.dtypes
res2 = np.array([tuple(x) for x in df.values], dtype=list(zip(s.index, s)))

print(res2)

array([(1, 2), (10, 20)], 
      dtype=[('a', '<i8'), ('b', '<i8')])
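
One caveat with the approach above: df.values returns a single array with a common dtype, so integer and float columns end up upcast together before the tuples are built. A possible variant, sketched here under the assumption that each field should keep its own column dtype, iterates over rows with itertuples instead. The sample data and the names cols, dtype, and arr are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with mixed column types.
df = pd.DataFrame({"a": [1, 10], "b": [2.5, 20.5], "c": ["x", "y"]})

cols = ["a", "b"]

# One (name, dtype) pair per selected column.
dtype = [(name, df[name].dtype) for name in cols]

# name=None makes itertuples yield plain tuples; index=False omits the index.
rows = list(df[cols].itertuples(index=False, name=None))
arr = np.array(rows, dtype=dtype)

print(arr.dtype)
```

Here the a field stays an integer and the b field stays a float, because the values are never routed through one shared array.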

What's the difference?

Very little. recarray is a subclass of ndarray, the regular NumPy array type. The structured array in the second example, on the other hand, is a plain ndarray.

type(res1)                    # numpy.recarray
isinstance(res1, np.ndarray)  # True
type(res2)                    # numpy.ndarray

The main difference is that record arrays support attribute lookup of fields, while structured arrays raise AttributeError:

print(res1.a)
array([ 1, 10], dtype=int64)

print(res2.a)
AttributeError: 'numpy.ndarray' object has no attribute 'a'
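
A structured array still gives access to its fields by key, and the two types can be converted into each other. The sketch below rebuilds res2 from literal values so that it runs on its own; the names col_a, rec, and plain are assumptions for illustration.

```python
import numpy as np

# Plain structured array, equivalent to res2 in the answer.
res2 = np.array([(1, 2), (10, 20)], dtype=[("a", "<i8"), ("b", "<i8")])

# Fields of a structured array are read by key rather than by attribute.
col_a = res2["a"]

# Viewing the array as np.recarray shares its memory and enables attribute lookup.
rec = res2.view(np.recarray)
col_a_attr = rec.a

# np.asarray returns a base-class ndarray, dropping the recarray subclass.
plain = np.asarray(rec)

print(col_a, col_a_attr, type(plain))
```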

Related: NumPy “record array” or “structured array” or “recarray”

Answer 2

Use a list comprehension to convert the nested lists to tuples:

print ([tuple(x) for x in df.values.tolist()])
[(1, 2), (10, 20)]

Detail:

print (df.values.tolist())
[[1, 2], [10, 20]]

EDIT: You can convert with to_records and then pass the result to np.asarray:

df = pd.DataFrame(data=[[True, 1,2],[False, 10,20]], columns=['a','b','c'])
print (df)
       a   b   c
0   True   1   2
1  False  10  20

print (np.asarray(df.to_records(index=False)))
[( True,  1,  2) (False, 10, 20)]

Answer 3

Here's a one-liner:

list(df.apply(lambda x: tuple(x), axis=1))

or

df.apply(lambda x: tuple(x), axis=1).values
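
Note that both one-liners produce tuples held in a list or an object-dtype array, not a structured array with named fields. If named fields are needed, the tuples can be passed to np.array with an explicit dtype. This is a minimal sketch; the field dtypes ("i8") and the names tuples, obj_arr, and structured are assumptions for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data=[[1, 2], [10, 20]], columns=["a", "b"])

# Series holding one tuple per row, as in the one-liner above.
tuples = df.apply(lambda row: tuple(row), axis=1)

# .values yields an object array whose elements are Python tuples.
obj_arr = tuples.values

# An explicit dtype turns the same tuples into a structured array.
structured = np.array(tuples.tolist(), dtype=[("a", "i8"), ("b", "i8")])

print(obj_arr.dtype)
print(structured.dtype.names)
```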

3 answers

Answer 1
4 2018-07-11 08:23:18

Answer 2
1 2018-07-11 07:51:12

Answer 3
0 2018-07-11 08:07:02
