How to convert a pandas DataFrame to a NumPy array with column names?

Question

How can I convert a pandas DataFrame into the following NumPy array with column names?

array([('Heidi Mitchell', 'uboyd@hotmail.com', 74, 52, 'female', '1121', 'cancer', '03/06/2018'),
       ('Kimberly Kent', 'wilsoncarla@mitchell-gree', 63, 51, 'male', '2003', 'cancer', '16/06/2017')],
      dtype=[('name', '<U16'), ('email', '<U25'), ('age', '<i4'), ('weight', '<i4'), ('gender', '<U10'), ('zipcode', '<U6'), ('diagnosis', '<U6'), ('dob', '<U16')])

This is my pandas DataFrame df:

I tried to convert it as follows:

import numpy as np

dt = np.dtype([('col1', np.int32), ('col2', np.int32)])
arr = np.array(df.values, dtype=dt)

But it gives me the following output:

array([[(3, 5), (3, 1)],
      ...
      dtype=[('col1', '<i4'), ('col2', '<i4')])

For some reason, the rows of data are grouped as [(3, 5), (3, 1)] instead of [(3, 5), (3, 1), (4, 5), (1, 5), (1, 2)].

Answer 1

Use the pandas method to_records(), which converts a DataFrame to a NumPy record array. The documentation is here: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_records.html

Some examples from the documentation:

>>> df = pd.DataFrame({'A': [1, 2], 'B': [0.5, 0.75]},
                       index=['a', 'b'])
>>> df
   A     B
a  1  0.50
b  2  0.75
>>> df.to_records()
rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],
          dtype=[('index', 'O'), ('A', '<i8'), ('B', '<f8')])

The index can be excluded from the record array:

>>> df.to_records(index=False)
rec.array([(1, 0.5 ), (2, 0.75)],
          dtype=[('A', '<i8'), ('B', '<f8')])
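If the field types need to match a specific layout (for example the '<i4' fields from the question), to_records() also accepts a column_dtypes argument. The following is a minimal sketch, assuming a small two-column frame with the question's column names; the sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data using the column names from the question.
df = pd.DataFrame({"col1": [3, 3, 4], "col2": [5, 1, 5]})

# column_dtypes maps column names to the dtype each field should have.
rec = df.to_records(
    index=False,
    column_dtypes={"col1": "int32", "col2": "int32"},
)

# to_records() returns a np.recarray; np.asarray gives a plain ndarray
# that keeps the named fields.
arr = np.asarray(rec)

print(arr.dtype.names)  # field names taken from the DataFrame columns
print(arr["col1"])      # access a column by its field name
```

Columns not listed in column_dtypes keep the dtype they have in the DataFrame.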

Answer 2

You can use df.to_records(index=False) to convert the DataFrame to a structured array:

import pandas as pd
data = [ { "col1": 3, "col2": 5 }, { "col1": 3, "col2": 1 }, { "col1": 4, "col2": 5 }, { "col1": 1, "col2": 5 }, { "col1": 2, "col2": 2 } ]
df = pd.DataFrame(data)
df.to_records(index=False)

Output:

rec.array([(3, 5), (3, 1), (4, 5), (1, 5), (2, 2)],
          dtype=[('col1', '<i8'), ('col2', '<i8')])
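For comparison, the np.array(..., dtype=dt) approach from the question can also work if NumPy is given one tuple per row rather than the 2-D df.values array, since a structured dtype is filled from tuples. This is a sketch of that alternative, reusing the data from this answer and the dtype from the question; it is not necessarily preferable to to_records().

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [3, 3, 4, 1, 2], "col2": [5, 1, 5, 5, 2]})

# The structured dtype from the question.
dt = np.dtype([("col1", np.int32), ("col2", np.int32)])

# itertuples(index=False, name=None) yields one plain tuple per row.
rows = list(df.itertuples(index=False, name=None))

# A list of tuples produces a 1-D structured array, one record per row.
arr = np.array(rows, dtype=dt)

print(arr.shape)
print(arr["col2"])
```

The result is one-dimensional, with each element holding a full row, which is the shape the question was expecting.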

2 answers

Answer 1: 1, accepted, 2021-04-03 18:23:47
Answer 2: 1, 2021-04-03 18:24:03
