Python: Create a NumPy structured array from two columns in a DataFrame
How do you create a structured array from two columns in a DataFrame? I tried this:
df = pd.DataFrame(data=[[1,2],[10,20]], columns=['a','b'])
df
    a   b
0   1   2
1  10  20
x = np.array([([val for val in list(df['a'])],
[val for val in list(df['b'])])])
But this gives me this:
array([[[ 1, 10],
[ 2, 20]]])
But I wanted this:
[(1,2),(10,20)]
Thanks!
There are a couple of methods. With either one, you may experience a loss in performance and functionality relative to regular NumPy arrays.
You can use pd.DataFrame.to_records with index=False. Technically, this is a record array, but for many purposes this will be sufficient.
res1 = df.to_records(index=False)
print(res1)
rec.array([(1, 2), (10, 20)],
dtype=[('a', '<i8'), ('b', '<i8')])
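If the DataFrame has more columns than the two you want, one option is to select them before converting. The sketch below assumes a hypothetical third column 'c' that is not part of the original question; it is only there to show the selection step.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with an extra column 'c' that should not end up in the result.
df = pd.DataFrame({'a': [1, 10], 'b': [2, 20], 'c': [3, 30]})

# Select the two columns first, then convert; index=False leaves out the index field.
rec = df[['a', 'b']].to_records(index=False)

print(rec.dtype.names)  # ('a', 'b')
print(rec.tolist())     # [(1, 2), (10, 20)]
```

Calling tolist() on the result gives the plain list of tuples shown in the question.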
Manually, you can construct a structured array via conversion to tuple by row, then specifying a list of tuples for the dtype parameter.
s = df.dtypes
res2 = np.array([tuple(x) for x in df.values], dtype=list(zip(s.index, s)))
print(res2)
array([(1, 2), (10, 20)],
dtype=[('a', '<i8'), ('b', '<i8')])
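A variation on the manual approach, for frames whose columns have different dtypes: df.values upcasts everything to one common dtype, so a possible alternative is to build the rows with itertuples. This is a minimal sketch; the float column is an assumption added for illustration and is not in the original data.

```python
import numpy as np
import pandas as pd

# Illustrative frame with mixed dtypes (integer and float columns).
df = pd.DataFrame({'a': [1, 10], 'b': [2.5, 20.5]})

# One (name, dtype) pair per column, taken from the frame itself.
dtype = [(name, dt) for name, dt in df.dtypes.items()]

# name=None makes itertuples yield plain tuples instead of namedtuples.
rows = list(df.itertuples(index=False, name=None))
arr = np.array(rows, dtype=dtype)

print(arr.dtype.names)  # ('a', 'b')
print(arr['b'])         # [ 2.5 20.5]
```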
What's the difference?
Very little.
recarray is a subclass of ndarray, the regular NumPy array type. On the other hand, the structured array in the second example is of type ndarray.
type(res1) # numpy.recarray
isinstance(res1, np.ndarray) # True
type(res2) # numpy.ndarray
The main difference is that record arrays support attribute lookup of fields, while structured arrays raise AttributeError:
print(res1.a)
array([ 1, 10], dtype=int64)
print(res2.a)
AttributeError: 'numpy.ndarray' object has no attribute 'a'
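Fields of a plain structured array are still reachable by name with bracket indexing, and the array can be viewed as a record array when attribute access is wanted. A small sketch, rebuilding res2 by hand with an explicit 'i8' dtype (an assumption, so the example does not depend on the platform default):

```python
import numpy as np

# Same data as res2 above, built directly so the snippet is self-contained.
res2 = np.array([(1, 2), (10, 20)], dtype=[('a', 'i8'), ('b', 'i8')])

# Bracket indexing by field name works on any structured array.
print(res2['a'])  # [ 1 10]

# Viewing as np.recarray adds attribute access without copying the data.
rec = res2.view(np.recarray)
print(rec.a)      # [ 1 10]
```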
Related: NumPy “record array” or “structured array” or “recarray”
Use a list comprehension to convert the nested lists to tuples:
print ([tuple(x) for x in df.values.tolist()])
[(1, 2), (10, 20)]
Detail:
print (df.values.tolist())
[[1, 2], [10, 20]]
EDIT: You can convert with to_records and then pass the result to np.asarray:
df = pd.DataFrame(data=[[True, 1,2],[False, 10,20]], columns=['a','b','c'])
print (df)
a b c
0 True 1 2
1 False 10 20
print (np.asarray(df.to_records(index=False)))
[( True, 1, 2) (False, 10, 20)]
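To see what the np.asarray step changes, the types can be compared before and after. A quick sketch using the same data as above, written as a dict of columns for brevity:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [True, False], 'b': [1, 10], 'c': [2, 20]})

rec = df.to_records(index=False)  # record array (np.recarray)
arr = np.asarray(rec)             # base-class ndarray with the same fields

print(type(rec).__name__)  # recarray
print(type(arr).__name__)  # ndarray
print(arr['b'])            # [ 1 10]
```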
Here's a one-liner:
list(df.apply(lambda x: tuple(x), axis=1))
or
df.apply(lambda x: tuple(x), axis=1).values
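Note that the .values form gives an object-dtype array whose elements are tuples, not a structured array with named fields. If named fields are needed, the tuples can be passed to np.array with an explicit dtype. In this sketch the 'i8' field types are an assumption chosen for the example data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data=[[1, 2], [10, 20]], columns=['a', 'b'])

# Row-wise apply returns a Series of tuples; .values is an object array.
obj = df.apply(lambda x: tuple(x), axis=1).values
print(obj.dtype)  # object

# Supplying a field dtype turns the same tuples into a structured array.
structured = np.array(list(obj), dtype=[('a', 'i8'), ('b', 'i8')])
print(structured.dtype.names)  # ('a', 'b')
```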