
How to convert a list of Numpy arrays to a Pandas DataFrame

I have a list of NumPy arrays that looks like this:

[400.31865662]
[401.18514808]
[404.84015554]
[405.14682194]
[405.67735105]
[273.90969447]
[274.0894528]

When I try to convert it to a Pandas DataFrame with the following code:

y = pd.DataFrame(data)
print(y)

I get the following output when printing it. Why do I get all those zeros?

            0
0  400.318657
            0
0  401.185148
            0
0  404.840156
            0
0  405.146822
            0
0  405.677351
            0
0  273.909694
            0
0  274.089453

I would like to get a single-column DataFrame that looks like this:

400.31865662
401.18514808
404.84015554
405.14682194
405.67735105
273.90969447
274.0894528

You could flatten the NumPy array:

import numpy as np
import pandas as pd

data = [[400.31865662],
        [401.18514808],
        [404.84015554],
        [405.14682194],
        [405.67735105],
        [273.90969447],
        [274.0894528]]

arr = np.array(data)

df = pd.DataFrame(data=arr.flatten())

print(df)

Output

            0
0  400.318657
1  401.185148
2  404.840156
3  405.146822
4  405.677351
5  273.909694
6  274.089453
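If the input really is a list of separate one-element arrays (as the question describes), a similar result can be sketched with np.concatenate, which joins the arrays end to end into a single 1-D array. The column name "value" is only an assumption chosen for this example, and the list is shortened to three of the question's values:

```python
import numpy as np
import pandas as pd

# A list of one-element arrays, using three values copied from the question.
data = [np.array([400.31865662]),
        np.array([401.18514808]),
        np.array([404.84015554])]

# np.concatenate joins the arrays end to end into one 1-D array of shape (3,).
flat = np.concatenate(data)

# "value" is a hypothetical column name, not something from the question.
df = pd.DataFrame({"value": flat})
print(df)
```

Passing a dict to pd.DataFrame names the column at construction time, so the default 0 header does not appear.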

Since I assume many visitors to this post aren't here for the OP's specific and unreproducible issue, here's a general answer:

df = pd.DataFrame(array)

One of the strengths of pandas is readable, labeled output (much like a spreadsheet), so it's worth setting column names.

import numpy as np
import pandas as pd

array = np.random.rand(5, 5)
# Example value of `array` (random, so it differs on every run):
# array([[0.723, 0.177, 0.659, 0.573, 0.476],
#        [0.77 , 0.311, 0.533, 0.415, 0.552],
#        [0.349, 0.768, 0.859, 0.273, 0.425],
#        [0.367, 0.601, 0.875, 0.109, 0.398],
#        [0.452, 0.836, 0.31 , 0.727, 0.303]])
columns = [f'col_{num}' for num in range(5)]
index = [f'index_{num}' for num in range(5)]

Here's where the magic happens:

df = pd.DataFrame(array, columns=columns, index=index)
print(df)

Output

            col_0     col_1     col_2     col_3     col_4
index_0  0.722791  0.177427  0.659204  0.572826  0.476485
index_1  0.770118  0.311444  0.532899  0.415371  0.551828
index_2  0.348923  0.768362  0.858841  0.273221  0.424684
index_3  0.366940  0.600784  0.875214  0.108818  0.397671
index_4  0.451682  0.836315  0.310480  0.727409  0.302597
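As a small sketch of what the labels buy you: when no columns or index are given, pandas falls back to integer labels starting at 0, which is where the 0 header and 0 row label in the question's output come from. With explicit labels, values can be looked up by name. The seed, the array shape, and the label names below are assumptions made for this example only:

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is reproducible (the seed is arbitrary).
rng = np.random.default_rng(0)
array = rng.random((3, 2))

# Without labels, pandas assigns integer labels 0, 1, ... to rows and columns.
default_df = pd.DataFrame(array)
print(list(default_df.columns))  # default integer column labels

# With labels, rows and columns can be addressed by name.
df = pd.DataFrame(array,
                  columns=["col_0", "col_1"],
                  index=["index_0", "index_1", "index_2"])
value = df.loc["index_1", "col_0"]  # same element as array[1, 0]
print(value)
```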

I just figured out my mistake. (data) was a list of arrays:

[array([400.0290173]), array([400.02253235]), array([404.00252113]), array([403.99466754]), array([403.98681395]), array([271.97896036]), array([271.97110677])]

So I used np.vstack(data) to concatenate it:

conc = np.vstack(data)

[[400.0290173 ]
 [400.02253235]
 [404.00252113]
 [403.99466754]
 [403.98681395]
 [271.97896036]
 [271.97110677]]

Then I converted the concatenated array into a Pandas DataFrame:

newdf = pd.DataFrame(conc)
print(newdf)

            0
0  400.029017
1  400.022532
2  404.002521
3  403.994668
4  403.986814
5  271.978960
6  271.971107

Et voilà!
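For what it's worth, np.vstack treats each 1-D array in the list as one row, so the same approach should also work when the arrays hold more than one value each. A minimal sketch, with made-up values and hypothetical column names "a" and "b":

```python
import numpy as np
import pandas as pd

# Made-up data: three arrays of length 2 (not the values from the question).
data = [np.array([1.0, 2.0]),
        np.array([3.0, 4.0]),
        np.array([5.0, 6.0])]

# Each 1-D array becomes one row, giving a 2-D array of shape (3, 2).
stacked = np.vstack(data)

# "a" and "b" are hypothetical column names chosen for illustration.
df = pd.DataFrame(stacked, columns=["a", "b"])
print(df)
```

With one-element arrays, as in my case, the stacked shape is (n, 1) and the DataFrame has a single column.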

There is another way that isn't mentioned in the other answers. If you have a NumPy array that is essentially a vector, i.e. one-dimensional with shape (n,), then you could do the following:

# sample array
x = np.zeros((20))
# empty dataframe
df = pd.DataFrame()
# add the array to df as a column
df['column_name'] = x

This way you can add multiple arrays as separate columns.
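A short sketch of adding two arrays this way, assuming both have the same length (a column whose length does not match the existing index raises a ValueError). The array contents and the column names "x" and "y" are invented for the example:

```python
import numpy as np
import pandas as pd

# Two 1-D arrays of equal length (made-up sample data).
x = np.zeros(4)
y = np.arange(4)

# Start from an empty DataFrame and attach each array as its own column.
df = pd.DataFrame()
df["x"] = x
df["y"] = y
print(df)

# A dict passed to the constructor builds the same columns in one step.
df_from_dict = pd.DataFrame({"x": x, "y": y})
print(df_from_dict)
```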
