Creating a pandas DataFrame from several numpy arrays

Question

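I am trying to create a pandas DataFrame whose columns are numpy arrays. I would also like to name the columns at creation time.

This seems like a very simple task.

Without column names the call runs without error, although the columns are in the wrong order: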

import numpy as np
import pandas as pd

n_obs = 500

df = pd.DataFrame(np.random.uniform(low = 1.1, high = 5.0,size = (n_obs) ) , np.random.randint(size = (n_obs), low = 18, high = 80)) 

print(df.head())

Output:

49  3.802458
57  3.830600
29  4.991442
47  2.600079
70  1.658041
52  2.236296
37  3.327520
23  1.366954
22  1.509165
36  1.289901
77  3.834789
68  4.370223
40  4.532152
71  2.348842

When I try to name the columns, I get an error:

df = pd.DataFrame(np.random.uniform(low = 1.1, high = 5.0,size = (n_obs) ) , np.random.randint(size = (n_obs), low = 18, high = 80), columns =['col1','col2'])

Output:

Traceback (most recent call last):
  File "C:\Users\GBUHR4\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py", line 4622, in create_block_manager_from_blocks
    placement=slice(0, len(axes[0])))]
  File "C:\Users\GBUHR4\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py", line 2957, in make_block
    return klass(values, ndim=ndim, fastpath=fastpath, placement=placement)
  File "C:\Users\GBUHR4\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py", line 120, in __init__
    len(self.mgr_locs)))
ValueError: Wrong number of items passed 1, placement implies 2

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "fake.py", line 33, in <module>
    df = pd.DataFrame(np.random.uniform(low = 1.1, high = 5.0,size = (n_obs) ) , np.random.randint(size = (n_obs), low = 18, high = 80), columns =['col1','col2'])
  File "C:\Users\Me\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py", line 361, in __init__
    copy=copy)
  File "C:\Users\Me\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py", line 533, in _init_ndarray
    return create_block_manager_from_blocks([values], [columns, index])
  File "C:\Users\Me\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py", line 4631, in create_block_manager_from_blocks
    construction_error(tot_items, blocks[0].shape[1:], axes, e)
  File "C:\Users\Me\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py", line 4608, in construction_error
    passed, implied))
ValueError: Shape of passed values is (1, 500), indices imply (2, 500)

I could not find a tutorial that covers this. It is obviously a very simple problem, but I cannot find the solution.

Answer 1

Pass the arrays to the DataFrame constructor in a dict:

n_obs = 500

a = np.random.uniform(low = 1.1, high = 5.0,size = (n_obs))
b = np.random.randint(size = (n_obs), low = 18, high = 80)

df = pd.DataFrame({'col1':a, 'col2':b}) 
print (df.head())
       col1  col2
0  2.070148    23
1  1.735960    28
2  4.156209    72
3  4.253241    26
4  3.539951    45

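If you are running Python below 3.6, add the columns parameter to set the column order (from Python 3.6 on, the standard dict type preserves insertion order by default, and the language guarantees it from 3.7):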

df = pd.DataFrame({'col1':a, 'col2':b}, columns=['col2','col1']) 
print (df.head())
   col2      col1
0    23  2.070148
1    28  1.735960
2    72  4.156209
3    26  4.253241
4    45  3.539951

You can also stack the arrays in numpy, but then every column gets the same dtype, here float:

df = pd.DataFrame(np.column_stack((a,b)), columns=['col1','col2']) 
print (df.head())
       col1  col2
0  2.070148  23.0
1  1.735960  28.0
2  4.156209  72.0
3  4.253241  26.0
4  3.539951  45.0
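
If the integer column matters, one possible follow-up is to cast it back after stacking. This is only a sketch: the seeded `np.random.default_rng` generator and the names `stacked` and `restored` are assumptions added for illustration, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is repeatable (an assumption, not in the answer)
rng = np.random.default_rng(0)
n_obs = 500
a = rng.uniform(low=1.1, high=5.0, size=n_obs)
b = rng.integers(low=18, high=80, size=n_obs)

# column_stack builds a single 2-D float64 array, so both columns come out as float64
stacked = pd.DataFrame(np.column_stack((a, b)), columns=['col1', 'col2'])

# Cast only col2 back to an integer dtype; col1 is left untouched
restored = stacked.astype({'col2': 'int64'})
print(restored.dtypes)
```

The dict-based constructor avoids this round trip, because each array keeps its own dtype.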

In your code:

df = pd.DataFrame(a, b)

the first array becomes the data of a single column and the second becomes the index, so it is the same as:

df = pd.DataFrame(a, index=b) 
print (df.head())
           0
23  2.070148
28  1.735960
72  4.156209
26  4.253241
45  3.539951
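
The positional-argument behaviour described above can be checked directly. This is a minimal sketch that reuses the question's arrays; the variable names `raised` and `error_message` are hypothetical, and the exact error text varies between pandas versions.

```python
import numpy as np
import pandas as pd

n_obs = 500
a = np.random.uniform(low=1.1, high=5.0, size=n_obs)
b = np.random.randint(low=18, high=80, size=n_obs)

# The second positional argument is `index`, so b becomes the row labels
df = pd.DataFrame(a, b)
print(df.shape)                # a single data column
print((df.index == b).all())   # the index holds the values of b

# Two column names for one column of data cannot be matched, hence the ValueError
raised = False
try:
    pd.DataFrame(a, b, columns=['col1', 'col2'])
except ValueError as exc:
    raised = True
    error_message = str(exc)
    print(error_message)
```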

Answer 2

`pd.concat` + `pd.Series`

You can convert the arrays to Series and concatenate them:

np.random.seed(0)

n_obs = 500
a = np.random.uniform(low=1.1, high=5.0, size=n_obs)
b = np.random.randint(size=n_obs, low=18, high=80)

df = pd.concat(map(pd.Series, (a, b)), axis=1, keys=['a', 'b'])

print(df.head())

          a   b
0  3.240373  57
1  3.889239  60
2  3.450777  77
3  3.225044  46
4  2.752254  42
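
A closely related variant, shown here only as a sketch, names each Series up front instead of passing `keys`; with `axis=1` the Series names become the column names. The name `named` is hypothetical. Because each Series stays separate until concatenation, each column keeps its own dtype.

```python
import numpy as np
import pandas as pd

np.random.seed(0)

n_obs = 500
a = np.random.uniform(low=1.1, high=5.0, size=n_obs)
b = np.random.randint(size=n_obs, low=18, high=80)

# Each Series carries its own name, which concat uses as the column label
named = pd.concat([pd.Series(a, name='a'), pd.Series(b, name='b')], axis=1)

# The float column stays float and the integer column stays integer
print(named.dtypes)
```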

Answer 3

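Take a look at this. It builds the frame from a list of the two arrays, transposes it, and then names the columns: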

n_obs = 500
df = pd.DataFrame([np.random.uniform(low = 1.1, high = 5.0,size = (n_obs) ) , 
                  np.random.randint(size = (n_obs), low = 18, high = 80)]).T
df.columns = ['col1','col2']
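
To see why the transpose is needed, the intermediate shapes can be printed. This is a sketch with hypothetical names (`wide`, `tall`); it also casts `col2` back to an integer dtype explicitly, since the transposed frame may not keep the original integer type and it is worth checking `dtypes` rather than assuming.

```python
import numpy as np
import pandas as pd

n_obs = 500
a = np.random.uniform(low=1.1, high=5.0, size=n_obs)
b = np.random.randint(low=18, high=80, size=n_obs)

# A list of 1-D arrays is read row by row: two rows, n_obs columns
wide = pd.DataFrame([a, b])
print(wide.shape)

# Transposing turns the two rows into two columns
tall = wide.T
tall.columns = ['col1', 'col2']
print(tall.shape)

# Inspect the dtypes, then cast col2 to an integer type explicitly
print(tall.dtypes)
tall = tall.astype({'col1': 'float64', 'col2': 'int64'})
```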
