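Creating pandas dataframes from several numpy series
I am trying to create a pandas DataFrame whose columns are numpy arrays. I would also like to name the columns at creation time.
This seems like it should be a very simple task.
Without naming the columns it runs without error, although the columns come out in the wrong order: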
import numpy as np
import pandas as pd
n_obs = 500
df = pd.DataFrame(np.random.uniform(low = 1.1, high = 5.0,size = (n_obs) ) , np.random.randint(size = (n_obs), low = 18, high = 80))
print(df.head())
Output:
49 3.802458
57 3.830600
29 4.991442
47 2.600079
70 1.658041
52 2.236296
37 3.327520
23 1.366954
22 1.509165
36 1.289901
77 3.834789
68 4.370223
40 4.532152
71 2.348842
When I try to name the columns, I get an error:
df = pd.DataFrame(np.random.uniform(low = 1.1, high = 5.0,size = (n_obs) ) , np.random.randint(size = (n_obs), low = 18, high = 80), columns =['col1','col2'])
Output:
Traceback (most recent call last):
File "C:\Users\GBUHR4\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py", line 4622, in create_block_manager_from_blocks
placement=slice(0, len(axes[0])))]
File "C:\Users\GBUHR4\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py", line 2957, in make_block
return klass(values, ndim=ndim, fastpath=fastpath, placement=placement)
File "C:\Users\GBUHR4\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py", line 120, in __init__
len(self.mgr_locs)))
ValueError: Wrong number of items passed 1, placement implies 2
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "fake.py", line 33, in <module>
df = pd.DataFrame(np.random.uniform(low = 1.1, high = 5.0,size = (n_obs) ) , np.random.randint(size = (n_obs), low = 18, high = 80), columns =['col1','col2'])
File "C:\Users\Me\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py", line 361, in __init__
copy=copy)
File "C:\Users\Me\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py", line 533, in _init_ndarray
return create_block_manager_from_blocks([values], [columns, index])
File "C:\Users\Me\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py", line 4631, in create_block_manager_from_blocks
construction_error(tot_items, blocks[0].shape[1:], axes, e)
File "C:\Users\Me\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py", line 4608, in construction_error
passed, implied))
ValueError: Shape of passed values is (1, 500), indices imply (2, 500)
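I can't find a tutorial that covers this. It is obviously a very simple problem, but I can't find the solution.
Use a dict to pass the arrays to the DataFrame constructor: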
n_obs = 500
a = np.random.uniform(low = 1.1, high = 5.0,size = (n_obs))
b = np.random.randint(size = (n_obs), low = 18, high = 80)
df = pd.DataFrame({'col1':a, 'col2':b})
print (df.head())
col1 col2
0 2.070148 23
1 1.735960 28
2 4.156209 72
3 4.253241 26
4 3.539951 45
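If you are on Python below 3.6, add the columns parameter to specify the column order explicitly. From Python 3.6 the standard dict type keeps insertion order (an implementation detail of CPython 3.6, guaranteed by the language from 3.7), so the parameter is only needed when you want a different order: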
df = pd.DataFrame({'col1':a, 'col2':b}, columns=['col2','col1'])
print (df.head())
col2 col1
0 23 2.070148
1 28 1.735960
2 72 4.156209
3 26 4.253241
4 45 3.539951
You can also stack the arrays in numpy first, but then every column gets the same dtype, here float:
df = pd.DataFrame(np.column_stack((a,b)), columns=['col1','col2'])
print (df.head())
col1 col2
0 2.070148 23.0
1 1.735960 28.0
2 4.156209 72.0
3 4.253241 26.0
4 3.539951 45.0
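If the integer column matters, the dtype can be restored after stacking. The following is only a small sketch of that idea; the seed, the short array length, and the choice of int64 are assumptions made for illustration and are not part of the original answer:

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)  # fixed seed, chosen only for reproducibility
a = rng.uniform(low=1.1, high=5.0, size=5)
b = rng.randint(low=18, high=80, size=5)

# column_stack builds one 2-D array, so the ints are upcast to float64
stacked = np.column_stack((a, b))
df = pd.DataFrame(stacked, columns=['col1', 'col2'])
print(df.dtypes)

# cast the second column back to an integer dtype
df['col2'] = df['col2'].astype('int64')
print(df.dtypes)
```

The cast is lossless here because the floats were produced from integers in the first place; the dict approach avoids the round trip altogether.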
In your solution:
df = pd.DataFrame(a, b)
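the first array becomes the single data column and the second becomes the index, because the second positional parameter of DataFrame is index. It is the same as: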
df = pd.DataFrame(a, index=b)
print (df.head())
0
23 2.070148
28 1.735960
72 4.156209
26 4.253241
45 3.539951
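This also explains the ValueError in the question: with only one data column, asking for two column names cannot work. A short check of the positional behaviour, as a sketch with an assumed seed and a small n_obs for readability:

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)  # fixed seed, chosen only for reproducibility
n_obs = 5
a = rng.uniform(low=1.1, high=5.0, size=n_obs)
b = rng.randint(low=18, high=80, size=n_obs)

# The second positional argument is `index`, not a second column
df_positional = pd.DataFrame(a, b)
df_keyword = pd.DataFrame(a, index=b)

print(df_positional.shape)                 # (5, 1): a single data column
print(df_positional.equals(df_keyword))    # both spellings build the same frame

# Two names for one column is what triggers the error from the question
try:
    pd.DataFrame(a, b, columns=['col1', 'col2'])
except ValueError as exc:
    print('ValueError:', exc)
```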
pd.concat + pd.Series
You can convert the arrays to Series and concatenate them:
np.random.seed(0)
n_obs = 500
a = np.random.uniform(low=1.1, high=5.0, size=n_obs)
b = np.random.randint(size=n_obs, low=18, high=80)
df = pd.concat(map(pd.Series, (a, b)), axis=1, keys=['a', 'b'])
print(df.head())
a b
0 3.240373 57
1 3.889239 60
2 3.450777 77
3 3.225044 46
4 2.752254 42
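Because each array becomes its own Series, the dtypes are kept per column. pd.concat also accepts a dict of Series and uses the dict keys as column names, which some may find easier to read than the keys argument. A sketch of both spellings, using an explicit list in place of map:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
n_obs = 500
a = np.random.uniform(low=1.1, high=5.0, size=n_obs)
b = np.random.randint(size=n_obs, low=18, high=80)

# list of Series plus keys, as in the answer above
df_keys = pd.concat([pd.Series(a), pd.Series(b)], axis=1, keys=['a', 'b'])

# dict of Series: the dict keys become the column names
df_dict = pd.concat({'a': pd.Series(a), 'b': pd.Series(b)}, axis=1)

print(df_dict.dtypes)              # float column and integer column
print(df_keys.equals(df_dict))     # the two frames are identical
```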
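Take a look at this: build the frame from a list of the two arrays, transpose it, then assign the column names: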
n_obs = 500
df = pd.DataFrame([np.random.uniform(low = 1.1, high = 5.0,size = (n_obs) ) ,
np.random.randint(size = (n_obs), low = 18, high = 80)]).T
df.columns = ['col1','col2']
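One thing worth checking with this approach is the resulting dtypes: the two arrays are first laid out as rows, so each column of the intermediate frame mixes a float and an int, and after the transpose the integer column is, as far as I can tell, no longer an integer dtype. A small comparison against the dict constructor, with an assumed seed and a short n_obs:

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)  # fixed seed, chosen only for reproducibility
n_obs = 5
a = rng.uniform(low=1.1, high=5.0, size=n_obs)
b = rng.randint(low=18, high=80, size=n_obs)

# rows-then-transpose, as in the answer above
df_t = pd.DataFrame([a, b]).T
df_t.columns = ['col1', 'col2']

# dict constructor, for comparison
df_dict = pd.DataFrame({'col1': a, 'col2': b})

print(df_t.dtypes)      # inspect what col2 became
print(df_dict.dtypes)   # col2 stays an integer dtype here
```

If the integer dtype is needed, col2 can be cast back with astype, or the dict form can be used directly.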