Convert Python Dictionary to Pandas Dataframe
I am converting a Python list of dictionaries to a pandas DataFrame:
import numpy as np
import pandas as pd
points = [
{'coords': (100.5, 100), 'class': 1},
{'coords': (300, 300), 'class':2},
{'coords': (50, 200), 'class':4},
{'coords': (550, 400), 'class':10},
{'coords': (550, 300), 'class':1}
]
# pandas data frame
data = np.array([['x', 'y', 'class']])
for point in points:
    row = [point['coords'][0], point['coords'][1], point['class']]
    data = np.vstack((data, row))
df = pd.DataFrame(data[1:])
df.columns = data[0:1].tolist()
This gives the following df:
       x      y class
0  100.5  100.0   1.0
1    300    300     2
2     50    200     4
3    550    400    10
4    550    300     1
However, if I now try to do a calculation like this:
df['mult'] = df['x'] * df['y']
I get an error:
ValueError: Wrong number of items passed 2, placement implies 1
Why does this happen (all columns have object dtype)?
After this line:
In [100]: data = np.array([['x', 'y', 'class']])
the array data has a string dtype (which is why the DataFrame columns end up as object):
In [101]: data.dtype
Out[101]: dtype('<U5')
After stacking numeric values onto it, they are converted to strings as well:
In [102]: data = np.vstack((data, (100.5, 100, 1)))
In [103]: data
Out[103]:
array([['x', 'y', 'class'],
['100.5', '100.0', '1.0']], dtype='<U32')
In [104]: data.dtype
Out[104]: dtype('<U32')
You can collect only the numeric values in data and construct the DataFrame as follows:
df = pd.DataFrame(data, columns=['x', 'y', 'class'])
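One way to collect only numeric values is to skip the NumPy header row entirely and build the rows as a plain Python list. This is a minimal sketch, not the answerer's exact code; the name rows is illustrative, and the column labels are passed as a flat list.

```python
import pandas as pd

points = [
    {'coords': (100.5, 100), 'class': 1},
    {'coords': (300, 300), 'class': 2},
    {'coords': (50, 200), 'class': 4},
    {'coords': (550, 400), 'class': 10},
    {'coords': (550, 300), 'class': 1},
]

# Keep the values numeric: no string header row is mixed into the data.
rows = [(p['coords'][0], p['coords'][1], p['class']) for p in points]

# Column labels are supplied separately, as a flat list of strings.
df = pd.DataFrame(rows, columns=['x', 'y', 'class'])

# The columns are numeric, so element-wise arithmetic works.
df['mult'] = df['x'] * df['y']
print(df.dtypes)
print(df)
```

Because no strings are ever stacked with the numbers, there is nothing to convert back afterwards.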
But I would try a slightly different approach:
In [80]: df = pd.DataFrame(points)
In [81]: df[['x','y']] = df.pop('coords').apply(pd.Series)
In [82]: df
Out[82]:
class x y
0 1 100.5 100.0
1 2 300.0 300.0
2 4 50.0 200.0
3 10 550.0 400.0
4 1 550.0 300.0
In [83]: df['mult'] = df['x'] * df['y']
In [84]: df
Out[84]:
class x y mult
0 1 100.5 100.0 10050.0
1 2 300.0 300.0 90000.0
2 4 50.0 200.0 10000.0
3 10 550.0 400.0 220000.0
4 1 550.0 300.0 165000.0
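As a variation on the session above, the coords tuples can also be expanded by building a second DataFrame from the list of tuples rather than calling apply(pd.Series). This is a sketch under the same input data; the name coords is illustrative.

```python
import pandas as pd

points = [
    {'coords': (100.5, 100), 'class': 1},
    {'coords': (300, 300), 'class': 2},
    {'coords': (50, 200), 'class': 4},
    {'coords': (550, 400), 'class': 10},
    {'coords': (550, 300), 'class': 1},
]

df = pd.DataFrame(points)

# Remove the tuple column and expand it into two numeric columns,
# reusing the original index so the join lines up row by row.
coords = pd.DataFrame(df.pop('coords').tolist(),
                      columns=['x', 'y'], index=df.index)
df = df.join(coords)

df['mult'] = df['x'] * df['y']
print(df)
```

Here y stays an integer column because all of its values are integers, whereas apply(pd.Series) in the session above produced floats for both coordinates.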
You can try converting the dtype of this DataFrame to float and using the np.multiply function.
import numpy as np
import pandas as pd
points = [
{'coords': (100.5, 100), 'class': 1},
{'coords': (300, 300), 'class':2},
{'coords': (50, 200), 'class':4},
{'coords': (550, 400), 'class':10},
{'coords': (550, 300), 'class':1}
]
# pandas data frame
data = np.array([['x', 'y', 'class']])
for point in points:
    row = [point['coords'][0], point['coords'][1], point['class']]
    data = np.vstack((data, row))
df = pd.DataFrame(data[1:],dtype=float)
df.columns = data[0:1].tolist()
df['mult'] = np.multiply(df['x'],df['y'])
df['mult']
mult
0 10050.0
1 90000.0
2 10000.0
3 220000.0
4 165000.0
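If a DataFrame already holds numbers as strings, another option is to convert each column with pd.to_numeric, which picks an integer or float dtype per column. The sketch below uses a small hypothetical frame named raw with flat column labels; it is an illustration, not part of the answer above.

```python
import pandas as pd

# Hypothetical frame whose values are strings, as happens after
# numbers have been stacked together with a string header row.
raw = pd.DataFrame(
    [['100.5', '100', '1'],
     ['300', '300', '2']],
    columns=['x', 'y', 'class'],
)

# Convert every column to a numeric dtype.
numeric = raw.apply(pd.to_numeric)

numeric['mult'] = numeric['x'] * numeric['y']
print(numeric.dtypes)
print(numeric)
```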