Converting key-values of python dictionary into pandas dataframe
I have a Python dictionary that contains single or multiple values (numbers stored as strings), for example:
d = {'a': ['1.20', '1', '1.10'], 'b': ['5.800', '1', '2.000'], 'c': ['9.5000', '0.9000'], 'h': ['1.90000', '6.100000'], 'l': ['1.0000', '8.00000'], 'o': '5.0000', 'p': ['3.00', '1.1000'], 'v': ['1.8', '0.0000']}
How can I convert it into a pandas DataFrame without the help of pandas Series?
Expected output:
col1 col2 col3
a 1.2 1 1.1
b 5.8 1 2
c 9.5 0.9 NaN
h 1.9 6.1 NaN
l 1 8 NaN
o 5 NaN NaN
p 3 1.1 NaN
v 1.8 0 NaN
Use a helper Series:
import pandas as pd
df = pd.concat({k: pd.Series(v) for k, v in d.items()}).unstack().astype(float).sort_index()
df.columns = 'col1 col2 col3'.split()
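To make the helper-Series approach easier to follow, here is a self-contained sketch that splits it into its two steps. It uses the question's dictionary; the only change is that the column names are derived from the frame's width (an f-string, Python 3.6+) instead of being hard-coded.

```python
import pandas as pd

d = {'a': ['1.20', '1', '1.10'], 'b': ['5.800', '1', '2.000'],
     'c': ['9.5000', '0.9000'], 'h': ['1.90000', '6.100000'],
     'l': ['1.0000', '8.00000'], 'o': '5.0000', 'p': ['3.00', '1.1000'],
     'v': ['1.8', '0.0000']}

# Step 1: one Series per key; a bare string like '5.0000' becomes a 1-element Series.
# Concatenating a dict of Series gives a two-level index: (dict key, position in list).
stacked = pd.concat({k: pd.Series(v) for k, v in d.items()})

# Step 2: move the position level into columns; missing positions become NaN
df = stacked.unstack().astype(float).sort_index()
df.columns = [f'col{i + 1}' for i in range(df.shape[1])]
print(df)
```

Because `unstack()` fills every position that a key lacks with NaN, the shorter lists need no manual padding.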
Another solution is to convert the non-list values to one-element lists and then use DataFrame.from_dict:
d = {k:v if isinstance(v, list) else [v] for k, v in d.items()}
df = pd.DataFrame.from_dict(d, orient='index').astype(float).sort_index()
df.columns = 'col1 col2 col3'.split()
print(df)
col1 col2 col3
a 1.2 1.0 1.1
b 5.8 1.0 2.0
c 9.5 0.9 NaN
h 1.9 6.1 NaN
l 1.0 8.0 NaN
o 5.0 NaN NaN
p 3.0 1.1 NaN
v 1.8 0.0 NaN
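The `from_dict(..., orient='index')` call works because the DataFrame constructor pads shorter rows on the right with missing values. A minimal sketch on a trimmed version of the question's dictionary (the subset is chosen only for brevity), which also derives the column names from the actual width so that a longer list would not break the hard-coded `'col1 col2 col3'`:

```python
import pandas as pd

# Trimmed sample of the question's data, including the bare-string value for 'o'
d = {'a': ['1.20', '1', '1.10'], 'c': ['9.5000', '0.9000'], 'o': '5.0000'}

# Wrap scalar values so every value is a list
d = {k: v if isinstance(v, list) else [v] for k, v in d.items()}

# Rows of different lengths are padded on the right; astype(float) turns the gaps into NaN
df = pd.DataFrame.from_dict(d, orient='index').astype(float).sort_index()

# Name the columns from the real width instead of assuming exactly three
df.columns = [f'col{i + 1}' for i in range(df.shape[1])]
print(df)
```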
Here is one way:
from collections import OrderedDict
import pandas as pd, numpy as np
d = {'a': ['1.20', '1', '1.10'], 'b': ['5.800', '1', '2.000'],
'c': ['9.5000', '0.9000'], 'h': ['1.90000', '6.100000'],
'l': ['1.0000', '8.00000'], 'o': '5.0000', 'p': ['3.00', '1.1000'],
'v': ['1.8', '0.0000']}
# convert to numeric (wrapping the bare string for 'o' in a list) and pad to length 3
for k, v in d.items():
    lst = list(map(float, v)) if isinstance(v, list) else [float(v)]
    lst += [np.nan] * (3 - len(lst))
    d[k] = lst
# sort dictionary by key
d = OrderedDict(sorted(d.items()))
# build dataframe: each key becomes a column, then transpose so keys become rows
df = pd.DataFrame.from_dict(d).T
# 0 1 2
# a 1.2 1.0 1.1
# b 5.8 1.0 2.0
# c 9.5 0.9 NaN
# h 1.9 6.1 NaN
# l 1.0 8.0 NaN
# o 5.0 NaN NaN
# p 3.0 1.1 NaN
# v 1.8 0.0 NaN
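The loop above pads every row to a fixed length of 3. A sketch of the same idea that pads to the longest row instead, under the assumption of Python 3.7+ (where plain dicts keep insertion order, so `OrderedDict` is not needed); the sample dictionary is a trimmed subset of the question's data:

```python
import numpy as np
import pandas as pd

d = {'a': ['1.20', '1', '1.10'], 'c': ['9.5000', '0.9000'], 'o': '5.0000'}

# Normalise every value to a list of floats
rows = {k: [float(x) for x in (v if isinstance(v, list) else [v])]
        for k, v in d.items()}

# Pad to the longest row rather than a fixed width of 3, sorting by key
width = max(len(r) for r in rows.values())
rows = {k: r + [np.nan] * (width - len(r)) for k, r in sorted(rows.items())}

# Each key becomes a column; transpose so the keys become the index
df = pd.DataFrame(rows).T
print(df)
```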
Try:
df = pd.Series(d).apply(pd.Series).rename(columns=lambda col: 'col{}'.format(col+1))
The output will be (the cells are still strings, since nothing converts them):
col1 col2 col3
a 1.20 1 1.10
b 5.800 1 2.000
c 9.5000 0.9000 NaN
h 1.90000 6.100000 NaN
l 1.0000 8.00000 NaN
o 5.0000 NaN NaN
p 3.00 1.1000 NaN
v 1.8 0.0000 NaN
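Since `apply(pd.Series)` leaves the original strings in the cells, a numeric conversion step is needed to match the expected output. A sketch that appends `pd.to_numeric` column by column (`.astype(float)` would work as well); the dictionary is a trimmed subset of the question's data:

```python
import pandas as pd

d = {'a': ['1.20', '1', '1.10'], 'c': ['9.5000', '0.9000'], 'o': '5.0000'}

df = (pd.Series(d)
        .apply(pd.Series)  # each list (or bare string) expands into one row
        .rename(columns=lambda col: 'col{}'.format(col + 1)))

# The cells are still strings; pd.to_numeric parses them to floats and keeps NaN
df = df.apply(pd.to_numeric)
print(df)
```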
Without pd.Series:
df = pd.DataFrame(list(map(lambda v: v if isinstance(v, list) else [v], d.values())),
                  index=list(d.keys()),
                  columns=['col{}'.format(col + 1) for col in range(3)])
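The version without `pd.Series` keeps the values as strings and assumes exactly three columns. A sketch that computes the width from the data and converts to floats at the end, relying on the DataFrame constructor to pad short rows; the sample dictionary is a trimmed subset of the question's data:

```python
import pandas as pd

d = {'a': ['1.20', '1', '1.10'], 'c': ['9.5000', '0.9000'], 'o': '5.0000'}

# Wrap bare strings so every row is a list
rows = [v if isinstance(v, list) else [v] for v in d.values()]
width = max(len(r) for r in rows)

# pd.DataFrame pads short rows on the right with missing values
df = pd.DataFrame(rows,
                  index=list(d.keys()),
                  columns=['col{}'.format(i + 1) for i in range(width)])

# Convert the numeric strings (and the padding) to floats / NaN
df = df.astype(float)
print(df)
```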
You might also want to first pad all of the dict's values to length-3 lists:
padded_d = {k : list(v) + [None] * (3 - len(v)) for k,v in d.items()}
and then use the .from_dict() method of pd.DataFrame:
>>> pd.DataFrame.from_dict(padded_d, orient="index")
0 1 2
a 1.20 1 1.10
b 5.800 1 2.000
c 9.5000 0.9000 None
h 1.90000 6.100000 None
l 1.0000 8.00000 None
p 3.00 1.1000 None
v 1.8 0.0000 None
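To handle the malformed value for key 'o' in the input, 'o': '5.0000' (we'd expect 'o': ['5.0000']; not sure whether this is a typo), you should check the type, because list('5.0000') splits the string into its individual characters. This could probably be cleaner, though: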
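import pandas as pd
def type_check(s):
    # wrap a bare string in a list; leave lists unchanged
    if isinstance(s, str):
        return [s]
    else:
        return s
# pad using the length of the type-checked list, not of the raw value
padded_d = {k: type_check(v) + [None] * (3 - len(type_check(v))) for k, v in d.items()}
pd.DataFrame.from_dict(padded_d, orient="index")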
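To see why the type check matters, here is a small worked example on the bare string from the question. Calling `list()` on a string iterates over its characters, so `len()` reports 6 instead of 1 and the padding term becomes empty:

```python
value = '5.0000'

# list() on a bare string iterates over its characters
print(list(value))  # ['5', '.', '0', '0', '0', '0']
print(len(value))   # 6, so 3 - len(value) is negative and adds no padding

# Wrapping the string first gives the intended one-element row
wrapped = [value] if isinstance(value, str) else value
padded = wrapped + [None] * (3 - len(wrapped))
print(padded)       # ['5.0000', None, None]
```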