Converting key-values of python dictionary into pandas dataframe
I have a Python dictionary whose values are either a single numeric string or a list of numeric strings, for example:
d = {'a': ['1.20', '1', '1.10'], 'b': ['5.800', '1', '2.000'], 'c': ['9.5000', '0.9000'], 'h': ['1.90000', '6.100000'], 'l': ['1.0000', '8.00000'], 'o': '5.0000', 'p': ['3.00', '1.1000'], 'v': ['1.8', '0.0000']}
How can I convert it into a pandas DataFrame without using a pandas Series?
Expected output:
col1 col2 col3
a 1.2 1 1.1
b 5.8 1 2
c 9.5 0.9 NaN
h 1.9 6.1 NaN
l 1 8 NaN
o 5 NaN NaN
p 3 1.1 NaN
v 1.8 0 NaN
Use a helper Series:
df = pd.concat({k:pd.Series(v) for k, v in d.items()}).unstack().astype(float).sort_index()
df.columns = 'col1 col2 col3'.split()
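The one-liner above packs several steps together. The sketch below splits it into its intermediate stages on a shortened version of the dictionary; the variable names (pieces, stacked, wide) are only illustrative, and the column names are derived from the frame width rather than hard-coded.

```python
import pandas as pd

# Shortened sample of the dictionary from the question.
d = {'a': ['1.20', '1', '1.10'], 'c': ['9.5000', '0.9000'], 'o': '5.0000'}

# Step 1: one Series per key; a bare string becomes a one-element Series.
pieces = {k: pd.Series(v) for k, v in d.items()}

# Step 2: concat on a dict builds a two-level index of (key, position).
stacked = pd.concat(pieces)

# Step 3: unstack moves the position level into columns; short rows get NaN.
wide = stacked.unstack().astype(float).sort_index()
wide.columns = ['col{}'.format(i + 1) for i in range(wide.shape[1])]
print(wide)
```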
Another solution is to convert every non-list value into a one-element list and then use DataFrame.from_dict:
d = {k:v if isinstance(v, list) else [v] for k, v in d.items()}
df = pd.DataFrame.from_dict(d, orient='index').astype(float).sort_index()
df.columns = 'col1 col2 col3'.split()
print (df)
col1 col2 col3
a 1.2 1.0 1.1
b 5.8 1.0 2.0
c 9.5 0.9 NaN
h 1.9 6.1 NaN
l 1.0 8.0 NaN
o 5.0 NaN NaN
p 3.0 1.1 NaN
v 1.8 0.0 NaN
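If this conversion is needed in more than one place, the same steps can be wrapped in a small function. This is only a sketch: dict_to_frame is a hypothetical name, and the column labels are generated from the resulting width instead of assuming exactly three columns.

```python
import pandas as pd

def dict_to_frame(d):
    """Hypothetical helper: wrap scalars, build the frame, convert to float."""
    # Wrap bare values in a list so every value is list-like.
    wrapped = {k: v if isinstance(v, list) else [v] for k, v in d.items()}
    # orient='index' makes each key a row; shorter rows are filled with missing values.
    df = pd.DataFrame.from_dict(wrapped, orient='index').astype(float).sort_index()
    # Name the columns col1..colN based on the actual width.
    df.columns = ['col{}'.format(i + 1) for i in range(df.shape[1])]
    return df

df = dict_to_frame({'b': ['5.800', '1', '2.000'], 'a': ['1.20', '1', '1.10'], 'o': '5.0000'})
print(df)
```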
Here is one way:
from collections import OrderedDict
import pandas as pd, numpy as np
d = {'a': ['1.20', '1', '1.10'], 'b': ['5.800', '1', '2.000'],
'c': ['9.5000', '0.9000'], 'h': ['1.90000', '6.100000'],
'l': ['1.0000', '8.00000'], 'o': '5.0000', 'p': ['3.00', '1.1000'],
'v': ['1.8', '0.0000']}
# convert to numeric
for k, v in d.items():
    lst = list(map(float, v)) if isinstance(v, list) else [float(v)]
    lst += [np.nan] * (3 - len(lst))
    d[k] = lst
# sort dictionary by key & create cols
d = OrderedDict(sorted(d.items()))
cols = list(zip(*d.values()))
# build dataframe
df = pd.DataFrame.from_dict(d).T
# 0 1 2
# a 1.2 1.0 1.1
# b 5.8 1.0 2.0
# c 9.5 0.9 NaN
# h 1.9 6.1 NaN
# l 1.0 8.0 NaN
# o 5.0 NaN NaN
# p 3.0 1.1 NaN
# v 1.8 0.0 NaN
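The approach above hard-codes a width of 3. A variation, sketched here under the assumption that the longest list should set the number of columns, computes the width from the data and leaves the input dictionary unmodified; the names rows, width and padded are illustrative.

```python
import numpy as np
import pandas as pd

# Shortened sample of the dictionary from the question.
d = {'c': ['9.5000', '0.9000'], 'a': ['1.20', '1', '1.10'], 'o': '5.0000'}

# Convert every value to a list of floats without mutating d.
rows = {k: [float(x) for x in v] if isinstance(v, list) else [float(v)]
        for k, v in d.items()}

# The longest row decides how many columns the frame needs.
width = max(len(r) for r in rows.values())

# Pad the shorter rows with NaN and sort by key.
padded = {k: r + [np.nan] * (width - len(r)) for k, r in sorted(rows.items())}

df = pd.DataFrame.from_dict(padded, orient='index')
print(df)
```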
Try:
df = pd.Series(d).apply(pd.Series).rename(columns=lambda col: 'col{}'.format(col+1))
The output will be:
col1 col2 col3
a 1.20 1 1.10
b 5.800 1 2.000
c 9.5000 0.9000 NaN
h 1.90000 6.100000 NaN
l 1.0000 8.00000 NaN
o 5.0000 NaN NaN
p 3.00 1.1000 NaN
v 1.8 0.0000 NaN
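Note that the values in this output are still strings ('1.20', '5.800', and so on), while the expected output in the question shows numbers. One way to convert them, sketched on a small hand-built frame of the same shape, is pd.to_numeric applied column by column; astype(float), as used in the earlier answers, should work as well.

```python
import pandas as pd

# Small stand-in for the string-valued frame shown above.
df = pd.DataFrame({'col1': ['1.20', '5.0000'], 'col2': ['1', None]},
                  index=['a', 'o'])

# Convert each column from strings to numbers; missing entries become NaN.
numeric = df.apply(pd.to_numeric)
print(numeric)
print(numeric.dtypes)
```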
Without pd.Series:
df = pd.DataFrame(list(map(lambda v: [v] if type(v) != list else v, d.values())),
                  index=d.keys(), columns=['col{}'.format(col+1) for col in range(3)])
You may also want to first pad all values of the dict to length-3 lists:
padded_d = {k : list(v) + [None] * (3 - len(v)) for k,v in d.items()}
Then use the .from_dict() method of pd.DataFrame:
>>> pd.DataFrame.from_dict(padded_d, orient="index")
0 1 2
a 1.20 1 1.10
b 5.800 1 2.000
c 9.5000 0.9000 None
h 1.90000 6.100000 None
l 1.0000 8.00000 None
p 3.00 1.1000 None
v 1.8 0.0000 None
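The output above has no row for 'o'. The reason is that the padding expression treats the bare string '5.0000' as a sequence of characters, which the short check below illustrates (the variable names are only for demonstration).

```python
v = '5.0000'

# list() on a string splits it into single characters.
as_list = list(v)

# len(v) is 6, so 3 - len(v) is negative and the padding list is empty.
padding = [None] * (3 - len(v))

print(as_list + padding)  # ['5', '.', '0', '0', '0', '0']
```

So for key 'o' the padded value would be six one-character strings rather than ['5.0000', None, None].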
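To handle the malformed value for key 'o' in the input, 'o': '5.0000' (we would expect 'o': ['5.0000']; not sure whether that is a typo), you should check the type. This could probably be made cleaner, though: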
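def type_check(s):
    # wrap a bare string in a list so it is padded like the other values
    if isinstance(s, str):
        return [s]
    else:
        return s
# measure the wrapped value, not the raw string, when computing the padding
padded_d = {k : type_check(v) + [None] * (3 - len(type_check(v))) for k,v in d.items()}
pd.DataFrame.from_dict(padded_d, orient="index")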