Converting dictionary to dataframe, and then melting / stacking columns to rows
Split list of dictionary column to separate columns, melting the dataframe
Use a nested list comprehension over the lists of dictionaries, then pass the result to the DataFrame constructor:
import pandas as pd

df = pd.DataFrame({'id': [1, 2],
                   'variant': [[{'position': 1, 'price': 100}, {'position': 2, 'price': 500},
                                {'position': 3, 'price': 300}],
                               [{'position': 1, 'price': 150}, {'position': 2, 'price': 400}]]})
print(df)
   id                                            variant
0   1  [{'position': 1, 'price': 100}, {'position': 2...
1   2  [{'position': 1, 'price': 150}, {'position': 2...

# Merge each row's id into every dictionary of its list: one output row per dictionary
L = [{**{'id': x}, **z} for x, y in zip(df['id'], df['variant']) for z in y]
df2 = pd.DataFrame(L)
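For readers less used to nested comprehensions, the one-liner above can be unrolled into explicit loops. This is only an illustrative sketch of the same logic; the names `rows`, `row_id`, and `record` are made up here and do not come from the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2],
    'variant': [
        [{'position': 1, 'price': 100}, {'position': 2, 'price': 500}, {'position': 3, 'price': 300}],
        [{'position': 1, 'price': 150}, {'position': 2, 'price': 400}],
    ],
})

rows = []  # hypothetical name: collects one flat dictionary per list element
for row_id, variants in zip(df['id'], df['variant']):
    for variant in variants:
        record = {'id': row_id}   # start with the id of the parent row
        record.update(variant)    # then add the position and price keys
        rows.append(record)

df2 = pd.DataFrame(rows)
print(df2)
```

The comprehension is usually preferable in practice; the loop form is mainly useful when extra per-element handling, such as skipping malformed entries, needs to be added.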
Or use DataFrame.explode with json_normalize:
df1 = df.explode('variant').reset_index(drop=True)
df2 = df1[['id']].join(pd.json_normalize(df1['variant']))
print(df2)
   id  position  price
0   1         1    100
1   1         2    500
2   1         3    300
3   2         1    150
4   2         2    400
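As a quick sanity check, the two approaches can be compared directly. The sketch below rebuilds the sample data and asserts that both produce the same table; the variable names `via_comprehension` and `via_explode` are illustrative, and the exploded column is passed to `json_normalize` as a plain list, which is a small deviation from the code above.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2],
    'variant': [
        [{'position': 1, 'price': 100}, {'position': 2, 'price': 500}, {'position': 3, 'price': 300}],
        [{'position': 1, 'price': 150}, {'position': 2, 'price': 400}],
    ],
})

# Approach 1: nested list comprehension
rows = [{'id': i, **d} for i, variants in zip(df['id'], df['variant']) for d in variants]
via_comprehension = pd.DataFrame(rows)

# Approach 2: explode the lists into rows, then expand each dictionary into columns
exploded = df.explode('variant').reset_index(drop=True)
via_explode = exploded[['id']].join(pd.json_normalize(exploded['variant'].tolist()))

# Raises AssertionError if values, column names, or index differ (dtypes are not compared)
pd.testing.assert_frame_equal(via_comprehension, via_explode, check_dtype=False)
```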
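If the solutions above raise:
TypeError: 'str' object is not a mapping
then the lists are stored as strings rather than as real lists. In that case, parse each string first: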
df = pd.DataFrame({'id': [1, 2],
                   'variant': ["[{'position':1, 'price':100}, {'position':2, 'price':500}, {'position':3, 'price':300}]",
                               "[ {'position':1, 'price':150}, {'position':2, 'price':400}]"]})
print(df)
   id                                            variant
0   1  [{'position':1, 'price':100}, {'position':2, '...
1   2  [ {'position':1, 'price':150}, {'position':2, ...

import ast

# ast.literal_eval parses each string into a real list of dictionaries
L = [{**{'id': x}, **z} for x, y in zip(df['id'], df['variant']) for z in ast.literal_eval(y)]
df2 = pd.DataFrame(L)
print(df2)
   id  position  price
0   1         1    100
1   1         2    500
2   1         3    300
3   2         1    150
4   2         2    400
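If it is not known in advance whether the column holds real lists or their string representations, a small wrapper can handle both. The function below is a hypothetical helper (the name `flatten_variants` and its parameters are not part of pandas or of the original answer); it assumes every string cell is a valid Python literal.

```python
import ast
import pandas as pd

def flatten_variants(df, id_col='id', list_col='variant'):
    """Hypothetical helper: one output row per dictionary in each list cell."""
    rows = []
    for key, cell in zip(df[id_col], df[list_col]):
        # Parse only when the cell is a string; real lists are used as they are
        items = ast.literal_eval(cell) if isinstance(cell, str) else cell
        for item in items:
            rows.append({id_col: key, **item})
    return pd.DataFrame(rows)

# Mixed input: the first cell is a string, the second a real list
mixed = pd.DataFrame({
    'id': [1, 2],
    'variant': [
        "[{'position':1, 'price':100}, {'position':2, 'price':500}]",
        [{'position': 1, 'price': 150}],
    ],
})
result = flatten_variants(mixed)
print(result)
```

`ast.literal_eval` only evaluates Python literals, so it is a safer choice than `eval` for data that comes from a file or a database.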
Performance on 10k rows:
df = pd.DataFrame({'id':[1,2], 'variant':[[{'position':1, 'price':100}, {'position':2, 'price':500},
{'position':3, 'price':300}],
[ {'position':1, 'price':150}, {'position':2, 'price':400}]]})
df = pd.concat([df] * 5000, ignore_index=True)
# keramat's solution
In [23]: %%timeit
...: df.explode('variant').apply({'variant':lambda x: pd.Series(x), 'id': lambda x: pd.Series(x)}).droplevel(0, axis = 1).rename(columns={'position':'position', 'price':'price', 0:'id'})
...:
14.2 s ± 505 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# explode + json_normalize
In [24]: %%timeit
...: df1 = df.explode('variant').reset_index(drop=True)
...:
...: df1[['id']].join(pd.json_normalize(df1['variant']))
...:
...:
180 ms ± 4.06 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# nested list comprehension
In [25]: %%timeit
...: pd.DataFrame([{**{'id':x},**z} for x, y in zip(df['id'], df['variant']) for z in y])
...:
...:
52.3 ms ± 2.76 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
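The `%%timeit` cells above only work in IPython or Jupyter. A rough equivalent for a plain Python script might look like the sketch below, which uses the standard-library `timeit` module on the two faster approaches. The function names and the `timings` dictionary are illustrative, and the absolute numbers will differ by machine and pandas version.

```python
import timeit
import pandas as pd

base = pd.DataFrame({
    'id': [1, 2],
    'variant': [
        [{'position': 1, 'price': 100}, {'position': 2, 'price': 500}, {'position': 3, 'price': 300}],
        [{'position': 1, 'price': 150}, {'position': 2, 'price': 400}],
    ],
})
big = pd.concat([base] * 5000, ignore_index=True)  # 10k rows, as in the benchmark above

def with_comprehension(frame):
    return pd.DataFrame([{'id': i, **d}
                         for i, variants in zip(frame['id'], frame['variant'])
                         for d in variants])

def with_explode(frame):
    exploded = frame.explode('variant').reset_index(drop=True)
    return exploded[['id']].join(pd.json_normalize(exploded['variant'].tolist()))

timings = {}
for label, func in [('list comprehension', with_comprehension),
                    ('explode + json_normalize', with_explode)]:
    # Best of 3 single runs, in seconds
    timings[label] = min(timeit.repeat(lambda: func(big), number=1, repeat=3))
    print(f"{label}: {timings[label] * 1000:.1f} ms")
```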
Use:
df.explode('variant').apply({'id': lambda x: x, 'variant': lambda x: pd.Series(x)}).droplevel(0, axis=1)
Output:
   id  position  price
0   1         1    100
0   1         2    500
0   1         3    300
1   2         1    150
1   2         2    400
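Note that the index labels in this output repeat (0, 0, 0, 1, 1) because `explode` keeps the label of the original row for every element it produces. If a fresh 0..n-1 index is wanted, one option is the `ignore_index` parameter of `explode` (available in pandas 1.1 and later). A minimal sketch, with `exploded` and `renumbered` as illustrative names:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2],
    'variant': [
        [{'position': 1, 'price': 100}, {'position': 2, 'price': 500}, {'position': 3, 'price': 300}],
        [{'position': 1, 'price': 150}, {'position': 2, 'price': 400}],
    ],
})

# Default behaviour: every exploded row keeps the index label of its source row
exploded = df.explode('variant')
print(exploded.index.tolist())

# ignore_index=True renumbers the rows from 0 upward
renumbered = df.explode('variant', ignore_index=True)
print(renumbered.index.tolist())
```

Calling `.reset_index(drop=True)` on the exploded frame, as the json_normalize solution does, has the same effect.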