How to efficiently extract fields from a JSON column?
Consider the following example:
import pandas as pd
data1 = [{'type': 'one', 'delta': '1', 'time': '2019'}, {'type': 'two', 'delta': '1', 'time': '2018'}]
data2 = [{'type': 'one', 'delta': '1', 'time': '2013'}, {'type': 'two', 'delta': '1', 'time': '2012'}]
dftest = pd.DataFrame({'weirdjson' : [data1, data2]})
dftest['normalcol'] = 1
dftest
Out[79]:
                                                                                        weirdjson  normalcol  time_type_one  time_type_two
0  [{'type': 'one', 'delta': '1', 'time': '2019'}, {'type': 'two', 'delta': '1', 'time': '2018'}]          1           2019           2018
1  [{'type': 'one', 'delta': '1', 'time': '2013'}, {'type': 'two', 'delta': '1', 'time': '2012'}]          1           2013           2012
Essentially, I would like to create two columns, time_type_one and time_type_two, that each contain their corresponding time value (for the first row: 2019 for type one and 2018 for type two).
How can I do that in Pandas? I have many rows, so I am looking for something very efficient. Thanks!
Try this:
import json
import pandas as pd
data = [{'normalcol':1, 'weirdjsoncol':'[{"type": "one", "delta": "1", "time": "2019"}, {"type": "two", "delta": "1", "time": "2018"}]'}, {'normalcol':2, 'weirdjsoncol':'[{"type": "two", "delta": "1", "time": "2017"}, {"type": "one", "delta": "1", "time": "2013"}]'}]
df = pd.DataFrame(data)
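# Parse each JSON string and take the "time" of the first entry whose "type" matches.
# Defaulting to {} and using .get() yields None for rows with no match instead of raising a TypeError.
df['time_type_one'] = df['weirdjsoncol'].apply(lambda x: next((i for i in json.loads(x) if i["type"] == "one"), {}).get("time"))
df['time_type_two'] = df['weirdjsoncol'].apply(lambda x: next((i for i in json.loads(x) if i["type"] == "two"), {}).get("time"))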
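Since the question asks for efficiency, note that the two apply calls above parse every JSON string twice. A minimal sketch of a single-pass variant follows, assuming the same string-encoded column as in this answer; the helper name times_by_type is hypothetical and not part of pandas.

```python
import json
import pandas as pd

data = [
    {'normalcol': 1, 'weirdjsoncol': '[{"type": "one", "delta": "1", "time": "2019"}, {"type": "two", "delta": "1", "time": "2018"}]'},
    {'normalcol': 2, 'weirdjsoncol': '[{"type": "two", "delta": "1", "time": "2017"}, {"type": "one", "delta": "1", "time": "2013"}]'},
]
df = pd.DataFrame(data)

def times_by_type(raw):
    # Hypothetical helper: parse one JSON string once and map each "type" to its "time".
    return {f"time_type_{item['type']}": item["time"] for item in json.loads(raw)}

# One pass over the column; each returned dict becomes one row of the new frame.
times = pd.DataFrame([times_by_type(raw) for raw in df["weirdjsoncol"]], index=df.index)
df = df.join(times)
```

Every type found in the data gets its own column this way, so a third type would not need another apply call.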
You can try this:
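# Flatten every dict in every row into one record per dict.
# (DataFrame.append was removed in pandas 2.0, so build the frame directly.)
df_new = pd.DataFrame([item for row in dftest.weirdjson for item in row])
# Pivot on 'type', then drop the NaN padding so each column collapses to one value per original row.
df_new = df_new.pivot(columns='type', values=['delta', 'time']).apply(lambda x: pd.Series(x.dropna().values))
df_new.columns = ['_'.join(col) for col in df_new.columns.values]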
  delta_one delta_two time_one time_two
0         1         1     2019     2018
1         1         1     2013     2012
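The pivot above lines values up by position after dropping NaN, so it silently misaligns if some row lacks one of the types. A hedged sketch of a variant that keeps track of the source row is shown below; the 'row' key and the name wide are assumptions introduced only for illustration.

```python
import pandas as pd

data1 = [{'type': 'one', 'delta': '1', 'time': '2019'}, {'type': 'two', 'delta': '1', 'time': '2018'}]
data2 = [{'type': 'one', 'delta': '1', 'time': '2013'}, {'type': 'two', 'delta': '1', 'time': '2012'}]
dftest = pd.DataFrame({'weirdjson': [data1, data2]})

# One record per dict, tagged with the index label of the row it came from.
records = [{'row': idx, **item}
           for idx, items in dftest.weirdjson.items()
           for item in items]

# Pivot with the source row as the index, so missing types become NaN in place.
wide = pd.DataFrame(records).pivot(index='row', columns='type', values=['delta', 'time'])
wide.columns = ['_'.join(col) for col in wide.columns]
```

Because wide is indexed by the original row labels, dftest.join(wide) should attach the new columns to the right rows.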
You may use explode, construct a new dataframe, and unstack type to columns as follows:
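# One dict per row; the original index label is repeated for each list element.
s = dftest.weirdjson.explode()
# .str[...] pulls a key out of each dict; unstack then turns the 'type' level into columns.
df_new = (pd.DataFrame({'type': s.str['type'], 'time': s.str['time']})
            .set_index('type', append=True).time.unstack().add_prefix('time_type_'))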
Out[461]:
type time_type_one time_type_two
0             2019          2018
1             2013          2012
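The result above is a separate frame whose values are still strings. A minimal sketch of attaching it back to dftest follows, assuming every time value is an integer-like string as in the question; the name result is hypothetical.

```python
import pandas as pd

data1 = [{'type': 'one', 'delta': '1', 'time': '2019'}, {'type': 'two', 'delta': '1', 'time': '2018'}]
data2 = [{'type': 'one', 'delta': '1', 'time': '2013'}, {'type': 'two', 'delta': '1', 'time': '2012'}]
dftest = pd.DataFrame({'weirdjson': [data1, data2]})

s = dftest.weirdjson.explode()
df_new = (pd.DataFrame({'type': s.str['type'], 'time': s.str['time']})
            .set_index('type', append=True).time.unstack().add_prefix('time_type_'))

# df_new keeps the original index labels, so an index join aligns it with dftest.
# astype(int) assumes every 'time' string is a whole number.
result = dftest.join(df_new.astype(int))
```

If some rows can lack a type, the unstacked frame will contain NaN and astype(int) will fail; pd.to_numeric applied per column is one alternative in that case.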