How to efficiently extract fields from a JSON column?

Question

Consider the following example:

import pandas as pd

data1 = [{'type': 'one', 'delta': '1', 'time': '2019'}, {'type': 'two', 'delta': '1', 'time': '2018'}]
data2 = [{'type': 'one', 'delta': '1', 'time': '2013'}, {'type': 'two', 'delta': '1', 'time': '2012'}]


dftest = pd.DataFrame({'weirdjson' : [data1, data2]})
dftest['normalcol'] = 1

dftest

Out[79]: 
                                                                                        weirdjson  normalcol  time_type_one  time_type_two
0  [{'type': 'one', 'delta': '1', 'time': '2019'}, {'type': 'two', 'delta': '1', 'time': '2018'}]          1           2019           2018
1  [{'type': 'one', 'delta': '1', 'time': '2013'}, {'type': 'two', 'delta': '1', 'time': '2012'}]          1           2013           2012

Essentially, I would like to create two columns, time_type_one and time_type_two, that each contain their corresponding time value (for the first row: 2019 for type one and 2018 for type two).

How can I do that in Pandas? I have many rows, so I am looking for something very efficient. Thanks!

Answer 1

Try this:

import json
import pandas as pd

data = [{'normalcol':1, 'weirdjsoncol':'[{"type": "one", "delta": "1", "time": "2019"}, {"type": "two", "delta": "1", "time": "2018"}]'}, {'normalcol':2, 'weirdjsoncol':'[{"type": "two", "delta": "1", "time": "2017"}, {"type": "one", "delta": "1", "time": "2013"}]'}]

df = pd.DataFrame(data)

# Take the "time" of the first entry whose type matches; yields None (instead of raising) when the type is absent
df['time_type_one'] = df['weirdjsoncol'].apply(lambda x: next((i["time"] for i in json.loads(x) if i["type"] == "one"), None))

df['time_type_two'] = df['weirdjsoncol'].apply(lambda x: next((i["time"] for i in json.loads(x) if i["type"] == "two"), None))
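
The two apply calls above parse every JSON string once per extracted column. A minimal sketch of a single-pass variant follows, assuming the same string-encoded column as in this answer; the helper name times_by_type is hypothetical and not part of the original answer.

```python
import json
import pandas as pd

df = pd.DataFrame({
    "normalcol": [1, 2],
    "weirdjsoncol": [
        '[{"type": "one", "delta": "1", "time": "2019"}, {"type": "two", "delta": "1", "time": "2018"}]',
        '[{"type": "two", "delta": "1", "time": "2017"}, {"type": "one", "delta": "1", "time": "2013"}]',
    ],
})

def times_by_type(raw):
    """Hypothetical helper: parse one JSON string and map each entry's type to its time."""
    return {f"time_type_{item['type']}": item["time"] for item in json.loads(raw)}

# Parse each string exactly once and build all new columns from a list of dicts.
# A type that is missing from a row shows up as NaN in that row.
extracted = pd.DataFrame([times_by_type(raw) for raw in df["weirdjsoncol"]], index=df.index)
result = df.join(extracted)
print(result[["normalcol", "time_type_one", "time_type_two"]])
```

If a row contains the same type more than once, this sketch keeps the last occurrence, which may or may not be what you want.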

Answer 2

You can try this:

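# Flatten the per-row lists into one record per dict (DataFrame.append was removed in pandas 2.0)
df_new = pd.DataFrame([d for row in dftest.weirdjson for d in row])
# Pivot the types into columns, then drop the NaN gaps so each column collapses to one value per original row
df_new = df_new.pivot(columns='type', values=['delta', 'time']).apply(lambda x: pd.Series(x.dropna().values))
# Flatten the MultiIndex columns, e.g. ('time', 'one') becomes 'time_one'
df_new.columns = ['_'.join(col) for col in df_new.columns.values]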

  delta_one delta_two time_one time_two
0         1         1     2019     2018
1         1         1     2013     2012
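
Collapsing each column with dropna only lines up with the original rows when every row contains every type. A possible variant, sketched here under the assumption that each type appears at most once per row, carries the original row label through the pivot instead; the column name row is introduced purely for illustration.

```python
import pandas as pd

data1 = [{'type': 'one', 'delta': '1', 'time': '2019'}, {'type': 'two', 'delta': '1', 'time': '2018'}]
data2 = [{'type': 'one', 'delta': '1', 'time': '2013'}, {'type': 'two', 'delta': '1', 'time': '2012'}]
dftest = pd.DataFrame({'weirdjson': [data1, data2]})

# One record per dict, tagged with the label of the row it came from.
records = [{'row': idx, **d} for idx, row in dftest['weirdjson'].items() for d in row]
long = pd.DataFrame(records)

# Pivot on the row label so values stay aligned even when a type is missing somewhere.
wide = long.pivot(index='row', columns='type', values=['delta', 'time'])
wide.columns = ['_'.join(col) for col in wide.columns]
print(wide)
```

Note that pivot raises an error if a (row, type) pair occurs more than once, so duplicated types would need to be resolved first.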

Answer 3

You can use explode, construct a new DataFrame from the exploded values, and unstack type into columns as follows:

s = dftest.weirdjson.explode()
df_new = (pd.DataFrame({'type': s.str['type'], 'time': s.str['time']}) 
            .set_index('type', append=True).time.unstack().add_prefix('time_type_'))

Out[461]:
type time_type_one time_type_two
0             2019          2018
1             2013          2012
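
For completeness, here is a self-contained sketch of the same explode and unstack idea, run against the data from the question and joined back onto the original frame. The intermediate names long, wide and result are illustrative only, and the sketch assumes each type occurs at most once per row.

```python
import pandas as pd

data1 = [{'type': 'one', 'delta': '1', 'time': '2019'}, {'type': 'two', 'delta': '1', 'time': '2018'}]
data2 = [{'type': 'one', 'delta': '1', 'time': '2013'}, {'type': 'two', 'delta': '1', 'time': '2012'}]
dftest = pd.DataFrame({'weirdjson': [data1, data2]})
dftest['normalcol'] = 1

# One row per dict; the original row label is repeated in the index.
s = dftest['weirdjson'].explode()

# Expand the dicts into regular columns while keeping that index.
long = pd.DataFrame(s.tolist(), index=s.index)

# Move type into the index, then unstack it into columns of time values.
wide = long.set_index('type', append=True)['time'].unstack().add_prefix('time_type_')

# Attach the new columns to the original frame by index.
result = dftest.join(wide)
print(result[['normalcol', 'time_type_one', 'time_type_two']])
```

The extracted values remain strings, as in the source data; convert them with pd.to_numeric or pd.to_datetime if a numeric or date type is needed.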


3 answers

Answer 1
1 2019-12-29 21:43:45

Answer 2
1 2019-12-29 21:56:28

Answer 3
1 Accepted 2019-12-30 01:39:19
