
How to efficiently extract fields from a JSON column?

Consider the following example:

import pandas as pd

data1 = [{'type': 'one', 'delta': '1', 'time': '2019'}, {'type': 'two', 'delta': '1', 'time': '2018'}]
data2 = [{'type': 'one', 'delta': '1', 'time': '2013'}, {'type': 'two', 'delta': '1', 'time': '2012'}]


dftest = pd.DataFrame({'weirdjson' : [data1, data2]})
dftest['normalcol'] = 1

dftest

Out[79]: 
                                                                                        weirdjson  normalcol  time_type_one  time_type_two
0  [{'type': 'one', 'delta': '1', 'time': '2019'}, {'type': 'two', 'delta': '1', 'time': '2018'}]          1           2019           2018
1  [{'type': 'one', 'delta': '1', 'time': '2013'}, {'type': 'two', 'delta': '1', 'time': '2012'}]          1           2013           2012

Essentially, I would like to create two columns, time_type_one and time_type_two, that each contain their corresponding time value (for the first row: 2019 for type one and 2018 for type two).

How can I do that in Pandas? I have many rows, so I am looking for something very efficient. Thanks!

Try this:

import json
import pandas as pd

data = [{'normalcol':1, 'weirdjsoncol':'[{"type": "one", "delta": "1", "time": "2019"}, {"type": "two", "delta": "1", "time": "2018"}]'}, {'normalcol':2, 'weirdjsoncol':'[{"type": "two", "delta": "1", "time": "2017"}, {"type": "one", "delta": "1", "time": "2013"}]'}]

df = pd.DataFrame(data)

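# Parse each JSON string and take the "time" of the first entry with the given type.
# The {} default yields None instead of raising when a row has no such entry.
df['time_type_one'] = df['weirdjsoncol'].apply(lambda x: next((i for i in json.loads(x) if i["type"] == "one"), {}).get("time"))

df['time_type_two'] = df['weirdjsoncol'].apply(lambda x: next((i for i in json.loads(x) if i["type"] == "two"), {}).get("time"))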
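
Since the question asks for efficiency, note that the two apply calls above parse every JSON string twice, once per output column. The following is a minimal sketch of a single-pass variant, not a benchmarked solution. The helper name `times_by_type` is hypothetical, and the sketch assumes each type appears at most once per row.

```python
import json
import pandas as pd

data = [
    {"normalcol": 1, "weirdjsoncol": '[{"type": "one", "delta": "1", "time": "2019"}, {"type": "two", "delta": "1", "time": "2018"}]'},
    {"normalcol": 2, "weirdjsoncol": '[{"type": "two", "delta": "1", "time": "2017"}, {"type": "one", "delta": "1", "time": "2013"}]'},
]
df = pd.DataFrame(data)


def times_by_type(raw):
    """Hypothetical helper: parse one JSON string and map each type to its time."""
    return {f"time_type_{item['type']}": item["time"] for item in json.loads(raw)}


# Parse every row exactly once, then attach all extracted columns in one join.
extracted = pd.DataFrame([times_by_type(raw) for raw in df["weirdjsoncol"]], index=df.index)
result = df.join(extracted)
```

A type that is missing from a row simply becomes NaN in that row's column, because the DataFrame constructor aligns the dictionaries by key.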

You can try this:

# Flatten every dict of every row into one long frame (DataFrame.append was removed in pandas 2.0).
df_new = pd.DataFrame([item for row in dftest.weirdjson for item in row])
df_new = df_new.pivot(columns='type', values=['delta', 'time']).apply(lambda x: pd.Series(x.dropna().values)) 
df_new.columns = ['_'.join(col) for col in df_new.columns.values] 

  delta_one delta_two time_one time_two
0         1         1     2019     2018
1         1         1     2013     2012
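
The pivot approach above lines rows up by dropping NaN values from each column, which can silently misalign results if some row lacks one of the types. As a hedged alternative, the sketch below carries the original row label through the reshape. The column name `row` is an assumption introduced here for illustration.

```python
import pandas as pd

data1 = [{'type': 'one', 'delta': '1', 'time': '2019'}, {'type': 'two', 'delta': '1', 'time': '2018'}]
data2 = [{'type': 'one', 'delta': '1', 'time': '2013'}, {'type': 'two', 'delta': '1', 'time': '2012'}]
dftest = pd.DataFrame({'weirdjson': [data1, data2]})

# One record per dict, tagged with the label of the row it came from.
records = [
    {"row": idx, **item}
    for idx, items in dftest["weirdjson"].items()
    for item in items
]
long_df = pd.DataFrame(records)

# Pivot on the explicit row label so values stay attached to their source row.
wide = long_df.pivot(index="row", columns="type", values=["delta", "time"])
wide.columns = ["_".join(col) for col in wide.columns]
```

Because `wide` is indexed by the original row labels, it can be joined back onto `dftest` directly.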

You can use explode, build a new DataFrame from the exploded Series, and unstack type into columns as follows:

s = dftest.weirdjson.explode()
df_new = (pd.DataFrame({'type': s.str['type'], 'time': s.str['time']}) 
            .set_index('type', append=True).time.unstack().add_prefix('time_type_'))

Out[461]:
type time_type_one time_type_two
0             2019          2018
1             2013          2012
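
The explode result keeps the original index, so the new columns can be attached to the source frame with a join. The sketch below assumes the time values should become integers, as in the desired output of the question. The variable names `times` and `out` are illustrative.

```python
import pandas as pd

data1 = [{'type': 'one', 'delta': '1', 'time': '2019'}, {'type': 'two', 'delta': '1', 'time': '2018'}]
data2 = [{'type': 'one', 'delta': '1', 'time': '2013'}, {'type': 'two', 'delta': '1', 'time': '2012'}]
dftest = pd.DataFrame({'weirdjson': [data1, data2]})
dftest['normalcol'] = 1

# One dict per row; the original row label is repeated in the index.
s = dftest['weirdjson'].explode()

times = (
    pd.DataFrame({'type': s.str['type'], 'time': s.str['time']})
    .set_index('type', append=True)['time']
    .unstack()
    .add_prefix('time_type_')
)

# Convert the year strings to integers and align on the shared index.
out = dftest.join(times.astype(int))
```

The astype(int) call is an assumption about the data: it will raise if any row lacks a type (producing NaN) or holds a non-numeric time.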
