Flatten a JSON column containing lists into a DataFrame

Question

I have a JSON string in a DataFrame column:

x = '''{"sections": 
[{
    "id": "12ab", 
    "items": [
        {"id": "34cd", 
        "isValid": true, 
        "questionaire": {"title": "blah blah", "question": "Date of Purchase"}
        },
        {"id": "56ef", 
        "isValid": true, 
        "questionaire": {"title": "something useless", "question": "Date of Billing"}
        }
    ]
}],
"ignore": "yes"}'''

I want the id, the inner id from the items list, and the question from the questionaire object.

I was able to extract this information with the code below:

import json
import pandas as pd

df_norm = pd.json_normalize(json.loads(x)['sections'])
df_norm = df_norm[['id', 'items']]
df1 = (pd.concat({k: pd.DataFrame(v) for k, v in df_norm.pop('items').items()}).reset_index(level=1, drop=True))
df = df_norm.join(df1, rsuffix='_').reset_index(drop=True)
df['child_id'] = df.pop('id_')
df = df[['id', 'child_id', 'questionaire']]
df.questionaire = df.questionaire.fillna({i: {} for i in df.index})
idx = df.set_index(['id', 'child_id']).questionaire.index
result = pd.DataFrame(df.
                      set_index(['id', 'child_id']).
                      questionaire.values.tolist(),index=idx).reset_index()
result = result[['id','child_id','question']]
result

The resulting DataFrame looks like this (you can run the code to verify):

     id child_id          question
0  12ab     34cd  Date of Purchase
1  12ab     56ef   Date of Billing

My problem is making this work on a DataFrame where the JSON value shown above is itself a column. The input I actually have looks like this:

id  name  location  flatten
1   xyz   new york  the JSON string x above

When I have to do this for multiple such JSON strings stored as column values, I am unable to tie the results back to their original rows.

The final DataFrame I want is:

Masterid  name  location  id    child_id  question
1         xyz   new york  12ab  34cd      Date of Purchase
1         xyz   new york  12ab  56ef      Date of Billing

Answer 1

The idea is to use a dictionary comprehension over the flatten column, keyed by its index values i, so that after concat the result can be joined back to the original DataFrame:

x = '''{"sections": 
[{
    "id": "12ab", 
    "items": [
        {"id": "34cd", 
        "isValid": true, 
        "questionaire": {"title": "blah blah", "question": "Date of Purchase"}
        },
        {"id": "56ef", 
        "isValid": true, 
        "questionaire": {"title": "something useless", "question": "Date of Billing"}
        }
    ]
}],
"ignore": "yes"}'''


import json
import pandas as pd

df = pd.DataFrame({'id':['1','2'], 'name':['xyz', 'abc'], 
                    'location':['new york', 'wien'], 'flatten':[x,x]})

# create a default RangeIndex so the keys of d match the rows of df
df = df.reset_index(drop=True)

d = {i: pd.json_normalize(json.loads(x)['sections'],
                          'items', ['id'], 
                          record_prefix='child_')[['id','child_id','child_questionaire.question']]
                             .rename(columns={'child_questionaire.question':'question'})
     for  i, x in df.pop('flatten').items()}

df_norm = df.rename(columns={'id':'Masterid'}).join(pd.concat(d).reset_index(level=1, drop=True))

print(df_norm)
  Masterid name  location    id child_id          question
0        1  xyz  new york  12ab     34cd  Date of Purchase
0        1  xyz  new york  12ab     56ef   Date of Billing
1        2  abc      wien  12ab     34cd  Date of Purchase
1        2  abc      wien  12ab     56ef   Date of Billing
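
The same approach can be wrapped in a reusable function. The following is only a minimal sketch, not part of the original answer: the helper names flatten_sections and flatten_json_column are hypothetical, and the two sample JSON strings are made-up data that give each row different content so the alignment with the original rows is visible. It assumes every JSON string has the same "sections" / "items" / "questionaire" layout as x above.

```python
import json

import pandas as pd


def flatten_sections(raw):
    """Return one row per item for a single JSON string (hypothetical helper)."""
    flat = pd.json_normalize(
        json.loads(raw)["sections"],
        record_path="items",     # one output row per element of "items"
        meta=["id"],             # copy the parent section id onto each row
        record_prefix="child_",  # prefix item fields so they do not clash with "id"
    )
    return flat[["id", "child_id", "child_questionaire.question"]].rename(
        columns={"child_questionaire.question": "question"}
    )


def flatten_json_column(df, column="flatten"):
    """Expand `column` and join the pieces back on the original row index."""
    pieces = {idx: flatten_sections(raw) for idx, raw in df[column].items()}
    # concat on a dict builds a (row index, item position) MultiIndex;
    # dropping the inner level leaves the original row index for the join.
    expanded = pd.concat(pieces).reset_index(level=1, drop=True)
    return (
        df.drop(columns=column)
        .rename(columns={"id": "Masterid"})
        .join(expanded)
    )


# Made-up sample data with a different JSON payload in each row.
sample_a = json.dumps({"sections": [{"id": "12ab", "items": [
    {"id": "34cd", "questionaire": {"question": "Date of Purchase"}},
    {"id": "56ef", "questionaire": {"question": "Date of Billing"}},
]}]})
sample_b = json.dumps({"sections": [{"id": "78gh", "items": [
    {"id": "90ij", "questionaire": {"question": "Date of Delivery"}},
]}]})

frame = pd.DataFrame({
    "id": ["1", "2"],
    "name": ["xyz", "abc"],
    "location": ["new york", "wien"],
    "flatten": [sample_a, sample_b],
})

out = flatten_json_column(frame)
print(out)
```

Because the join is done on the index, each row of the original DataFrame is repeated once per item found in its own JSON string. If the index of the input is not unique, resetting it first (as the answer does with reset_index(drop=True)) keeps the dictionary keys from colliding.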

Solution 1
1 accepted, 2022-04-05 05:54:53