Split a nested column into new columns

Question

My dataframe has a nested column (people_info) that contains cells like the sample below.

[{"institution":"some_institution","startMonth":1,"startYear":2563,"course":"any","id":1111,"formation":"any","endMonth":12,"endYear":2556,"status":"complete"}]

As far as I know, this can be solved using dictionary/JSON concepts.

I'm trying to split this column into new columns, so that each key of the nested cell becomes a new column holding its respective value.

I tried json_normalize, but I'm getting this error: "AttributeError: 'str' object has no attribute 'values'"

I tried to transform those cells into a dict, but I was never able to make Python understand that "institution" is a key and "some_institution" is a value in the resulting dict. It seems Python treats the whole cell as a string.

Can you help me? If I wasn't clear, please tell me. Thanks!

Answer 1

IIUC, the following should work:

Input

import pandas as pd

df = pd.DataFrame({'col1':[1], 'col2':2, 'nested_column':'[{"institution":"some_institution","startMonth":1,"startYear":2563,"course":"any","id":1111,"formation":"any","endMonth":12,"endYear":2556,"status":"complete"}]'})

df

  col1  col2    nested_column
0    1     2    [{"institution":"some_institution","startMonth...

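Process

import json

# Parse each JSON string and keep its first record; missing cells become an empty dict.
df['nested_column_dict'] = df['nested_column'].apply(lambda x: json.loads(x)[0] if pd.notna(x) else {})
# Expand the dicts into columns and append them to the original frame.
df = pd.concat([df, pd.DataFrame.from_records(df['nested_column_dict'].tolist())], axis=1)
df = df.drop('nested_column_dict', axis=1)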

Output

df

   col1  col2                                      nested_column       institution  startMonth  startYear course    id formation  endMonth  endYear    status
0     1     2  [{"institution":"some_institution","startMonth...  some_institution           1       2563    any  1111       any        12     2556  complete
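
The approach above keeps only the first record of each cell. If a cell can hold several records, one possible variation is to parse every cell, give each record its own row with explode, and expand the dicts with pd.json_normalize. This is only a sketch under assumptions: the sample data, the helper parse_cell and the column name record are made up for illustration and are not part of the original answer.

```python
import json

import pandas as pd

# Hypothetical sample: each cell is a JSON string holding a list of records, or is missing.
df = pd.DataFrame({
    "col1": [1, 2],
    "nested_column": [
        '[{"institution": "a", "startYear": 2001}, {"institution": "b", "startYear": 2005}]',
        None,
    ],
})


def parse_cell(cell):
    """Return the list of dicts encoded in a cell, or an empty list for missing values."""
    return json.loads(cell) if isinstance(cell, str) else []


# One row per record; a cell with no records yields a single row whose record is NaN.
exploded = (
    df.assign(record=df["nested_column"].map(parse_cell))
    .explode("record")
    .reset_index(drop=True)
)

# Replace the NaN placeholders with empty dicts so every entry is a dict.
records = [r if isinstance(r, dict) else {} for r in exploded["record"]]
expanded = pd.json_normalize(records)

# Put the expanded columns next to the remaining original columns.
result = pd.concat([exploded.drop(columns=["nested_column", "record"]), expanded], axis=1)
```

Rows that came from a missing cell keep their other columns and get NaN in the expanded ones, which also turns integer columns such as startYear into floats.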

Answer 2

Maybe this helps.

import pandas as pd

data = [{"institution":"some_institution", "startMonth":1, "startYear":2563, "course":"any", "id":1111, "formation":"any", "endMonth":12, "endYear":2556, "status":"complete"}]

# Take the first (and only) dict out of the list.
record = next(item for item in data)

# Scalar values need an explicit index to form a one-row DataFrame.
df = pd.DataFrame(record, index=[0])

df
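
Note that this answer starts from a list that is already parsed, while the question describes cells that are still strings. A minimal sketch of the missing step, assuming the cell is valid JSON exactly as shown in the question, is to call json.loads first and then hand the resulting list of dicts to pd.DataFrame. The variable names cell and records are illustrative only.

```python
import json

import pandas as pd

# The cell as described in the question: a string, not a list of dicts.
cell = (
    '[{"institution":"some_institution","startMonth":1,"startYear":2563,'
    '"course":"any","id":1111,"formation":"any","endMonth":12,'
    '"endYear":2556,"status":"complete"}]'
)

records = json.loads(cell)   # now a list containing one dict
df = pd.DataFrame(records)   # one row per dict, one column per key
```

Passing the whole list to pd.DataFrame also covers cells with more than one record, since each dict becomes its own row.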

Answer 1
1 Accepted 2021-05-24 20:31:08

Answer 2
0 2021-05-24 19:51:11
