Comparing and moving values in Pandas DataFrame columns - Python

Question

I have the following pandas DataFrame populated:

The ids contained in the first cell of the column match the ids in the three cells of the second column. The contents of each cell aren't fixed (they aren't literal string values, but data fetched from variable JSON API output).

How would I go about comparing the contents of both columns (since the contents aren't fixed, I suppose this has to be done with variables rather than literal strings) and, if there's a match, moving the matches to the corresponding cell next to it? I hope that makes sense; this is the type of output I'm looking for:

Answer 1

import numpy as np
import pandas as pd

# sample data
df = pd.DataFrame(np.array([{'data': [{'id': '12345', 'type': 'education'}, {'id': '23456', 'type': 'education'}, {'id': '34567', 'type': 'education'}]},
                            {'data': [{'id': '45678', 'type': 'education'}, {'id': '56789', 'type': 'education'}]},
                            {'data': [{'id': '78999', 'type': 'education'}]}]), columns=['Edu ID'])

# build a frame keyed by the original row index, then explode the 'data' lists so each dict gets its own row
df_e = pd.DataFrame.from_dict(df['Edu ID'].to_dict(), orient='index')['data'].explode()
# expand the dicts into columns, then group by the original index and collect the ids into a list
final_df = df.join(pd.DataFrame(df_e.tolist(), index=df_e.index).groupby(level=0)['id'].agg(list))


                                              Edu ID                     id
0  {'data': [{'id': '12345', 'type': 'education'}...  [12345, 23456, 34567]
1  {'data': [{'id': '45678', 'type': 'education'}...         [45678, 56789]
2   {'data': [{'id': '78999', 'type': 'education'}]}                [78999]

If you need to filter where type == 'education', then:

# create a new frame but orient the index and explode
df_e = pd.DataFrame.from_dict(df['Edu ID'].to_dict(), orient='index')['data'].explode()

# take the new frame and convert it to a list and create a new frame
df_edu = pd.DataFrame(df_e.tolist(), index=df_e.index)

# keep only rows whose type equals 'education', then groupby the index, collect the ids into a list, and join back
final_df = df.join(df_edu[df_edu['type'] == 'education'].groupby(level=0)['id'].agg(list))
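For comparison, the same id lists can probably be built without the intermediate frames by mapping a small function over the column. This is only a sketch: `extract_ids` and the `education_id` column are hypothetical names, and the sample data (including the `'other'` type) is an assumption about the shape of the API output.

```python
import pandas as pd

# Sample data in the same shape as the answer above (assumed, not real API output).
df = pd.DataFrame({
    "Edu ID": [
        {"data": [{"id": "12345", "type": "education"},
                  {"id": "23456", "type": "education"}]},
        {"data": [{"id": "45678", "type": "other"},
                  {"id": "56789", "type": "education"}]},
    ]
})


def extract_ids(cell, wanted_type=None):
    """Hypothetical helper: list the ids in one cell, optionally keeping one type."""
    return [
        item["id"]
        for item in cell["data"]
        if wanted_type is None or item["type"] == wanted_type
    ]


# All ids per row.
df["id"] = df["Edu ID"].map(extract_ids)

# Only the ids whose type is 'education'.
df["education_id"] = df["Edu ID"].map(lambda cell: extract_ids(cell, "education"))
```

This trades the vectorised explode/groupby for a plain Python loop per cell, which may be easier to read when each cell only holds a handful of entries.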

Answer 2

Approach: expand the embedded dict and list into DataFrame rows and columns, then build a comma-separated string of IDs.

import pandas as pd

df = pd.DataFrame([
    {"Edu ID":{"data":[
    {"id":1,"type":"educations"},
    {"id":2,"type":"educations"},
    {"id":3,"type":"educations"},
                                ]}, "Education ID":1},
    {"Edu ID":{"data":[
    {"id":4,"type":"educations"},
                                ]}, "Education ID":2},
             ])

(
# convert dict to columns, then explode "data" list,  then convert dicts in list to columns
pd.json_normalize(pd.json_normalize(df.to_dict(orient="records"))
                  .reset_index()
                  .explode("Edu ID.data")
                  .to_dict(orient="records"))
    # build required CSV from embedded dicts
    .groupby("index")["Edu ID.data.id"].agg(lambda x: ",".join(list(x.astype(str))))
    .to_frame()
    # bring it together with original DF
    .join(df)
    .rename(columns={"Edu ID.data.id":"Education ID", "Education ID":"OLD Education ID"})
)

Output

Education ID                                                                                                        Edu ID  OLD Education ID
       1,2,3 {'data': [{'id': 1, 'type': 'educations'}, {'id': 2, 'type': 'educations'}, {'id': 3, 'type': 'educations'}]}                 1
           4                                                                   {'data': [{'id': 4, 'type': 'educations'}]}                 2
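If the raw records are still available (or are recovered with `df.to_dict(orient="records")`), the flattening step above can likely be done in one `pd.json_normalize` call using `record_path` and `meta`. The sketch below assumes that "Education ID" is unique per row, since it is used as the grouping key; `records`, `flat`, and `id_csv` are illustrative names.

```python
import pandas as pd

# Same sample records as above, kept as plain Python dicts.
records = [
    {"Edu ID": {"data": [{"id": 1, "type": "educations"},
                         {"id": 2, "type": "educations"},
                         {"id": 3, "type": "educations"}]},
     "Education ID": 1},
    {"Edu ID": {"data": [{"id": 4, "type": "educations"}]},
     "Education ID": 2},
]

# One row per embedded dict; the "Education ID" of the parent record is
# repeated alongside each of its entries.
flat = pd.json_normalize(records, record_path=["Edu ID", "data"], meta=["Education ID"])

# Comma-separated ids per parent record (assumes "Education ID" is unique per row).
id_csv = flat.groupby("Education ID")["id"].agg(lambda s: ",".join(s.astype(str)))
```

If "Education ID" can repeat across rows, grouping on a reset index as in the answer above is the safer choice.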
