
Comparing and shifting values in Pandas DataFrame columns - Python

I have the following pandas DataFrame populated:

[Image: the populated DataFrame]

The IDs contained in the first cell of the column match the IDs in the three cells of the second column. The contents of each cell aren't fixed: they aren't literal string values, but data fetched from variable JSON API output.

How would I go about comparing the contents of both columns (since the contents aren't fixed, I suppose this has to be done dynamically rather than with literal strings) and, if there's a match, moving the matches to the corresponding cell next to it? Hope that makes sense. This is the type of output I'm looking for:

[Image: the desired output]

import numpy as np
import pandas as pd

# sample data
df = pd.DataFrame(np.array([{'data': [{'id': '12345', 'type': 'education'}, {'id': '23456', 'type': 'education'}, {'id': '34567', 'type': 'education'}]},
                            {'data': [{'id': '45678', 'type': 'education'}, {'id': '56789', 'type': 'education'}]},
                            {'data': [{'id': '78999', 'type': 'education'}]}]), columns=['Edu ID'])

# create a new frame but orient the index and explode
df_e = pd.DataFrame.from_dict(df['Edu ID'].to_dict(), orient='index')['data'].explode()
# take the new frame and convert it to a list then groupby the index and create a list of ids
final_df = df.join(pd.DataFrame(df_e.tolist(), index=df_e.index).groupby(level=0)['id'].agg(list))


                                              Edu ID                     id
0  {'data': [{'id': '12345', 'type': 'education'}...  [12345, 23456, 34567]
1  {'data': [{'id': '45678', 'type': 'education'}...         [45678, 56789]
2   {'data': [{'id': '78999', 'type': 'education'}]}                [78999]
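A possibly simpler route to the same per-row id lists, sketched under the assumption that every cell is a dict whose 'data' key holds a list of dicts that each carry an 'id': apply a list comprehension to each cell with Series.map. The sample frame is rebuilt without np.array purely for brevity.

```python
import pandas as pd

# Same sample structure as above: each cell holds {'data': [{'id': ..., 'type': ...}, ...]}
df = pd.DataFrame({'Edu ID': [
    {'data': [{'id': '12345', 'type': 'education'}, {'id': '23456', 'type': 'education'}, {'id': '34567', 'type': 'education'}]},
    {'data': [{'id': '45678', 'type': 'education'}, {'id': '56789', 'type': 'education'}]},
    {'data': [{'id': '78999', 'type': 'education'}]},
]})

# Assumes every cell is a dict; .get() falls back to an empty list if 'data' is missing
df['id'] = df['Edu ID'].map(lambda cell: [item['id'] for item in cell.get('data', [])])
print(df['id'].tolist())
# [['12345', '23456', '34567'], ['45678', '56789'], ['78999']]
```

Because map keeps the original index, the result can be assigned directly as a new column, with no explode, groupby or join step.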

If you need to keep only the records where type == 'education', then:

# create a new frame but orient the index and explode
df_e = pd.DataFrame.from_dict(df['Edu ID'].to_dict(), orient='index')['data'].explode()

# take the new frame and convert it to a list and create a new frame
df_edu = pd.DataFrame(df_e.tolist(), index=df_e.index)

# filter to type == 'education', then group by the index, collect the ids into a list and join back onto df
final_df = df.join(df_edu[df_edu['type'] == 'education'].groupby(level=0)['id'].agg(list))
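The filtered variant can also be written with pd.json_normalize, which flattens the nested records directly through record_path, while meta carries a row label along with each record. This is a sketch, not the answer's method. The 'row' key and the 'work' entries in the sample are hypothetical, added only so the filter has something to drop.

```python
import pandas as pd

# Hypothetical sample: some records have type 'work' so the filter has an effect
df = pd.DataFrame({'Edu ID': [
    {'data': [{'id': '12345', 'type': 'education'}, {'id': '23456', 'type': 'work'}]},
    {'data': [{'id': '45678', 'type': 'education'}]},
    {'data': [{'id': '78999', 'type': 'work'}]},
]})

# Tag each cell with its row label (the key name 'row' is hypothetical)
records = [{'row': label, **cell} for label, cell in df['Edu ID'].items()]
# One output row per entry in each 'data' list; 'row' is repeated via meta
flat = pd.json_normalize(records, record_path='data', meta=['row'])
flat['row'] = flat['row'].astype(int)  # meta columns come back as object dtype

# Keep only education records, collect their ids per source row, join back onto df
ids = flat[flat['type'] == 'education'].groupby('row')['id'].agg(list)
final_df = df.join(ids)
```

Rows with no matching records (row 2 here) end up as NaN in the id column after the left join.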

Approach: expand the embedded dict and list into DataFrame rows and columns, then build a comma-separated (CSV) string of the IDs.

import pandas as pd

df = pd.DataFrame([
    {"Edu ID":{"data":[
    {"id":1,"type":"educations"},
    {"id":2,"type":"educations"},
    {"id":3,"type":"educations"},
                                ]}, "Education ID":1},
    {"Edu ID":{"data":[
    {"id":4,"type":"educations"},
                                ]}, "Education ID":2},
             ])

(
# convert dict to columns, then explode the "data" list, then convert the dicts in the list to columns
pd.json_normalize(pd.json_normalize(df.to_dict(orient="records"))
                  .reset_index()
                  .explode("Edu ID.data")
                  .to_dict(orient="records"))
    # build required CSV from embedded dicts
    .groupby("index")["Edu ID.data.id"].agg(lambda x: ",".join(list(x.astype(str))))
    .to_frame()
    # bring it together with original DF
    .join(df)
    .rename(columns={"Edu ID.data.id":"Education ID", "Education ID":"OLD Education ID"})
)

output

Education ID                                                                                                        Edu ID  OLD Education ID
       1,2,3 {'data': [{'id': 1, 'type': 'educations'}, {'id': 2, 'type': 'educations'}, {'id': 3, 'type': 'educations'}]}                 1
           4                                                                   {'data': [{'id': 4, 'type': 'educations'}]}                 2
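The same comma-separated result can probably be reached without the reset_index/json_normalize round trip by exploding the 'data' lists directly. Series.explode repeats the original row label for every element, so groupby(level=0) regroups the pieces by source row. Here is a minimal sketch on the answer's sample data, assuming each 'data' list holds dicts with an 'id' key:

```python
import pandas as pd

df = pd.DataFrame([
    {"Edu ID": {"data": [{"id": 1, "type": "educations"},
                         {"id": 2, "type": "educations"},
                         {"id": 3, "type": "educations"}]}, "Education ID": 1},
    {"Edu ID": {"data": [{"id": 4, "type": "educations"}]}, "Education ID": 2},
])

# One element per embedded record; the original row label is repeated by explode
records = df["Edu ID"].map(lambda cell: cell["data"]).explode().dropna()
# Convert each id to str and join them per original row
csv_ids = records.map(lambda item: str(item["id"])).groupby(level=0).agg(",".join)

# Join onto df so every original row is kept, then order the columns like the answer's output
out = (
    df.rename(columns={"Education ID": "OLD Education ID"})
      .join(csv_ids.rename("Education ID"))
      [["Education ID", "Edu ID", "OLD Education ID"]]
)
```

Joining onto df (rather than onto the grouped result) keeps every original row. A row whose 'data' list is empty becomes NaN after explode, is removed by dropna, and therefore ends up with NaN in Education ID.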

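Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.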

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM