
Pandas Triple Column Dataframe to triple list of dictionaries

I have the following DataFrame named data:

ID  Labs  SampleName
1   lab1  banana
1   lab1  potato
1   lab2  kiwi
1   lab2  cellulose
2   lab1  NaCl
2   lab2  Cl2

I want to convert it to JSON, grouping SampleName by Labs and Labs by ID, so that it looks like this:

{
   "Data":[
      {
         "ID":1,
         "Labs":[
            {
               "Lab_Name":"lab1",
               "SampleName":[
                  {
                     "Sample":"banana"
                  },
                  {
                     "Sample":"potato"
                  }
               ]
            },
            {
               "Lab_Name":"lab2",
               "SampleName":[
                  {
                     "Sample":"kiwi"
                  },
                  {
                     "Sample":"cellulose"
                  }
               ]
            }
         ]
      },
      {
         "ID":2,
         "Labs":[
            {
               "Lab_Name":"lab1",
               "SampleName":[
                  {
                     "Sample":"NaCl"
                  }
               ]
            },
            {
               "Lab_Name":"lab2",
               "SampleName":[
                  {
                     "Sample":"Cl2"
                  }
               ]
            }
         ]
      }
   ]
}

To do so, I have tried:

newdata = pd.DataFrame(data, columns = ['ID','Labs'])
newjsonfile = newdata.groupby(newdata['ID'], as_index=False).agg(list).to_dict(orient='records')

This successfully gives me the first part of the JSON, the Labs grouped by ID, but it does not work for the second part (grouping samples by Labs). Because of that, I tried converting both levels into lists of dictionaries and appending them like this:

# create a ID list
ID_list = data['ID'].tolist()
ID_list= list(dict.fromkeys(ID_list))
JSONdata = {'Data': []}

counter = 0

# for loop 
for i in ID_list:
    JSONdata['Data'].append({'ID': i})
    vari = data.loc[data['ID'] ==i]
    lab_list = vari['Labs'].tolist()
    for j in lab_list:
        JSONdata['Data'][counter].update({'Lab_Name': j})
    counter += 1

This works, but I only get the last lab name for each ID, because update overwrites the previous Lab_Name value. How can I achieve the desired result? Any hints on how I can make my code more Pythonic?

You can try:

df = df.rename(columns={'Labs': 'Lab_Name', 'SampleName': 'Sample'})
new_df = (df.groupby(['ID', 'Lab_Name']).apply(lambda x: x[['Sample']].to_dict(
    'records')).reset_index()).rename(columns={0: 'SampleName'})
result = (new_df.groupby(['ID']).apply(lambda x: x[['Lab_Name', 'SampleName']].to_dict(
    'records')).reset_index()).rename(columns={0: 'Labs'}).to_dict('records')
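Note that result here is the list of per-ID records, not yet wrapped under the "Data" key. A minimal end-to-end sketch follows, assuming the table from the question is rebuilt by hand as df (that construction and the json_text name are assumptions, not part of the original post). It repeats the two grouping steps, then wraps and serializes the output. Since groupby sorts its keys by default, IDs and lab names come out in sorted order.

```python
import json
import pandas as pd

# Sample frame rebuilt by hand from the question's table (an assumption).
df = pd.DataFrame({
    'ID': [1, 1, 1, 1, 2, 2],
    'Labs': ['lab1', 'lab1', 'lab2', 'lab2', 'lab1', 'lab2'],
    'SampleName': ['banana', 'potato', 'kiwi', 'cellulose', 'NaCl', 'Cl2'],
})

# Same two grouping steps as above, innermost level first.
df = df.rename(columns={'Labs': 'Lab_Name', 'SampleName': 'Sample'})
new_df = (
    df.groupby(['ID', 'Lab_Name'])
    .apply(lambda x: x[['Sample']].to_dict('records'))
    .reset_index()
    .rename(columns={0: 'SampleName'})
)
result = (
    new_df.groupby(['ID'])
    .apply(lambda x: x[['Lab_Name', 'SampleName']].to_dict('records'))
    .reset_index()
    .rename(columns={0: 'Labs'})
    .to_dict('records')
)

# Wrap the records under the top-level key and serialize to JSON text.
json_text = json.dumps({'Data': result}, indent=3)
print(json_text)
```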

You can use the code below:

{'Data': df.groupby(['ID']).\
apply(lambda x:
         {'ID': x.iloc[0,0], 
          'Labs': x.groupby(['Labs']) \
              .apply(lambda y: 
                     {'Lab_Name': y.iloc[0,1],
                      'SampleName': y.groupby(['SampleName']) \
                          .apply(lambda z:
                                 {'Sample': z.iloc[0,2]}
                                ).values.tolist()
                     }
             
             ).values.tolist()
         }
     ).values.tolist()
}
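One caveat with the snippet above: the innermost groupby(['SampleName']) sorts the samples alphabetically and collapses duplicate sample names within a lab, and the iloc lookups depend on the column order. If the original row order matters, the nesting can also be built without groupby at all. The following is a rough sketch, assuming the rows are available as (ID, Labs, SampleName) tuples, which with pandas could come from df.itertuples(index=False, name=None). The names rows, nest_rows and nested are made up for illustration.

```python
import json

# Rows as (ID, Labs, SampleName) tuples, copied from the question's table.
rows = [
    (1, 'lab1', 'banana'),
    (1, 'lab1', 'potato'),
    (1, 'lab2', 'kiwi'),
    (1, 'lab2', 'cellulose'),
    (2, 'lab1', 'NaCl'),
    (2, 'lab2', 'Cl2'),
]


def nest_rows(rows):
    """Group samples by lab and labs by ID, keeping first-seen order."""
    by_id = {}  # ID -> {lab name -> list of {'Sample': ...} dicts}
    for id_, lab, sample in rows:
        by_id.setdefault(id_, {}).setdefault(lab, []).append({'Sample': sample})
    return {
        'Data': [
            {
                'ID': id_,
                'Labs': [
                    {'Lab_Name': lab, 'SampleName': samples}
                    for lab, samples in labs.items()
                ],
            }
            for id_, labs in by_id.items()
        ]
    }


nested = nest_rows(rows)
print(json.dumps(nested, indent=3))
```

Because plain dicts preserve insertion order, IDs, labs and samples appear in the order they first occur in the data. This also addresses the overwrite problem from the question: each lab is appended to a list instead of being written to a single Lab_Name key.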
