Manipulating a DataFrame with pandas from a list of dictionaries containing lists

Question

import pandas as pd

data = [{'sequence': 'he left me',
  'labels': ['relationship', 'sad', 'happy', 'depression', 'suicidal'],
  'scores': [0.9898561835289001,
   0.9809304475784302,
   0.3625302314758301,
   0.31606775522232056,
   0.04021124914288521]},
         {'sequence': 'I lost my job',
  'labels': ['sad', 'relationship', 'depression', 'happy', 'suicidal'],
  'scores': [0.123456,
   0.56789,
   0.78901,
   0.12345,
   0.67890]}]

df = pd.DataFrame(data)
df = pd.concat([df['sequence'], pd.DataFrame(df['scores'].tolist(),columns=df['labels'].iloc[0])], axis=1)

print(df)

That's my code, but it's not giving me the right output.

Here's the output:

        sequence  relationship      sad    happy  depression  suicidal
0     he left me      0.989856  0.98093  0.36253    0.316068  0.040211
1  I lost my job      0.123456  0.56789  0.78901    0.123450  0.678900

You can see that the scores are not correct. In the second row, 'sad' should be 0.123456, but instead it's 0.56789. I need help here; I'm fairly new to this and having a hard time.

I think I need help with this line:


df = pd.concat([df['sequence'], pd.DataFrame(df['scores'].tolist(),columns=df['labels'].iloc[0])], axis=1)

I started with this:

df = df.rename(columns={'scores': df['labels'].iloc[0]})

and then this:

df = df.rename(columns={'scores': df['labels'].iloc[0][0]})

After that I tried this:

df = pd.concat([df['sequence'], pd.DataFrame(df['scores'].tolist(), columns=df['labels'])], axis=1)

and finally:

df = pd.concat([df['sequence'], pd.DataFrame(df['scores'].tolist(), columns=df['labels'].iloc[0])], axis=1)

I want each of those labels to have its correct score in every row, not just the first row.

Answer 1

I'd suggest you preprocess your data so that labels and values are related directly, not through the order in which they appear in their respective lists:

data_processed = [
    {
      "sequence": record["sequence"], 
      **{
        label: value 
        for label, value in zip(record["labels"], record["scores"])
      },
    }
    for record in data
]

Now you can convert this directly to a DataFrame:

df = pd.DataFrame(data_processed)
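
For reference, here is a minimal end-to-end sketch of the approach above, run against the data from the question. The original line went wrong because `columns=df['labels'].iloc[0]` applies the first row's label order to every row, while the second row lists its labels in a different order. Pairing each label with its own score before building the DataFrame avoids that. The use of `dict(zip(...))` here is just a shorter spelling of the same dict comprehension, not a different technique.

```python
import pandas as pd

data = [
    {
        "sequence": "he left me",
        "labels": ["relationship", "sad", "happy", "depression", "suicidal"],
        "scores": [0.9898561835289001, 0.9809304475784302, 0.3625302314758301,
                   0.31606775522232056, 0.04021124914288521],
    },
    {
        "sequence": "I lost my job",
        "labels": ["sad", "relationship", "depression", "happy", "suicidal"],
        "scores": [0.123456, 0.56789, 0.78901, 0.12345, 0.67890],
    },
]

# Pair every label with its own score inside each record, so the result
# no longer depends on the order of the labels in any particular row.
data_processed = [
    {"sequence": record["sequence"], **dict(zip(record["labels"], record["scores"]))}
    for record in data
]

# pandas aligns the values by dictionary key, one column per label.
df = pd.DataFrame(data_processed)
print(df)
```

With this, the second row should show 0.123456 under 'sad' and 0.56789 under 'relationship'. If some record were missing a label, pandas would fill that cell with NaN rather than shifting the other scores.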
