
Manipulating a data frame with pandas from a list of dictionaries with lists

import pandas as pd

data = [{'sequence': 'he left me',
  'labels': ['relationship', 'sad', 'happy', 'depression', 'suicidal'],
  'scores': [0.9898561835289001,
   0.9809304475784302,
   0.3625302314758301,
   0.31606775522232056,
   0.04021124914288521]},
         {'sequence': 'I lost my job',
  'labels': ['sad', 'relationship', 'depression', 'happy', 'suicidal'],
  'scores': [0.123456,
   0.56789,
   0.78901,
   0.12345,
   0.67890]}]

df = pd.DataFrame(data)
df = pd.concat([df['sequence'], pd.DataFrame(df['scores'].tolist(),columns=df['labels'].iloc[0])], axis=1)

print(df)

That's my code, but it's not giving me the right output.

Here's the output:

        sequence  relationship      sad    happy  depression  suicidal
0     he left me      0.989856  0.98093  0.36253    0.316068  0.040211
1  I lost my job      0.123456  0.56789  0.78901    0.123450  0.678900

You can see that the scores in the second row are not correct: 'sad' should be 0.123456, but it shows 0.56789. I need help here; I'm fairly new to this, so I'm having a hard time.

I think I need help with this line:


df = pd.concat([df['sequence'], pd.DataFrame(df['scores'].tolist(),columns=df['labels'].iloc[0])], axis=1)

I started with this:

df = df.rename(columns={'scores': df['labels'].iloc[0]})

then tried this:

df = df.rename(columns={'scores': df['labels'].iloc[0][0]})

after that, I tried this:

df = pd.concat([df['sequence'], pd.DataFrame(df['scores'].tolist(), columns=df['labels'])], axis=1)

and finally this:

df = pd.concat([df['sequence'], pd.DataFrame(df['scores'].tolist(), columns=df['labels'].iloc[0])], axis=1)

I want each of those labels to have its correct score in every row, not just the first row.

I'd suggest preprocessing your data so that each label is tied directly to its score, rather than relying on the order in which they appear in their respective lists:

data_processed = [
    {
      "sequence": record["sequence"], 
      **{
        label: value 
        for label, value in zip(record["labels"], record["scores"])
      },
    }
    for record in data
]

Now you can convert this directly to a DataFrame:

df = pd.DataFrame(data_processed)
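
For reference, here is a minimal end-to-end sketch of the same idea using the sample data from the question. The helper name `flatten_record` is hypothetical and only there for readability; it pairs each label with its score before pandas ever sees the row, so the per-row label order no longer matters.

```python
import pandas as pd

# Sample data from the question: note the labels are ordered
# differently in each record.
data = [
    {
        "sequence": "he left me",
        "labels": ["relationship", "sad", "happy", "depression", "suicidal"],
        "scores": [0.9898561835289001, 0.9809304475784302, 0.3625302314758301,
                   0.31606775522232056, 0.04021124914288521],
    },
    {
        "sequence": "I lost my job",
        "labels": ["sad", "relationship", "depression", "happy", "suicidal"],
        "scores": [0.123456, 0.56789, 0.78901, 0.12345, 0.67890],
    },
]


def flatten_record(record):
    """Hypothetical helper: turn one record into a flat {column: value} dict."""
    row = {"sequence": record["sequence"]}
    # zip pairs each label with the score at the same position in this record
    row.update(zip(record["labels"], record["scores"]))
    return row


# pandas aligns the dict keys to columns, so each score lands under its own label
df = pd.DataFrame([flatten_record(record) for record in data])
print(df)
```

With this data, the second row should now show 0.123456 under 'sad' and 0.56789 under 'relationship'. If a record were missing a label, pandas would fill that cell with NaN rather than shifting the other values.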


 