
How do I convert a dictionary that has nested dictionaries within it into a dataframe in Python?

I recently ran a sentiment analysis using Oracle's AI Language API in Python. I had the API iterate over 1300 Tweets and stored its output in a list, where each element corresponds to a single Tweet ID. I then created a dictionary whose keys are the Tweet IDs and whose values are the API output for each Tweet ID. I now have a massive dictionary with dictionaries nested within dictionaries, and I am not sure how to convert it to a pandas DataFrame.

Here are the first few entries of the dictionary I am working with.

 {1292750633104289792: {
   "aspects": []
 },
 1275918779831238656: {
   "aspects": []
 },
 1293251961031204865: {
   "aspects": [
     {
       "length": 8,
       "offset": 51,
       "scores": {
         "Negative": 0.18023298680782318,
         "Neutral": 0.0,
         "Positive": 0.8197670578956604
       },
       "sentiment": "Positive",
       "text": "building"
     }
   ]
 },
 1293312774563606531: {
   "aspects": []
 },
 1293375754751881217: {
   "aspects": [
     {
       "length": 4,
       "offset": 5,
       "scores": {
         "Negative": 0.9987309575080872,
         "Neutral": 0.0012690634466707706,
         "Positive": 0.0
       },
       "sentiment": "Negative",
       "text": "poll"
     }
   ]
 }}

Thanks so much in advance.

You can flatten the structure with a nested comprehension that emits one record per aspect, and then pass the resulting list of dicts to pd.DataFrame:

import pandas as pd
data = {1292750633104289792: {'aspects': []}, 1275918779831238656: {'aspects': []}, 1293251961031204865: {'aspects': [{'length': 8, 'offset': 51, 'scores': {'Negative': 0.18023298680782318, 'Neutral': 0.0, 'Positive': 0.8197670578956604}, 'sentiment': 'Positive', 'text': 'building'}]}, 1293312774563606531: {'aspects': []}, 1293375754751881217: {'aspects': [{'length': 4, 'offset': 5, 'scores': {'Negative': 0.9987309575080872, 'Neutral': 0.0012690634466707706, 'Positive': 0.0}, 'sentiment': 'Negative', 'text': 'poll'}]}}
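# Build one flat record per aspect; tweets with an empty 'aspects' list yield no rows.
r = [{'tweet_id': a,
      'length': i['length'],
      'offset': i['offset'],
      # Expand the nested 'scores' dict into score_Negative, score_Neutral, score_Positive.
      **{f'score_{j}': k for j, k in i['scores'].items()},
      'sentiment': i['sentiment'],
      'text': i['text'],
     }
     for a, b in data.items() for i in b['aspects']]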

df = pd.DataFrame(r)

Output:

              tweet_id  length  offset  score_Negative  score_Neutral  score_Positive sentiment      text
0  1293251961031204865       8      51        0.180233       0.000000        0.819767  Positive  building
1  1293375754751881217       4       5        0.998731       0.001269        0.000000  Negative      poll
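
As a possible alternative to the hand-written comprehension, pandas' own pd.json_normalize can do the flattening. The following is only a sketch under a few assumptions: the dictionary is first reshaped into a list of records (json_normalize expects a dict or a list of dicts with string keys), the sample data is a shortened copy of the question's dictionary, and the names records and flat are made up for illustration. Column order may differ from the output above.

```python
import pandas as pd

# Shortened copy of the question's data: one tweet without aspects, two with one aspect each.
data = {
    1292750633104289792: {"aspects": []},
    1293251961031204865: {"aspects": [
        {"length": 8, "offset": 51,
         "scores": {"Negative": 0.18023298680782318, "Neutral": 0.0,
                    "Positive": 0.8197670578956604},
         "sentiment": "Positive", "text": "building"}]},
    1293375754751881217: {"aspects": [
        {"length": 4, "offset": 5,
         "scores": {"Negative": 0.9987309575080872, "Neutral": 0.0012690634466707706,
                    "Positive": 0.0},
         "sentiment": "Negative", "text": "poll"}]},
}

# Move the tweet ID from the dict key into each record so it can be used as metadata.
records = [{"tweet_id": tweet_id, "aspects": payload["aspects"]}
           for tweet_id, payload in data.items()]

# record_path selects the list to expand into rows; meta copies tweet_id onto each row.
# sep="_" names the flattened score columns scores_Negative, scores_Neutral, scores_Positive.
flat = pd.json_normalize(records, record_path="aspects", meta=["tweet_id"], sep="_")

print(flat)
```

As with the comprehension, tweets whose aspects list is empty contribute no rows. The main trade-off is that json_normalize picks up any extra keys the API returns automatically, while the comprehension makes the selected columns explicit.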

