
Python list of dicts to DataFrame

I have a list of dict objects {key,value} as follows:

recd = [{'Type': 'status'}, {'Origin': 'I just earned the Rookie badge on #Yelp!'}, 
         {'Text': 'I just earned the Rookie badge on'}, {'URL': ''}, 
         {'ID': '95314179338158080'}, {'Time': 'Sun Jul 24 21:07:25 CDT 2011'},
         {'RetCount': '0'}, {'Favorite': 'false'},
         {'MentionedEntities': ''}, {'Hashtags': 'Yelp'}]

I've tried any number of ways to move this to a pandas dataframe object, where the key is the column name and the value is the record value.

s = pd.Series(data=recd)  ## try #1  
tweets = tweets.append(s, ignore_index=True)  

tweets = tweets.append(recd, ignore_index=True)  #try #2  

tweets.from_items(recd)  #try #3  

mylist = [item.split(',') for item in recd] #try #4 (stack overflow)  
tdf = pd.DataFrame(mylist)  

tweets.from_records(recd)  #try #5

tweets.concat(recd, axis=1, etc)  # tries 6-20

Of course, none of these work. At this point I've tried the obvious and used all the various parameters (columns=, ignore_index, etc.). I'm missing something obvious. I typically work with structured data dumps, so this is new to me. I suspect I'm not formatting my data correctly, but the solution eludes me.

Background: I'm building each recd object one at a time from a large parsed datafile with a non-standard format into a single, complete record, then trying to convert it to a pandas dataframe, where I can save it in any number of usable formats. The process also removes a bunch of data errors. The code that does this is:

 k = line.split(":",1)  
 key = str(k[0].strip())  
 val = str(k[1].strip())  
 if key in TweetFields:  
     d = {key : val}   # also tried d = [key:val]
     recd.append(d)  

Thanks for your advice.

You could use a dict comprehension to combine the list of dicts into a single dict, wrapping each value in a one-element list so that every key becomes a column holding one row. Then pass that dict to pd.DataFrame:

In [105]: pd.DataFrame({key: [val] for dct in recd for key, val in dct.items()})
Out[105]: 
  Favorite Hashtags                 ID MentionedEntities  \
0    false     Yelp  95314179338158080                     

                                     Origin RetCount  \
0  I just earned the Rookie badge on #Yelp!        0   

                                Text                          Time    Type URL  
0  I just earned the Rookie badge on  Sun Jul 24 21:07:25 CDT 2011  status      

Although this solves the problem of converting a list of dicts into a single row of a DataFrame, it would be preferable to avoid building a list of single-key dicts for each record, because building a new DataFrame for each row is inefficient.
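One way to avoid the per-row DataFrame is to collect one dict per record while parsing and to build the DataFrame once at the end. The following is only a rough sketch: the helper parse_records, the TWEET_FIELDS set, and the rule that a blank line separates records are all assumptions, since the question does not show the raw file format.

```python
import pandas as pd

# Assumed set of wanted keys; the question's TweetFields is not shown.
TWEET_FIELDS = {"Type", "Origin", "Text", "URL", "ID", "Time",
                "RetCount", "Favorite", "MentionedEntities", "Hashtags"}

def parse_records(lines, fields=TWEET_FIELDS):
    """Return a list with one dict per record (hypothetical helper)."""
    records = []
    current = {}
    for line in lines:
        if not line.strip():
            # Assumption: a blank line ends the current record.
            if current:
                records.append(current)
                current = {}
            continue
        if ":" not in line:
            continue  # skip lines that are not "key: value"
        key, val = line.split(":", 1)
        key, val = key.strip(), val.strip()
        if key in fields:
            current[key] = val  # one dict per record, not one dict per field
    if current:
        records.append(current)
    return records

# Made-up sample input in the assumed format.
sample = [
    "Type: status",
    "Text: I just earned the Rookie badge on",
    "ID: 95314179338158080",
    "RetCount: 0",
    "",
    "Type: status",
    "Text: One more on the road",
    "RetCount: NA",
]

# Build the DataFrame once, from the complete list of records.
tweets = pd.DataFrame(parse_records(sample))
```

A key that is missing from a record (ID in the second sample record) shows up as NaN in that row.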

You may get more useful answers if you explain what your raw data looks like (with more than one row of data) and what you want the final DataFrame to look like.

If you just want to convert one list of dicts:

temp_df = pd.DataFrame([{key: value for d in recd for key, value in d.items()}])  # d, not dict, to avoid shadowing the builtin
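For readers who find the nested comprehension hard to follow, the same merge can be written as a plain loop. This is a minimal sketch using a shortened version of the question's data; the name record is only illustrative.

```python
import pandas as pd

# Shortened version of the list of single-key dicts from the question.
recd = [{'Type': 'status'}, {'Hashtags': 'Yelp'}, {'RetCount': '0'}]

# Merge the single-key dicts into one record dict.
record = {}
for d in recd:
    record.update(d)  # a repeated key would be overwritten by the later value

# Wrapping the dict in a list makes it one row; the keys become columns.
temp_df = pd.DataFrame([record])
```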

But if you plan to use this construction to create a DataFrame with many rows, you should merge all the {key: value} pairs into one dict per record and append those dicts to a list:

recd = [{'Type': 'status', 'Origin': 'I just earned the Rookie badge on #Yelp!', 
     'Text': 'I just earned the Rookie badge on', 'URL': '', 
     'ID': '95314179338158080', 'Time': 'Sun Jul 24 21:07:25 CDT 2011',
     'RetCount': '0', 'Favorite': 'false',
     'MentionedEntities': '', 'Hashtags': 'Yelp'}]

recd.append({'Type': 'status', 'Origin': 'BLAH BLAH', 
     'Text': 'One more on the road', 'URL': '', 
     'ID': 'NA', 'Time': 'NA',
     'RetCount': 'NA', 'Favorite': 'false',
     'MentionedEntities': '', 'Hashtags': 'Yelp'})

temp_df = pd.DataFrame(recd)
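Every column built this way holds strings, and the 'NA' placeholder is kept as an ordinary string rather than a missing value. If numeric or boolean columns are wanted before saving, one possible follow-up step is sketched below on a reduced version of the data; which columns to convert is an assumption about the intended output.

```python
import pandas as pd

# Reduced version of the records above, keeping only three fields.
recd = [
    {'ID': '95314179338158080', 'RetCount': '0', 'Favorite': 'false'},
    {'ID': 'NA', 'RetCount': 'NA', 'Favorite': 'false'},
]
temp_df = pd.DataFrame(recd)

# Parse counts as numbers; anything unparseable (such as 'NA') becomes NaN.
temp_df['RetCount'] = pd.to_numeric(temp_df['RetCount'], errors='coerce')

# Map the 'true'/'false' strings to Python booleans.
temp_df['Favorite'] = temp_df['Favorite'].map({'true': True, 'false': False})

# ID is left as a string here, since it is an identifier rather than a quantity.
```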

