Python list of dicts to DataFrame

I have a list of dict objects ({key: value}) as follows:

recd = [{'Type': 'status'}, {'Origin': 'I just earned the Rookie badge on #Yelp!'}, 
         {'Text': 'I just earned the Rookie badge on'}, {'URL': ''}, 
         {'ID': '95314179338158080'}, {'Time': 'Sun Jul 24 21:07:25 CDT 2011'},
         {'RetCount': '0'}, {'Favorite': 'false'},
         {'MentionedEntities': ''}, {'Hashtags': 'Yelp'}]

I've tried any number of ways to move this into a pandas DataFrame, where each key is a column name and each value is the record's value for that column.

s = pd.Series(data=recd)  ## try #1  
tweets = tweets.append(s, ignore_index=True)  

tweets = tweets.append(recd, ignore_index=True)  #try #2  

tweets.from_items(recd)  #try #3  

mylist = [item.split(',') for item in recd] #try #4 (stack overflow)  
tdf = pd.DataFrame(mylist)  

tweets.from_records(recd)  #try #5

tweets.concat(recd, axis=1, etc)  # tries 6-20

Of course, none of these work. At this point I've tried the obvious approaches and all the various parameters (columns=, ignore_index, etc.), so I must be missing something. I typically work with structured data dumps, so this is new to me. I suspect I'm not formatting my data correctly, but the solution eludes me.

Background: I'm parsing a large data file with a non-standard format and building each recd object, one at a time, into a single complete record. I then try to convert it to a pandas DataFrame so that I can save it in any number of usable formats. The process also removes a bunch of data errors. The code that does this is:

 k = line.split(":",1)  
 key = str(k[0].strip())  
 val = str(k[1].strip())  
 if key in TweetFields:  
     d = {key : val}   # also tried d = [key:val]
     recd.append(d)  

Thanks for your advice.

You could use a dict comprehension to combine the list of dicts into a single dict, then pass that dict to pd.DataFrame. Each value is wrapped in a one-element list so that every key becomes a column holding a single row:

In [105]: pd.DataFrame({key: [val] for dct in recd for key, val in dct.items()})
Out[105]: 
  Favorite Hashtags                 ID MentionedEntities  \
0    false     Yelp  95314179338158080                     

                                     Origin RetCount  \
0  I just earned the Rookie badge on #Yelp!        0   

                                Text                          Time    Type URL  
0  I just earned the Rookie badge on  Sun Jul 24 21:07:25 CDT 2011  status      

Although this solves the problem of converting a list of dicts into a single row of a DataFrame, it would be preferable to avoid the list of single-key dicts altogether, because building a new DataFrame for each row is inefficient.
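As a rough sketch of that idea, assuming every record arrives in the same list-of-single-key-dicts shape as recd, you could merge each record into one dict first and construct the DataFrame only once at the end. The helper name merge_record and the two shortened sample records are made up for illustration:

```python
import pandas as pd

def merge_record(pairs):
    """Collapse a list of single-key dicts into one dict (a repeated key keeps its last value)."""
    return {key: val for dct in pairs for key, val in dct.items()}

# Two hypothetical records in the question's shape, shortened to three fields each.
recd_a = [{'Type': 'status'}, {'ID': '95314179338158080'}, {'Hashtags': 'Yelp'}]
recd_b = [{'Type': 'status'}, {'ID': 'NA'}, {'Hashtags': 'Yelp'}]

# One dict per record, then a single DataFrame construction for all rows.
rows = [merge_record(r) for r in (recd_a, recd_b)]
tweets = pd.DataFrame(rows)
print(tweets)
```

Collecting plain dicts in a list is cheap, so the cost of creating the DataFrame is paid once rather than once per row.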

You may get more useful answers if you explain what your raw data looks like (with more than one row of data) and what you want the final DataFrame to look like.

If you just want to convert a single list of dicts:

temp_df = pd.DataFrame([{key: value for d in recd for key, value in d.items()}])  # d, not dict, to avoid shadowing the builtin

But if you plan to use this construction to build a DataFrame with many rows, you should merge all the {key: value} pairs of each record into one dict and append those dicts to a list:

recd = [{'Type': 'status', 'Origin': 'I just earned the Rookie badge on #Yelp!', 
     'Text': 'I just earned the Rookie badge on', 'URL': '', 
     'ID': '95314179338158080', 'Time': 'Sun Jul 24 21:07:25 CDT 2011',
     'RetCount': '0', 'Favorite': 'false',
     'MentionedEntities': '', 'Hashtags': 'Yelp'}]

recd.append({'Type': 'status', 'Origin': 'BLAH BLAH', 
     'Text': 'One more on the road', 'URL': '', 
     'ID': 'NA', 'Time': 'NA',
     'RetCount': 'NA', 'Favorite': 'false',
     'MentionedEntities': '', 'Hashtags': 'Yelp'})

temp_df = pd.DataFrame(recd)
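Applied to the parsing loop from the question, this could look roughly like the sketch below. The raw file format is not shown in the question, so the sample lines, the contents of TweetFields, and the rule that a blank line ends a record are all assumptions made for illustration:

```python
import pandas as pd

# Hypothetical stand-ins: the real TweetFields and input file are not shown in the question.
TweetFields = {'Type', 'Text', 'ID'}
lines = [
    "Type: status",
    "Text: I just earned the Rookie badge on",
    "ID: 95314179338158080",
    "",  # assumption: a blank line marks the end of a record
    "Type: status",
    "Text: One more on the road",
    "ID: NA",
]

records = []   # one dict per complete record
current = {}   # the record being assembled

for line in lines:
    if not line.strip():
        # End of record: store it and start a fresh dict.
        if current:
            records.append(current)
            current = {}
        continue
    # Split on the first colon only, as in the question's line.split(":", 1).
    key, sep, val = line.partition(":")
    key, val = key.strip(), val.strip()
    if sep and key in TweetFields:
        current[key] = val

# Store the last record if the input does not end with a blank line.
if current:
    records.append(current)

tweets = pd.DataFrame(records)
print(tweets)
```

The only change from the original loop is that each key/value pair is written into the current record's dict instead of being appended as its own single-key dict.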
