
How to write a nested dictionary to a CSV file with Python when each row holds the values of the keys used as column headers?

I have a dictionary called "output", with several other dictionaries nested in it, as shown below:

>>> output.keys() 
dict_keys(['posts', 'totalResults', 'moreResultsAvailable', 'next', 'requestsLeft', 'warnings'])

>>> output['posts'][0].keys() 
dict_keys(['thread', 'uuid', 'url', 'ord_in_thread', 'parent_url', 'author', 'published', 'title','text', 'highlightText', 'highlightTitle', 'highlightThreadTitle', 'language', 'external_links', 'external_images', 'entities', 'rating', 'crawled', 'updated'])

>>> output['posts'][0]['thread'].keys() 
dict_keys(['uuid', 'url', 'site_full', 'site', 'site_section', 'site_categories', 'section_title', 'title', 'title_full', 'published', 'replies_count', 'participants_count', 'site_type', 'country', 'spam_score', 'main_image', 'performance_score', 'domain_rank', 'reach', 'social'])

>>> output['posts'][0]['thread']['social'].keys() 
dict_keys(['facebook', 'gplus', 'pinterest', 'linkedin', 'stumbledupon', 'vk'])

I want to make a CSV file whose columns are selected keys from output['posts'][0], output['posts'][0]['thread'] and output['posts'][0]['thread']['social'], with the corresponding values as the content of each row. I came up with this code:

post_keys = output['posts'][0].keys()
post_thread_keys = output['posts'][0]['thread'].keys()
social_keys = output['posts'][0]['thread']['social'].keys()

with open('file.csv', 'w', encoding='utf-8') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=post_thread_keys)
    writer.writeheader()

    for i in range(len(output['posts'])):
         for key in output['posts'][i]['thread']:
            writer.writerow(output['posts'][i]['thread'])

It only works for the first level of the dictionary, output['posts'][0]['thread'], not for the dictionaries nested inside it. It also doubles the number of rows: there are now 200 instead of 100.

Now the result is like this: [image: current output]

I would like it to look like this: [image: desired output]

Please have a look at the output file I have stored on Google Drive for a more concrete picture: file.csv

You need a function to create the sub-keys in the format you have specified. By using a function, it can also be called to give you the list of the extra column names needed for the header.

Because the nested sub-entries are written out as columns of their own, they can be removed from their parent dictionaries with .pop() to avoid duplication.

import webhoseio
import csv

def get_social_entries(social):
    social_entries = {}

    for social_key, social_values in social.items():
        for key, value in social_values.items():
            social_entries[f'{social_key}_{key}'] = value
            
    return social_entries
        
    
    
# <<Get output here>>

csv_columns = []
 
first_post = output['posts'][0]

for key in first_post['thread']:
    if key != 'social':    # 'social' is expanded into its own columns below
        csv_columns.append(key)
 
for key in first_post:
    if key not in ['entities', 'thread']:
        csv_columns.append(key)
 
for key in first_post['entities']:
    csv_columns.append(key)

csv_columns.extend(list(get_social_entries(first_post['thread']['social']).keys()))

with open('file.csv', 'w', encoding='utf-8', newline='') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=csv_columns)
    writer.writeheader()
    
    for post in output['posts']:
        thread = post.pop('thread')
        entities = post.pop('entities')
        social = thread.pop('social')
        social_entries = get_social_entries(social)
        writer.writerow(post | thread | entities | social_entries)     # | operator needs Python 3.9

This assumes you are using Python 3.9 or later. On earlier versions you could use something like:

row = post
row.update(thread)
row.update(entities)
row.update(social_entries)
writer.writerow(row)
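
Both variants above modify the posts in place, because .pop() removes keys from the original dictionaries. If output is needed again afterwards, a non-destructive variant might look like the following minimal sketch. The helper build_row, the sample data and the field names in it (likes, shares, author and so on) are made up for illustration and are not taken from the real API response. The {**a, **b} form also works on Python versions before 3.9.

```python
import csv
import io

def get_social_entries(social):
    """Flatten {'facebook': {'likes': 1}} into {'facebook_likes': 1}."""
    return {
        f'{site}_{metric}': value
        for site, metrics in social.items()
        for metric, value in metrics.items()
    }

def build_row(post):
    """Merge one post into a flat dict without modifying the post itself."""
    top_level = {k: v for k, v in post.items() if k not in ('thread', 'entities')}
    thread = {k: v for k, v in post['thread'].items() if k != 'social'}
    # Later dicts win on duplicate keys, so a thread value replaces a
    # post value of the same name (same precedence as post | thread | ...).
    return {
        **top_level,
        **thread,
        **post['entities'],
        **get_social_entries(post['thread']['social']),
    }

# Hypothetical, heavily trimmed stand-in for the real response.
sample_output = {
    'posts': [
        {
            'uuid': 'p1',
            'author': 'alice',
            'thread': {
                'site': 'example.com',
                'social': {'facebook': {'likes': 3, 'shares': 1}, 'vk': {'shares': 0}},
            },
            'entities': {'persons': [], 'locations': []},
        },
    ]
}

rows = [build_row(post) for post in sample_output['posts']]
fieldnames = list(rows[0])          # column order follows the merged dict

buffer = io.StringIO(newline='')    # in-memory stand-in for open(..., newline='')
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
```

Note that keys present at more than one level (in the real data, names such as uuid or url appear in both the post and its thread) collapse into a single column here. If both values matter, one option is to prefix the thread keys before merging.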

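Note: newline='' is passed to open() because the csv module writes its own line endings. Without it, the output can contain an extra blank line after every row (typically on Windows).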
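
As a small illustration of where the blank lines come from, the sketch below writes two rows into an in-memory buffer (used here only so that the raw characters can be inspected) and shows that csv.writer already terminates each row with '\r\n' by default.

```python
import csv
import io

# newline='' disables newline translation, as it does for open().
buffer = io.StringIO(newline='')
csv.writer(buffer).writerows([['a', 'b'], [1, 2]])

raw = buffer.getvalue()
print(repr(raw))   # each row already ends in '\r\n'
```

If the file is opened in text mode without newline='' on a platform that translates '\n' to '\r\n', each row ends in '\r\r\n', which many programs display as an empty line between rows.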

You could use a similar approach to also expand the entities.
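
A minimal sketch of that idea follows. It assumes that entities maps each entity type to a list of dictionaries with a 'name' key. That shape is an assumption, as is the helper name get_entity_entries, so check it against the real response before relying on it. Each list is collapsed into a single cell so that every post still produces exactly one row.

```python
def get_entity_entries(entities):
    """Collapse each entity list into one cell, e.g. {'entities_persons': 'a; b'}."""
    entries = {}
    for entity_type, items in entities.items():
        names = [item['name'] for item in items]
        entries[f'entities_{entity_type}'] = '; '.join(names)
    return entries

# Hypothetical sample in the assumed shape.
sample_entities = {
    'persons': [{'name': 'ada lovelace'}, {'name': 'alan turing'}],
    'organizations': [],
    'locations': [{'name': 'london'}],
}

entity_entries = get_entity_entries(sample_entities)
```

The resulting dictionary can be merged into the row in place of the raw entities dictionary, and its keys added to the column list in the same way as the social entries.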


 