
How can I look up the list values of one Python dictionary as keys in another dictionary, compare them, and write the result to a CSV?

I created a default dictionary from a large amount of data. Its keys are tuples of text values and its values are lists of IDs, as previewed below.

default_dict = {('text2015', 'webnet'): [52384, 22276, 97376, 93696, 12672], ('datascience', 'bigdata', 'links'): [18720], ('biological', 'biomedics', 'datamining', 'datamodel', 'semantics'): [82465, 43424], ('links', 'scientometrics'): [23297, 73120]}

I have a second dictionary, data_dictionary, whose keys are the individual list values from default_dict. Each entry has the form key_ID: ([text_values], sum, guser_ID). The data dictionary looks like this:

data_dictionary = {52384: (['text2015', 'webnet'], 1444856137000, 335829830), 18720: (['datascience', 'bigdata', 'links'], 1444859841000, 17987803), 82465: (['biological', 'biomedics', 'datamining', 'datamodel', 'semantics'], 1444856, 335829830), 73120: (['links', 'scientometrics'], 144481000, 17987803), 22276: (['text2015', 'webnet'], 1674856137000, 615387550), 97376: (['text2015', 'webnet'], 1812856137000, 371559830), 43424: (['biological', 'biomedics', 'datamining', 'datamodel', 'semantics'], 5183856, 363549260), 23297: (['links', 'scientometrics'], 1614481000, 26253825)}

The second element of each value tuple (the sum) is the number I want to use to compare the different keys. For each text, the key_ID with the lowest sum should appear first on every line of a CSV file, paired in turn with each key_ID that has a greater sum, as shown below. In words:

(key_ID with least sum ; other key_ID ; sum for least-sum key_ID ; sum for other key_ID ; shared text)

> 52384 ; 22276 ; 1444856137000 ; 1674856137000 ; ['text2015', 'webnet']
> 52384 ; 97376 ; 1444856137000 ; 1812856137000 ; ['text2015', 'webnet']
> 18720 ; 18720 ; 1444859841000 ; 1444859841000 ; ['datascience', 'bigdata', 'links']
> 82465 ; 43424 ; 1444856 ; 5183856 ; ['biological', 'biomedics', 'datamining', 'datamodel', 'semantics']
> 73120 ; 23297 ; 144481000 ; 1614481000 ; ['links', 'scientometrics']

So far I have tried building the values in a dictionary and writing them out as a CSV with pandas, but without much success. Any ideas would really help. The code below only writes a separate CSV file for each text, containing the key_IDs that share that text.

import pandas as pd

# Write one CSV per text tuple, listing the key_IDs that share it
for key, value in default_dict.items():
    df = pd.DataFrame(value)
    df.to_csv('graph' + '_'.join(key) + '.csv', index=False)

The code below does the following:

  1. Create a new dictionary that holds those records that occur in both of your dictionaries, with each list sorted from lowest to highest 'sum' (I have written it in one expression; for readability you could consider breaking it down into steps)
  2. Go through the new dictionary and see whether the lowest-sum item must have its own line (when it is the only item) or not
  3. Go through the items that must have their own line and output the contents as you formatted them above.

Alternatively, you could load the rows into a DataFrame and let pandas handle saving them as CSV. I hope this helps.

# For each text tuple, collect [key_ID, sum] pairs for the IDs that also
# appear in data_dictionary, sorted from lowest to highest sum
output_dict = {textval: sorted(
                          [[key_ID, data_dictionary[key_ID][1]]
                           for key_ID in default_dict[textval]
                           if key_ID in data_dictionary],
                          key=lambda x: x[1])
               for textval in default_dict}

for textval, entries in output_dict.items():
    # A lone entry is paired with itself; otherwise skip the lowest-sum entry
    list_for_output = entries if len(entries) == 1 else entries[1:]
    for item in list_for_output:
        print('%d ; %d ; %d ; %d ; %s' % (entries[0][0], item[0],
                                          entries[0][1], item[1],
                                          list(textval)))
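
If you want a file rather than printed lines, the same logic can feed a CSV writer. What follows is only a minimal sketch, using the standard-library csv module instead of pandas. The helper name build_rows is made up for illustration, the data is a shortened copy of the sample from the question, and the rows go to an in-memory buffer so the snippet runs without touching the disk.

```python
import csv
import io

# Shortened copy of the sample data from the question.
default_dict = {
    ('text2015', 'webnet'): [52384, 22276, 97376, 93696, 12672],
    ('datascience', 'bigdata', 'links'): [18720],
}
data_dictionary = {
    52384: (['text2015', 'webnet'], 1444856137000, 335829830),
    18720: (['datascience', 'bigdata', 'links'], 1444859841000, 17987803),
    22276: (['text2015', 'webnet'], 1674856137000, 615387550),
    97376: (['text2015', 'webnet'], 1812856137000, 371559830),
}


def build_rows(default_dict, data_dictionary):
    """Hypothetical helper: one row per (lowest-sum ID, other ID) pair."""
    rows = []
    for textval, key_ids in default_dict.items():
        # Keep only IDs present in data_dictionary, sorted by their sum.
        entries = sorted(
            ((key_id, data_dictionary[key_id][1])
             for key_id in key_ids if key_id in data_dictionary),
            key=lambda pair: pair[1])
        if not entries:
            continue  # none of the IDs for this text were found
        low_id, low_sum = entries[0]
        # A lone ID is paired with itself, as in the question's example.
        others = entries if len(entries) == 1 else entries[1:]
        for other_id, other_sum in others:
            rows.append([low_id, other_id, low_sum, other_sum, list(textval)])
    return rows


rows = build_rows(default_dict, data_dictionary)

# Write ';'-separated rows to an in-memory buffer.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=';')
writer.writerows(rows)
print(buffer.getvalue())
```

To write a real file, replace the buffer with something like open('output.csv', 'w', newline=''), where the file name is your choice. The csv writer does not pad the separator with spaces, so the lines come out as 52384;22276;... rather than 52384 ; 22276 ; ...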

