Order of iteration through dict.items()

TLDR: If I build a dictionary at two separate times from the same data processed in the same way, should the order of dictionary.items() be the same each time?

Hello,

I have a dictionary linked_strain_acc which has about 2000 keys (strain names), and each key has another dictionary as a value (data).

linked_strain_acc = {'strain1' : {'gcf' : ['gcf1', 'gcf'..],
                                  'key2' : val2,
                                  .........},
                    'strain2' :  {.........},
                    ..........
                    'strain2000' :  {.........}}
          

I am iterating over a key ('gcf') in each data dictionary, which contains a list of gcf ids. I'm using the gcf ids to build a url for scraping, after checking that it hasn't already been scraped.

import os
import time
import requests

directory = r'C:\Users\u03132tk\.spyder-py3\scrape_dsmz\zip_files'
count = 0
start = time.time()
# allows you to stop and start: skip files already downloaded in a previous run
current_files = os.listdir(directory)
for strain, data in linked_strain_acc.items():
    for gcf in data['gcf']:
        count += 1
        filename = f'{strain}__{gcf}.zip'
        if filename not in current_files:
            download_url = f'https://antismash-db.secondarymetabolites.org/output/{gcf}/{gcf}.zip'
            response = requests.get(download_url)
            with open(fr'{directory}\{filename}', "wb") as infile:
                infile.write(response.content)
            print(f'downloaded {strain}, {gcf}')
        else:
            print(f'{strain}, {gcf} already scraped')
        if count % 50 == 0:
            print(f'downloaded {count} jsons - script has been running for {round((time.time() - start)/60, 1)} minutes')
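
(Side note, separate from the ordering question: `filename not in current_files` scans a list on every iteration; a set makes the membership test constant-time. A minimal sketch of that variant, reusing the directory and linked_strain_acc objects from the snippet above:)

import os

# same resume logic as above, but with a set for O(1) membership tests
current_files = set(os.listdir(directory))

for strain, data in linked_strain_acc.items():
    for gcf in data['gcf']:
        filename = f'{strain}__{gcf}.zip'
        if filename in current_files:
            continue  # already downloaded on a previous run
        # ... download as in the main loop, then remember the new file
        current_files.add(filename)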

Question

I have already scraped about 1500 of the gcf urls and downloaded the files (out of the 2000ish total). When I ran it again this morning, instead of printing '{strain}, {gcf} already scraped' for the first 1500 print statements, it's alternating between a couple of '{strain}, {gcf} already scraped' messages and 'downloaded {strain}, {gcf}' messages. This implies that the order of the linked_strain_acc dictionary has changed.

I made this dictionary from a CSV file which was processed in exactly the same way each time to make linked_strain_acc. Why would the order of the dict change, or am I missing something? I know that dict key/val order isn't sorted by e.g. alphabet or size, but I thought it would be maintained when the dict is built from exactly the same data.
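
A minimal check of that assumption, with dummy rows standing in for the real CSV data (in CPython 3.7+ a dict preserves insertion order, so two dicts fed the same key/value pairs in the same order should iterate identically):

rows = [('strain1', {'gcf': ['gcf1']}), ('strain2', {'gcf': ['gcf2']})]

# build the "same" dictionary twice from the same rows, in the same order
d1 = {strain: data for strain, data in rows}
d2 = {strain: data for strain, data in rows}

# both items() views should come out in the same order
print(list(d1.items()) == list(d2.items()))   # expected: True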

Thanks!

In older versions, Python was using string pools to efficiently store longer strings by pooling shorter common segments. Every time you create a string, it may change the pool, and hence the order. The strings you dynamically create in

download_url = f'https://antismash-db.secondarymetabolites.org/output/{gcf}/{gcf}.zip'

may change the pool depending on your starting point. For reference: https://en.wikipedia.org/wiki/String_interning
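
A minimal illustration of interning itself, assuming CPython (the strings here are made up for the example; whether interning affects dict ordering in any given Python version is a separate matter):

import sys

# two equal strings built at runtime are usually distinct objects...
a = 'gcf_' + str(900000)
b = 'gcf_' + str(900000)
print(a == b, a is b)                    # typically: True False

# ...but explicit interning pools them into a single shared object
print(sys.intern(a) is sys.intern(b))    # True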
