Order of iteration through dict.items()
TLDR: If I build a dictionary at two separate times from the same data processed in the same way, should the order of dictionary.items() be the same each time?
Hello,
I have a dictionary, linked_strain_acc, which has about 2000 keys (strain names); each key has another dictionary (data) as its value.
linked_strain_acc = {'strain1' : {'gcf' : ['gcf1', 'gcf'..],
                                  'key2' : val2,
                                  .........},
                     'strain2' : {.........},
                     ..........
                     'strain2000' : {.........}}
I am iterating over a key ('gcf') in each data dictionary, which contains a list of gcf ids. I'm using the gcf ids to build a URL for scraping, after testing that it hasn't already been scraped.
import os
import time

import requests

directory = r'C:\Users\u03132tk\.spyder-py3\scrape_dsmz\zip_files'
count = 0
start = time.time()
# listing the files already on disk allows you to stop and start
current_files = os.listdir(directory)
for strain, data in linked_strain_acc.items():
    for gcf in data['gcf']:
        count += 1
        filename = f'{strain}__{gcf}.zip'
        if filename not in current_files:
            download_url = f'https://antismash-db.secondarymetabolites.org/output/{gcf}/{gcf}.zip'
            response = requests.get(download_url)
            # write the downloaded archive to disk under the strain/gcf name
            with open(fr'{directory}\{filename}', "wb") as outfile:
                outfile.write(response.content)
            print(f'downloaded {strain}, {gcf}')
        else:
            print(f'{strain}, {gcf} already scraped')
        if count % 50 == 0:
            print(f'downloaded {count} jsons - script has been running for {round((time.time() - start)/60, 1)} minutes')
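A minimal sketch of one way to make the resume step independent of dictionary order: work out up front which files are still missing, visiting the strains in sorted order. The helper name pending_downloads and the sample data are hypothetical and not part of the original script; only the '{strain}__{gcf}.zip' naming scheme is taken from the code above.

```python
def pending_downloads(linked_strain_acc, existing_files):
    """Return (strain, gcf, filename) tuples that are not on disk yet.

    Strains are visited in sorted order, so the result does not depend
    on the iteration order of the dictionary itself.
    """
    existing = set(existing_files)  # set membership tests are fast
    pending = []
    for strain in sorted(linked_strain_acc):
        for gcf in linked_strain_acc[strain]["gcf"]:
            filename = f"{strain}__{gcf}.zip"
            if filename not in existing:
                pending.append((strain, gcf, filename))
    return pending


# Hypothetical sample data, deliberately inserted out of sorted order.
sample = {
    "strain2": {"gcf": ["gcf3"]},
    "strain1": {"gcf": ["gcf1", "gcf2"]},
}
todo = pending_downloads(sample, ["strain1__gcf1.zip"])
print(todo)
```

In the real script, existing_files would presumably be the result of os.listdir(directory), and the download loop would run over the returned list instead of over linked_strain_acc.items().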
Question
I have already scraped about 1500 of the gcf URLs and downloaded the files (out of the 2000ish total). When I ran it again this morning, instead of printing '{strain}, {gcf} already scraped' for the first 1500 print statements, it's alternating between a couple of '{strain}, {gcf} already scraped' messages and 'downloaded {strain}, {gcf}' messages. This implies that the order of the linked_strain_acc dictionary has changed.
I made this dictionary from a CSV file, which was processed in exactly the same way each time to make linked_strain_acc. Why would the order of the dict change, or am I missing something? I know that dict key/value order isn't sorted by e.g. alphabet or size, but I thought it would be maintained when the dict is built from exactly the same data.
Thanks!
In older versions of Python, the iteration order of a dict was not guaranteed: it depended on internal details such as how the string keys were hashed and stored, so it could differ from one run to the next. (Since Python 3.7, a dict is guaranteed to preserve insertion order.) Python also pools some strings (string interning), and the strings you dynamically create in
download_url = f'https://antismash-db.secondarymetabolites.org/output/{gcf}/{gcf}.zip'
may change the pool depending on your starting point. For reference: https://en.wikipedia.org/wiki/String_interning
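As a rough check of the ordering behaviour, here is a small sketch assuming Python 3.7 or later, where dicts are guaranteed to keep insertion order. The build helper and the sample rows are hypothetical stand-ins for the CSV processing, which is not shown in the question. A set, unlike a dict, gives no ordering guarantee, and with string elements its order can vary between runs because string hashes are randomized per process by default, so the sketch sorts it before use.

```python
def build(rows):
    """Build a nested dict from (strain, gcf) rows, keeping row order."""
    result = {}
    for strain, gcf in rows:
        result.setdefault(strain, {"gcf": []})["gcf"].append(gcf)
    return result


# Hypothetical rows standing in for the parsed CSV file.
rows = [("strain2", "gcf_b"), ("strain1", "gcf_a"), ("strain2", "gcf_c")]

first = build(rows)
second = build(rows)

# On Python 3.7+, both dicts iterate in insertion order, so they match.
same_order = list(first.items()) == list(second.items())
print(same_order)

# A set has no guaranteed order; sort it to get a stable one.
unique_strains = sorted({strain for strain, _ in rows})
print(unique_strains)
```

If the CSV processing passes through a set at any point (for example to remove duplicates), that step rather than the dict itself could be where the order changes between runs.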