How to efficiently look up a dictionary value by another value in a list of dictionaries

Question

I have a very large (~100k entries) list of dictionaries:

[{'sequence': 'read the rest of this note', 'score': 0.22612378001213074, 'token': 3805, 'token_str': 'note'}, {'sequence': 'read the rest of this page', 'score': 0.11293990164995193, 'token': 3674, 'token_str': 'page'}, {'sequence': 'read the rest of this week', 'score': 0.06504543870687485, 'token': 1989, 'token_str': 'week'}]

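Given a token ID (e.g. 1989), how can I find the corresponding score efficiently? I have to do this many times per list: I have several such large lists, and several token IDs to look up in each.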

Currently I iterate over every dictionary in the list and check whether its token matches my input ID; if it does, I take its score. But this is slow.
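For reference, a minimal sketch of the linear scan described above, using the three sample records from the question. The helper name find_score_linear is hypothetical, and returning None for a missing token is an assumption (the question does not say what should happen). Each call is O(n), so k lookups over n records cost O(n * k), which is why this gets slow at ~100k records.

```python
# Sample records copied from the question.
records = [
    {'sequence': 'read the rest of this note', 'score': 0.22612378001213074, 'token': 3805, 'token_str': 'note'},
    {'sequence': 'read the rest of this page', 'score': 0.11293990164995193, 'token': 3674, 'token_str': 'page'},
    {'sequence': 'read the rest of this week', 'score': 0.06504543870687485, 'token': 1989, 'token_str': 'week'},
]


def find_score_linear(records, token_id):
    """Hypothetical helper: scan until a record with a matching token is found.

    O(n) per call. Returns None when no record matches (an assumption).
    """
    return next((r['score'] for r in records if r['token'] == token_id), None)


print(find_score_linear(records, 1989))
print(find_score_linear(records, 42))  # token not in the sample data
```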

Answer 1

Since you have to search many times, it is worth building a dictionary keyed by token:

a = [{'sequence': 'read the rest of this note', 'score': 0.22612378001213074, 'token': 3805, 'token_str': 'note'}, {'sequence': 'read the rest of this page', 'score': 0.11293990164995193, 'token': 3674, 'token_str': 'page'}, {'sequence': 'read the rest of this week', 'score': 0.06504543870687485, 'token': 1989, 'token_str': 'week'}]

my_dict = {i['token']: i for i in a}

Building the dict takes O(n) time once, but after that each lookup is O(1) on average.
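A minimal sketch of the lookup side, assuming that a missing token should yield a default value rather than raise KeyError (the answer does not say). find_score is a hypothetical helper; dict.get performs the average O(1) hash lookup. With several lists, you would build one such index per list.

```python
# Sample records from the question.
a = [
    {'sequence': 'read the rest of this note', 'score': 0.22612378001213074, 'token': 3805, 'token_str': 'note'},
    {'sequence': 'read the rest of this page', 'score': 0.11293990164995193, 'token': 3674, 'token_str': 'page'},
    {'sequence': 'read the rest of this week', 'score': 0.06504543870687485, 'token': 1989, 'token_str': 'week'},
]

# Build the index once (O(n)). If a token occurs more than once,
# the last record with that token overwrites the earlier ones.
my_dict = {i['token']: i for i in a}


def find_score(index, token_id, default=None):
    """Hypothetical helper: score for token_id, or `default` if not indexed."""
    record = index.get(token_id)  # average O(1) hash lookup
    return default if record is None else record['score']


for token_id in (3805, 1989, 42):  # 42 is not in the sample data
    print(token_id, find_score(my_dict, token_id))
```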

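Building this index may look wasteful, but Python handles memory efficiently here: instead of copying each dictionary into the new dict, my_dict holds references to the dictionaries already stored in the list. You can confirm this with: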

>>> a[0] is my_dict[3805]
True

So you can think of it as creating an alias for each element of the list.
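A quick way to see the aliasing in action, using trimmed records (only token and score kept for brevity): because no record is copied, a change made through my_dict is visible through a as well.

```python
a = [
    {'token': 3805, 'score': 0.22612378001213074},
    {'token': 3674, 'score': 0.11293990164995193},
]
my_dict = {i['token']: i for i in a}

# The same dict object is reachable from both containers.
print(a[0] is my_dict[3805])  # True

# Mutating a record through the index also changes it as seen from the list.
my_dict[3805]['score'] = 1.0
print(a[0]['score'])  # 1.0
```

The flip side is that if the original list must stay unchanged, records should not be mutated through the index.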

Answer 2

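For large datasets, pandas may be more efficient, since the comparison df.token == 3805 runs as a vectorized operation over the whole column instead of a Python-level loop.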

Example of finding the score for token 3805:

import pandas as pd

source_list = [{'sequence': 'read the rest of this note', 'score': 0.22612378001213074, 'token': 3805, 'token_str': 'note'}, {'sequence': 'read the rest of this page', 'score': 0.11293990164995193, 'token': 3674, 'token_str': 'page'}, {'sequence': 'read the rest of this week', 'score': 0.06504543870687485, 'token': 1989, 'token_str': 'week'}]

df = pd.DataFrame(source_list)
result = df[df.token == 3805]

print(result.score.values[0])
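The boolean mask above still compares every row on each lookup. For many lookups, one option (a sketch, assuming token IDs are unique within a list) is to index the score column by token once with DataFrame.set_index and then use .loc; Series.reindex handles a batch of tokens and returns NaN for any that are missing.

```python
import pandas as pd

source_list = [
    {'sequence': 'read the rest of this note', 'score': 0.22612378001213074, 'token': 3805, 'token_str': 'note'},
    {'sequence': 'read the rest of this page', 'score': 0.11293990164995193, 'token': 3674, 'token_str': 'page'},
    {'sequence': 'read the rest of this week', 'score': 0.06504543870687485, 'token': 1989, 'token_str': 'week'},
]

df = pd.DataFrame(source_list)

# Build a Series of scores indexed by token, once.
score_by_token = df.set_index('token')['score']

# Single lookup (returns a scalar only if the token index is unique).
print(score_by_token.loc[3805])

# Batch lookup; 30897 is not in the data, so its entry is NaN.
batch = score_by_token.reindex([1989, 30897])
print(batch.tolist())
```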

Answer 3

If your list of dictionaries is:

l = [{'sequence': 'read the rest of this note', 'score': 0.22612378001213074, 'token': 3805, 'token_str': 'note'}, {'sequence': 'read the rest of this page', 'score': 0.11293990164995193, 'token': 3674, 'token_str': 'page'}, {'sequence': 'read the rest of this week', 'score': 0.06504543870687485, 'token': 1989, 'token_str': 'week'}]

and the token values you are interested in are, for example:

token_values = [1989, 30897, 98762]

then build the dictionary as follows:

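# Keep only the records whose token is in token_values, mapped to their scores.
# (Python comprehensions filter with "if", not "where".)
d = {the_dict['token']: the_dict['score']
     for the_dict in l if the_dict['token'] in token_values}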

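This builds a minimal dictionary containing only the tokens you are interested in, mapped to their scores. Tokens that do not occur in l (here 30897 and 98762) are simply absent from d.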
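A runnable version of this answer, with one change: token_values is written as a set rather than a list, so each membership check is O(1) on average instead of a scan of the list. Since tokens that never appear in l are missing from d, d.get is a safe way to read them.

```python
l = [
    {'sequence': 'read the rest of this note', 'score': 0.22612378001213074, 'token': 3805, 'token_str': 'note'},
    {'sequence': 'read the rest of this page', 'score': 0.11293990164995193, 'token': 3674, 'token_str': 'page'},
    {'sequence': 'read the rest of this week', 'score': 0.06504543870687485, 'token': 1989, 'token_str': 'week'},
]

# A set gives average O(1) membership tests; a list would be scanned each time.
token_values = {1989, 30897, 98762}

# Keep only the records whose token we care about, mapped to their scores.
d = {the_dict['token']: the_dict['score']
     for the_dict in l if the_dict['token'] in token_values}

for token in sorted(token_values):
    print(token, d.get(token))  # None for tokens not present in l
```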

3 answers

Answer 1: 4 votes, accepted, 2021-12-09 15:22:15

Answer 2: 0 votes, 2021-12-09 15:53:30

Answer 3: 0 votes, 2021-12-09 16:18:07
