
Finding fast default aliases in Python

Is there a faster way to do the following for much larger dicts?

aliases = {
    'United States': 'USA',
    'United Kingdom': 'UK',
    'Russia': 'RUS',
}
if countryname in aliases:
    countryname = aliases[countryname]

Your solution is fine, as "in" is O(1) on average for dictionaries.

You could do something like this to save some typing:

countryname = aliases.get(countryname, countryname)

(But I find your code a lot easier to read than that.)

As for speed, which variant is fastest depends on whether most lookups are "hits" or "misses". But the difference would probably be in the nanosecond range per lookup.
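If you want to check the hit/miss difference on your own machine, something like the following could be a starting point. It is only a rough sketch: the function names (with_in, with_get, with_try, benchmark) and the 'France' miss key are made up for illustration, and absolute timings will vary with Python version and hardware.

```python
import timeit

# Sample data: the alias table from the question.
aliases = {
    'United States': 'USA',
    'United Kingdom': 'UK',
    'Russia': 'RUS',
}

def with_in(name):
    # Membership test, then a second lookup on a hit.
    if name in aliases:
        name = aliases[name]
    return name

def with_get(name):
    # Single lookup; falls back to the original name on a miss.
    return aliases.get(name, name)

def with_try(name):
    # Single lookup on a hit; raises and catches KeyError on a miss.
    try:
        return aliases[name]
    except KeyError:
        return name

def benchmark(number=100_000):
    # Time each variant for one key that is present and one that is not.
    results = {}
    for func in (with_in, with_get, with_try):
        for label, key in (('hit', 'Russia'), ('miss', 'France')):
            results[(func.__name__, label)] = timeit.timeit(
                lambda: func(key), number=number)
    return results

if __name__ == '__main__':
    for (name, label), seconds in benchmark().items():
        print(f'{name:8s} {label:4s} {seconds:.4f}s')
```

All three variants return the same result for any key, so the choice is about readability and your hit/miss ratio rather than correctness.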

If your list fits in memory, dicts are the fastest way to go. As S.Mark points out, you are doing two lookups where one will do. A single lookup is enough with either:

countryname = aliases.get(countryname, countryname)

(which will leave countryname unchanged if it isn't in the dictionary), or:

try:
    countryname = aliases[countryname]
except KeyError:
    pass

Accessing the value with .get could be faster than checking for the key and then assigning to the variable:

aliases.get(countryname)

If countryname does not exist in aliases, this returns None, so pass countryname as the second argument if you want to keep the original name.
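To make the difference between the two forms of .get concrete, here is a small sketch. The canonical_name helper and the 'France' key are hypothetical, just for illustration.

```python
aliases = {
    'United States': 'USA',
    'United Kingdom': 'UK',
    'Russia': 'RUS',
}

def canonical_name(countryname):
    # Hypothetical helper: return the alias if one exists,
    # otherwise hand back the name unchanged.
    return aliases.get(countryname, countryname)

# Without a default, a missing key yields None rather than the name.
missing = aliases.get('France')

# With the name itself as the default, a missing key is left as-is.
unchanged = canonical_name('France')
```

If you assign the result of aliases.get(countryname) straight back to countryname, every unknown country becomes None, which is probably not what the question wants.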

If your dictionary is very large and you expect many of your checks not to find a match, then you might want to consider a Bloom filter or one of its derivatives, accepting the occasional false positive.
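For what it's worth, a Bloom filter can be sketched in a few lines of pure Python. This is only an illustration under assumptions of mine: the BloomFilter class, its num_bits and num_hashes defaults, and the lookup helper are all made up here, and SHA-256 is used simply because it is in the standard library.

```python
import hashlib

class BloomFilter:
    # Minimal illustrative Bloom filter; sizes are arbitrary, not tuned.
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f'{seed}:{key}'.encode('utf-8')).digest()
            yield int.from_bytes(digest[:8], 'big') % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))

aliases = {
    'United States': 'USA',
    'United Kingdom': 'UK',
    'Russia': 'RUS',
}

bloom = BloomFilter()
for key in aliases:
    bloom.add(key)

def lookup(name):
    # Skip the real lookup when the filter rules the key out.
    if name not in bloom:
        return name
    # A "possibly present" answer may be a false positive,
    # so the dict still has the final say.
    return aliases.get(name, name)
```

Note that for an in-memory dict this is very likely slower than the dict lookup itself, since hashing the key several times costs more than one hash-table probe. A filter like this tends to pay off only when the real mapping is expensive to query, for example on disk or over a network.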

Alternatively, because your keys can be sorted (and/or have derived values), you could implement a bisection (binary search) or a similar search algorithm.
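The standard library's bisect module already provides the bisection step, so a sorted-keys lookup could look roughly like this. The lookup_sorted name and the parallel keys/values lists are my own choices for the sketch.

```python
import bisect

# Keep keys and values in two parallel lists, sorted by key.
pairs = sorted([
    ('United States', 'USA'),
    ('United Kingdom', 'UK'),
    ('Russia', 'RUS'),
])
keys = [k for k, _ in pairs]
values = [v for _, v in pairs]

def lookup_sorted(name):
    # bisect_left returns the index where name would be inserted;
    # if the key at that index equals name, it is a hit.
    i = bisect.bisect_left(keys, name)
    if i < len(keys) and keys[i] == name:
        return values[i]
    return name
```

Each lookup here is O(log n) compared with the dict's average O(1), so this mainly makes sense when memory is tight or when you also need ordered operations such as range or prefix queries.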

First, I'd figure out exactly how Python implements dictionary look-ups, so you are not just re-inventing the wheel.

Also, a pure-Python implementation of these could be quite slow if it involves a lot of iteration. Consider Cython, NumPy, or F2Py to get truly optimized.

(If you are dealing with just country names, then I don't think you are dealing with mappings large enough to warrant any of my suggestions, but if you are looking at doing some kind of spell-check implementation, then...)

Good luck.


 