I want to check whether a key exists in a dictionary. The most appropriate way, as per my knowledge is: if d_.get(s):
. But, while attempting a question on Leetcode, there was a TLE error when I was using this method. However, when I tried if s in d_
, TLE was gone. I want to know why in
is faster than get()
.
I tried going through some questions, found this one where there is an explanation for d_.get()
v/s d_[s]
. None of the questions addressed d_.get()
v/s s in d_
.
Just in case, some context:
The code that failed with if self.memo.get(s):
:
from typing import List
class Solution:
def __init__(self):
self.word_dict = {}
self.memo = {}
def word_break(self, s):
if not s:
return True
if self.memo.get(s):
return self.memo[s]
res = False
for word in self.word_dict.keys():
if len(word) <= len(s) and s[:len(word)] == word:
res = res or self.word_break(s[len(word):])
self.memo[s] = res
return res
def wordBreak(self, s: str, wordDict: List[str]) -> bool:
for word in wordDict:
self.word_dict[word] = 1
return(self.word_break(s))
The code than got accepted with if s in self.memo
:
from typing import List
class Solution:
def __init__(self):
self.word_dict = {}
self.memo = {}
def word_break(self, s):
if not s:
return True
if s in self.memo:
return self.memo[s]
res = False
for word in self.word_dict.keys():
if len(word) <= len(s) and s[:len(word)] == word:
res = res or self.word_break(s[len(word):])
self.memo[s] = res
return res
def wordBreak(self, s: str, wordDict: List[str]) -> bool:
for word in wordDict:
self.word_dict[word] = 1
return(self.word_break(s))
I always presumed that in
would be slower than fetching attributes(here, get()
).
Using the dis.dis
method from the linked question:
>>> import dis
>>> dis.dis(compile('d.get(key)', '', 'eval'))
1 0 LOAD_NAME 0 (d)
2 LOAD_METHOD 1 (get)
4 LOAD_NAME 2 (key)
6 CALL_METHOD 1
8 RETURN_VALUE
>>> dis.dis(compile('key in d', '', 'eval'))
1 0 LOAD_NAME 0 (key)
2 LOAD_NAME 1 (d)
4 COMPARE_OP 6 (in)
6 RETURN_VALUE
we can clearly see that d.get(key)
has to run one more step: the LOAD_METHOD
step. Additionally, d.get
must deal with more information: it has to:
None
if no default was specified). Also, from looking at the C code for in
and the C code for .get
, we can see that they are very similar.
int static PyObject *
PyDict_Contains(PyObject *op, PyObject *key) dict_get_impl(PyDictObject *self, PyObject *key, PyObject *default_value)
{ {
Py_hash_t hash; PyObject *val = NULL;
Py_ssize_t ix; Py_hash_t hash;
PyDictObject *mp = (PyDictObject *)op; Py_ssize_t ix;
PyObject *value;
if (!PyUnicode_CheckExact(key) || if (!PyUnicode_CheckExact(key) ||
(hash = ((PyASCIIObject *) key)->hash) == -1) { (hash = ((PyASCIIObject *) key)->hash) == -1) {
hash = PyObject_Hash(key); hash = PyObject_Hash(key);
if (hash == -1) if (hash == -1)
return -1; return NULL;
} }
ix = (mp->ma_keys->dk_lookup)(mp, key, hash, &value); ix = (self->ma_keys->dk_lookup) (self, key, hash, &val);
if (ix == DKIX_ERROR) if (ix == DKIX_ERROR)
return -1; return NULL;
return (ix != DKIX_EMPTY && value != NULL); if (ix == DKIX_EMPTY || val == NULL) {
} val = default_value;
}
Py_INCREF(val);
return val;
}
In fact, they are almost the same, but .get
has more overhead and must return a value.
However, it seems that d in key
will use a faster method if the hash is known, while d.get
recalculates the hash every time. Additionally, CALL_METHOD
and LOAD_METHOD
have much higher overhead than COMPARE_OP
, which performs one of the built-in boolean operations. Note that COMPARE_OP will simply jump to here .
The time overhead is in calling a method explicitly , as opposed to letting language constructs take care of it. We can demonstrate this with timeit
:
>>> timeit.timeit('"__name__" in x', 'x = globals()')
0.037103720999766665
>>> timeit.timeit('x.__contains__("__name__")', 'x = globals()')
0.07471312899997429
>>> timeit.timeit('x["__name__"]', 'x = globals()')
0.03828814600001351
>>> timeit.timeit('x.__getitem__("__name__")', 'x = globals()')
0.07529343100031838
>>> timeit.timeit('x.get("__name__")', 'x = globals()')
0.08261531900006958
I initially started trying to figure out the difference by looking at the source code for __contains__()
and .get()
, respectively, only to find that they're nearly identical except for .get()
incrementing the object's reference count (which should be more or less negligible). Certainly there wasn't enough difference to explain the time difference you'd be seeing.
But, doing tests, we can see that actually using language constructs ( in
and []
) as opposed to the explicit method calls that they would turn into ( __contains__()
and __getitem__()
, respectively), is a full 50% faster.
A full investigation would take a while and more effort than I care to spend, but I hypothesize this is due to some built-in speedups and skipped steps that the interpreter applies - using a language construct instead of explicitly calling a method narrows down the level of complexity that can be expected, and the interpreter could jump straight into the C code without the overhead of calling the method first.
As @rassar's answer demonstrates, this is, in fact, basically what happens.
The two pieces of code do not do the same thing. Notice how self.memo
is set:
self.memo[s] = res
If res
is False
, the if
statement for the get
will fail while the if
for in
will succeed.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.