Membership checking on Python is O(n) for lists and tuples and O(1) for dicts and sets
Today I learned that the time complexity of a membership checking operation is different for each Python type.
Type | Complexity |
---|---|
list | `O(n)` |
set | `O(1)` |
dict (keys) | `O(1)` |
tuple | `O(n)` |
This means that doing a membership check like this is `O(n)`:
if "mykey" in mydict.keys():
# do stuff
While this is a `O(1)`:
if "mykey" in mydict:
# do stuff
(Update 2023-07-24: according to this comment on Reddit, doing membership checks over keys()
is also `O(1)` in Python 3 because this method is just a convenient view for dict keys).
But why is membership checking faster or simpler on sets and dicts than lists and tuples? The answer lies in how each data type is structured.
For a list, membership checks involve iterating over each element in the list until a match is found or the end of the list is reached.
On the other hand, dictionaries are implemented as a hash table.
This means that when a key-value pair is added to a Python dict, the key is passed to the bult-in hash function hash()
(plus a salt, which changes everytime the interpreter starts, starting from Python 3.3).
The hash function result becomes the memory address where the value is going to be stored.
And since the hash calculation time is independent of the dictionary size, the in
operation is `O(1)`.
A nice example of how Python dictionaries are implemented in practice can be seen here.
This explains why dicts are so fast when checking membership, but what about sets?
It happens that Python sets are implemented similarly to dicts. The difference is that only keys are stored in a hash table (instead of key-value pairs).