How does python's ordered dict have constant time complexity for add, search, and delete?
Python’s collections.OrderedDict
(and the regular dict in python 3.7+) is a key-value data structure that allows for constant time search, add, and delete operations, AND it maintains the order in which items (key-value pairs) were inserted. How is this possible?
Python uses 2 internal data structures for its dict. It uses a hash table, which we already know has constant time add, search, and remove operations, however it also uses a doubly linked list to maintain the order the items were inserted.
When an item is inserted into the dict, its hash is also inserted into the doubly linked list, and a pointer to this node of the linked list is stored along with the key-value pair in the hash-table bucket! So here is what happens when you insert an item:
- The hash of the key is calculated
- A new node is created in the doubly linked list, and the hash is stored in the node
- This only happens if this is the first item in the dict with this hash (i.e. the first item going into this specific bucket)
- The key-value pair along with a pointer to the new doubly linked list node are stored in a bucket in the hash table, according to the hash of the key
H1, H2, etc are hash1, hash2, etc
v is just some value. Notice some buckets have multiple values (b/c hash collisions).
Hash Table: Doubly Linked List:
+-----------+ +------+
| H1: v |---------> | H1 |
+-----------+ +------+
| H2: v |---------> | H2 |
| v | +------+
+-----------+ /---> | H3 |
| H3: v |----/ +------+
+-----------+ | .... |
| .... | +------+
+-----------+
Notice: Each bucket points to the corresponding node in the doubly linked list!
When you delete an item from the dict:
- The hash of the key is calculated
- The hash is used to find the corresponding bucket in the hash table
- The key-value pair is removed from the bucket
- If the bucket is empty, the corresponding node in the doubly linked list is removed
- so whenever a bucket becomes empty (gets a size of 0), the corresponding node in the doubly linked list is removed, and whenever a bucket gets a size of 1, a new node is added to the doubly linked list
When you “iterate” through the items of the dict:
- you just follow the hashes in the doubly linked list, to the corresponding buckets, and you iterate over the items in the buckets!
So, folks, there ya have it! A key-value data structure that maintains the order of insertion, and has constant time add, search, and delete operations! Pretty neat!
Python dicts use an additional data structure, a doubly linked list, to maintain insertion order. When an item is inserted into the dict, if a node hasn’t been created for the target bucket in the doubly linked list, a new node is created and this mapping (from bucket to node) is stored in the bucket. When all items of a bucket are deleted, its corresponding node in the doubly linked list is deleted.