My notes on data structures. Not meant to be read by others, but I’ll put it here anyways.

What Are They?

A data structure is a particular way of organizing data such that it is efficient (scalable) to search it, add to it, or remove from it (or it has some other nice property).

Some Common Data Structures and Their Properties

Graph

A graph consists of nodes (aka vertices) and edges. The edges can be directed (one-way) or undirected (two-way). There can be cycles. If the edges have directions and there are no cycles, we call it a DAG (directed acyclic graph). There are things you can assume about and do with a DAG that you can't with arbitrary graphs (for example, you can topologically sort a DAG).

You might be interested in finding out how to go from one node to another, following the shortest distance (each edge can have a “distance weight”). There are several approaches:

  1. Find the SPT (shortest path tree) rooted at your source node (starting node).
    • This tree gives you the shortest path from your source node to every other node
    • Relatively expensive (with Dijkstra's algorithm and a good priority queue, O(E + V log V)):
      • linear wrt the number of edges
      • log-linear wrt the number of vertices
    • After you find the SPT, you need to do a DFS (or BFS) to go from the root to the node of interest. This is O(V) in the worst case. (A sketch of the whole thing follows this list.)
  2. You can use the A* algorithm, which is much better in practice (it uses a heuristic to avoid exploring the whole graph)!
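
Here's a minimal sketch (in Python) of approach 1: Dijkstra's algorithm builds the shortest path tree as parent pointers, and the path to a target is then read off by walking those pointers back to the source (which amounts to the same thing as tracing the SPT from the root). The dict-of-adjacency-lists graph format and the function names are just assumptions for illustration:

  import heapq

  def shortest_path_tree(graph, source):
      # dist[v] = shortest known distance from source to v
      # parent[v] = previous node on that shortest path (this IS the SPT)
      dist = {source: 0}
      parent = {source: None}
      heap = [(0, source)]
      while heap:
          d, u = heapq.heappop(heap)
          if d > dist.get(u, float("inf")):
              continue  # stale heap entry, a shorter way was already found
          for v, w in graph.get(u, []):
              nd = d + w
              if nd < dist.get(v, float("inf")):
                  dist[v] = nd
                  parent[v] = u
                  heapq.heappush(heap, (nd, v))
      return dist, parent

  def path_to(parent, target):
      # Walk parent pointers from target back to the source, then reverse.
      path = []
      node = target
      while node is not None:
          path.append(node)
          node = parent[node]
      return list(reversed(path))

  graph = {
      "A": [("B", 1), ("C", 4)],
      "B": [("C", 2), ("D", 5)],
      "C": [("D", 1)],
      "D": [],
  }
  dist, parent = shortest_path_tree(graph, "A")
  print(dist["D"])             # 4
  print(path_to(parent, "D"))  # ['A', 'B', 'C', 'D']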

Tree

A Tree is a special type of graph. It has a root node, and no cycles. Each node has one and only one parent, except the root node, which has 0 parents.

You might be interested in searching the tree, either in a DFS or a BFS fashion. Both of these are O(n) in the number of nodes, because in the worst case you have to visit every node anyway, but DFS dives toward the leaves first while BFS works level by level from the top. So if you expect your target to be deep down near the leaves, use DFS; if you expect it to be near the top, use BFS.
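
A minimal sketch of both traversals, assuming (just for illustration) the tree is stored as a dict of node -> list of children:

  from collections import deque

  tree = {"root": ["a", "b"], "a": ["c", "d"], "b": [], "c": [], "d": []}

  def dfs(tree, start):
      order, stack = [], [start]
      while stack:
          node = stack.pop()            # take the most recently added node -> goes deep first
          order.append(node)
          stack.extend(reversed(tree[node]))
      return order

  def bfs(tree, start):
      order, queue = [], deque([start])
      while queue:
          node = queue.popleft()        # take the oldest node -> goes level by level
          order.append(node)
          queue.extend(tree[node])
      return order

  print(dfs(tree, "root"))  # ['root', 'a', 'c', 'd', 'b']
  print(bfs(tree, "root"))  # ['root', 'a', 'b', 'c', 'd']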

Binary Search Tree

A tree with a special property. For every node, everything in its left subtree must be equal or smaller, and everything in its right subtree must be bigger (or you can reverse the convention, it doesn't matter; all that matters is that one side is where the bigger stuff goes and the other side is where the smaller stuff goes). This assumes each node has at most 2 children (a left one and a right one). Searching, inserting, and deleting in a binary search tree are all logarithmic, provided the tree stays reasonably balanced (see the next section).
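
A small sketch of a plain (non-balancing) binary search tree with insert and search, using the smaller-on-the-left convention:

  class Node:
      def __init__(self, key):
          self.key = key
          self.left = None
          self.right = None

  def insert(root, key):
      if root is None:
          return Node(key)
      if key <= root.key:
          root.left = insert(root.left, key)
      else:
          root.right = insert(root.right, key)
      return root

  def search(root, key):
      while root is not None:
          if key == root.key:
              return True
          root = root.left if key < root.key else root.right
      return False

  root = None
  for k in [8, 3, 10, 1, 6, 14]:
      root = insert(root, k)
  print(search(root, 6))   # True
  print(search(root, 7))   # False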

Self Balancing Binary Search Tree

A Binary Search Tree that automatically rebalances itself upon insertions/deletions so that its height (number of levels below the root) doesn't get too big. Great for implementing priority queues, sets, and maps (aka associative arrays, aka dictionaries), because even after repeated insertions/deletions the operations stay logarithmic. If a plain Binary Search Tree gets too tall and skinny, the operations approach linear scaling!
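
A tiny demo of that degenerate case. It uses a bare-bones non-balancing BST (not a real self-balancing implementation) just to show the height blowing up when keys arrive in sorted order, which is exactly what self-balancing trees prevent:

  import random

  class Node:
      def __init__(self, key):
          self.key, self.left, self.right = key, None, None

  def insert(root, key):
      if root is None:
          return Node(key)
      if key <= root.key:
          root.left = insert(root.left, key)
      else:
          root.right = insert(root.right, key)
      return root

  def height(node):
      return 0 if node is None else 1 + max(height(node.left), height(node.right))

  sorted_root = None
  for k in range(1, 16):              # insert 1..15 in sorted order
      sorted_root = insert(sorted_root, k)
  print(height(sorted_root))          # 15: the tree degenerated into a chain

  keys = list(range(1, 16))
  random.shuffle(keys)
  random_root = None
  for k in keys:
      random_root = insert(random_root, k)
  print(height(random_root))          # typically much closer to log2(16) = 4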

B Tree

A tree like a Binary Search Tree, except each node can hold more than one key and have more than 2 children (not just "less" or "greater"). You can have, for example, 3 children: one for keys below the node's smaller key, one for keys between the two keys, and one for keys above the bigger key. These need rebalancing less often than Binary Search Trees; you can kinda visualize why (each node is wider, so the tree is shallower and splits happen less often).
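
A sketch of just the search side of a B-tree-style node, to show the idea of one node holding several sorted keys with one child per "gap" between them (insertion with node splitting is omitted, and the node layout here is only illustrative):

  from bisect import bisect_left

  class BTreeNode:
      def __init__(self, keys, children=None):
          self.keys = keys                   # sorted keys inside this node
          self.children = children or []     # len(children) == len(keys) + 1 unless leaf

  def search(node, key):
      i = bisect_left(node.keys, key)        # which gap does the key fall into?
      if i < len(node.keys) and node.keys[i] == key:
          return True
      if not node.children:                  # leaf: nowhere else to look
          return False
      return search(node.children[i], key)

  # A tiny hand-built tree: the root holds two keys, so it has three children.
  root = BTreeNode(
      [10, 20],
      [BTreeNode([2, 5]), BTreeNode([12, 17]), BTreeNode([25, 30])],
  )
  print(search(root, 17))  # True
  print(search(root, 19))  # False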

KD tree

When you have one-dimensional data (i.e. scalar numbers), you can organize it nicely in a binary search tree for efficient searching. But what if you have multi-dimensional data (i.e. vectors)? You can store them in another tree, called a KD tree, where you alternate which dimension you cut on as you go down the tree. So at level 1 you might cut on dimension 1, at level 2 on dimension 2, at level 3 on dimension 1 again, at level 4 on dimension 2 again, and so on; just keep cycling through the dimensions as you go down the levels.
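
A sketch of building a KD tree over 2D points by splitting at the median and cycling the cutting dimension with depth (nearest-neighbor search is omitted; this just shows the alternating-dimension idea):

  class KDNode:
      def __init__(self, point, axis, left, right):
          self.point = point    # the point stored at this node
          self.axis = axis      # which dimension this level cuts on
          self.left = left
          self.right = right

  def build_kdtree(points, depth=0):
      if not points:
          return None
      k = len(points[0])                    # number of dimensions
      axis = depth % k                      # cycle: 0, 1, ..., k-1, 0, 1, ...
      points = sorted(points, key=lambda p: p[axis])
      mid = len(points) // 2                # median point becomes this node
      return KDNode(
          points[mid],
          axis,
          build_kdtree(points[:mid], depth + 1),
          build_kdtree(points[mid + 1:], depth + 1),
      )

  tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
  print(tree.point, tree.axis)              # (7, 2) 0  -> root cuts on x
  print(tree.left.point, tree.left.axis)    # (5, 4) 1  -> next level cuts on y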

Hashtable

Basically a large array and a hash function. You get an item, you pass it to your hash function, which produces an index for the item. You place the item in that array cell (aka bucket). In these buckets you actually keep lists: if two items happen to produce the same index via the hash function (a collision), they end up in the same bucket, and the second item just becomes the second item in that bucket's list. On average, lookups and insertions are constant time.
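
A minimal sketch of a hash table with chained buckets, using Python's built-in hash() as a stand-in for "your hash function" (the class and method names are just for illustration):

  class HashTable:
      def __init__(self, size=8):
          self.buckets = [[] for _ in range(size)]

      def _index(self, key):
          return hash(key) % len(self.buckets)    # hash function -> bucket index

      def put(self, key, value):
          bucket = self.buckets[self._index(key)]
          for i, (k, _) in enumerate(bucket):
              if k == key:                         # key already present: overwrite
                  bucket[i] = (key, value)
                  return
          bucket.append((key, value))              # new key (or collision): append to the bucket's list

      def get(self, key):
          for k, v in self.buckets[self._index(key)]:
              if k == key:
                  return v
          raise KeyError(key)

  table = HashTable()
  table.put("apple", 3)
  table.put("banana", 5)
  print(table.get("apple"))   # 3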