Structure your data in such a way that, when your code is accessing it, CPU cache hits are maximized.

Data Oriented Design (fake quote)

Welcome back! Today, we’ll be learning how to structure our code/data in a way that maximizes CPU cache hits! Or minimizes cache misses, depending on whether you’re a glass-half-full or half-empty typa person!

Prefetching

When your CPU fetches data from memory, it fetches it in chunks of 64 bytes called cache lines. So if you tell the CPU to fetch memory at address x, it will actually pull in the entire 64-byte cache line containing x. If later you tell it to fetch memory at, say, x + 32, it will already be in the cache! On top of that, when the hardware detects a sequential access pattern, it starts fetching upcoming cache lines before you even ask for them. This whole-cache-line fetching, plus the hardware reading ahead of you, is what we’ll call prefetching.

Recently Used Data

When the CPU fetches a certain memory address, that entire cache line (64 MF’ing bytes) will stay in the cache for a while. It may get demoted from L1 to L2, and then eventually to L3, but it will still be in the cache for a bit. Thus, if you need to access that same data again in the near future, it will likely be in the cache!

Data Oriented Design

Data oriented design is the practice of structuring your data/code in such a way that you take advantage of these two concepts (prefetching/recently-used-data) in order to maximize cache hits. In other words, you try to maximize cache hits by structuring your data in a certain way (generally contiguously), and then accessing it in a certain pattern (either sequentially, or by re-accessing it shortly after).

In essence, you are optimizing for the hardware that you will run on, because you are taking advantage of a hardware feature (CPU cache).

Access Contiguous Data, and Access it Sequentially

By structuring your code such that it accesses contiguous memory sequentially, you are taking advantage of prefetching, thus maximizing cache hits.

A common approach to achieve this is to use a struct of arrays (SoA) instead of an array of structs (AoS). In other words, instead of having an array of structures, where each structure represents a different instance of your data, have a single struct where each field is an array. For example, the x field of the struct is an array containing all the x values of your data. The y field is an array containing all the y values of your data, and so on.

If your code often accesses a particular field (or particular fields) of all your entities, this approach will take advantage of prefetching as your algorithms run through the arrays.

Access a Small Amount of Data Over and Over Again

Alternatively, if you are accessing the same data over and over again in a short period of time, you are taking advantage of temporal locality: caches evict the least recently used lines first, so the data you touched most recently tends to stay resident.

A common example of this in the wild is tiling. Tiling is the process of breaking your data up into small chunks, and then processing each chunk in its entirety before moving on to the next one. Each chunk is small enough to fit in the CPU’s cache, and each chunk’s data is accessed over and over again. This is generally used on multi-dimensional data, such as images, matrices, etc. Naive algorithms on these structures tend to jump around in memory a lot, and thus incur cache misses. By breaking the multi-dimensional data into small chunks (not into smaller dimensions!) and constantly re-accessing the same chunk, you keep the hot data resident in the cache. This constant re-accessing of the same data is generally needed by these multi-dimensional algorithms anyways, so it is a perfect fit!

Conclusion

Let’s keep it short, I ordered some Chicken Shawarma and I don’t want it to get cold haha.

  • Data oriented design is just the goal of structuring your data and your code in such a way that you maximize cache hits.
  • There are really two ways of maximizing cache hits.
  • One, you can take advantage of prefetching: simply structure your data contiguously in memory and then access it sequentially.
  • Two, if your problem suits it (i.e. multi-dimensional data like matrices, images, etc.), you can break your data into small chunks and process one chunk at a time.

Thanks for reading, have an awesome day! I’m hungry!