Lock Free Data Parallelization

When doing data parallelization (e.g. multiple threads processing the elements in an array) if the output of an element depends on the value of elements that may be written to by other threads, you can use a technique called double buffering to prevent data races. The idea is to have 2 copies of your data. All threads read from the original copy, but they write their outputs to a new array.

Multiple thread are reading the same regions of the original array (but that’s ok, because it’s just reading) but no two threads are writing to the same elements in the new (output) array. The end result: you get to process your data in parallel, without using mutexes, and without data races.

The negative of double buffering is using twice the amount of memory, the positive is lock free data parallelization.

            multiple threads
            reading same elements


            thread1   thread2
               │         │
               │         │
               │         │
               └────┬────┘
                    │
                    │
          ┌─────────▼────────────┬───────────────────┬──────────────────────┐
  data    │      region1         │    region2        │    region3           │
 (input)  └──────────────────────┴───────────────────┴──────────────────────┘



          ┌──────────────────────┬───────────────────┬──────────────────────┐
 output   │      region1         │    region2        │    region3           │
          └──────────────────────┴───────────────────┴──────────────────────┘
                  ▲                     ▲
                  │                     │
                  │                     │
                  │                     │
              thread1                thread2


                   only 1 thread writing
                   in each region

A straightforward method of parallelizing the processing of a large amount of data is to spawn multiple threads and have each thread process a certain subset of your data.

If the output of an element is only a function of the element itself, the process is straight forward. Each thread reads the respective element in the array (say at index i) and updates it in the same array (at index i). No need to worry about any data races. This is the same case if the output of an element depends on other elements that only this thread writes to.

If the output of an element depends on the value of other elements that other threads will write to, you now have to worry about data races. Because while one thread is reading the value, another thread may be writing to it.

One way to resolve this is via a technique called “double buffering”. All threads read from the original array, but they don’t write to the same array, they write to a new one. That way, you have multiple threads reading perhaps the same regions in the original array, but that’s ok, because you can have as many threads read shared data as long as there are no writers. Regarding the second array (the output array), each thread is only writing to specific elements (elements it was assigned to process), so there are multiple threads writing to this array, but each at different regions (different elements), thus no data race in this array either!

Once all threads are finished writing all their outputs, you can “swap buffers”, i.e. the new (output) buffer is the result, so display it, or do whatever with it. This means that you need to detect when all threads are done writing their outputs. You can do this with “barrier synchronization”.

Double buffering allows you to do lock free data parallelization but you are using twice the amount of memory (one for the original (read only) buffer, one for the new output (write) buffer).