Tricolor Abstraction to build concurrent Garbage Collectors



796 views Garbage Collection



The Mark and Sweep garbage collection algorithm is a simple DFS traversal with two phases - Mark and Sweep. In the mark phase, it marks all the objects that are reachable from the root, and the Sweep phase clears off all the unreachable ones.

This crude algorithm is slow and requires a program pause which means everything stops when the GC is cleaning up. This affects the performance and throughput of the program.

So, can we write a GC that runs concurrently with the program and does not need to always stop the world?

The foundation of concurrent Garbage Collector is based on a concept called Tricolour Abstraction which was developed by Dijkstra and Lamport, known for their work on core algorithms and distributed systems.

States of an object

While tracing the objects across the heap we can see that each object can be in one of the 3 states: unprocessed, processing, and processed; and this becomes our three colors

White: unprocessed objects Grey: visited but whose children are yet to be visited Black: done processing

The garbage collection flow

Every node starts white. Once we see an object we color it Grey and once we are done visiting its children we mark it Black. This way we have 3 sets of objects: white, great, and black, and the objects move from white to grey and from grey to black.

A key thing to note here is that we will never have an object that moves directly from white to black; i.e. there will never be an edge that connects one black and one white node.

How does Tricoloration make it better?

Because a black node is never connected to a white node, we ensure correctness i.e. a live object will never be cleaned up.

Our GC flow can now be simplified as

  • pick the object from the grey set
  • color all the children of the node grey
  • move the grey object to the black

repeat the flow until the grey set is empty. Once done, we can just visit the white set, which now contains all the unreachable objects, and clean them up.

Speedup

Now that we segregated the objects into sets we can ensure a quick cleanup by putting more threads at work on the grey set. It is hard to make a crude DFS concurrent and structuring it as sets make implementation much simpler.

We make our system reactive by keeping an eye on the grey set and triggering GC as soon as it hits some threshold.

This method of garbage collection is called “on-the-fly” which runs concurrently with the program and mutates the color of the objects as part of program execution and does not wait for a separate GC cycle. This reduces the load on GC and minimizes the pause.


Arpit Bhayani

Arpit's Newsletter

CS newsletter for the curious engineers

❤️ by 17000+ readers

If you like what you read subscribe you can always subscribe to my newsletter and get the post delivered straight to your inbox. I write essays on various engineering topics and share it through my weekly newsletter.




Other essays that you might like


Why caching would not speed up Mark-and-Sweep GC?

759 views 46 likes 2022-05-20

So, caching doesn't always work! What would happen if we apply caching to speed up our Mark and Sweep Gargabge collecto...

Tricolor Abstraction to build concurrent Garbage Collectors

796 views 41 likes 2022-05-06

A basic Mark-and-Sweep garbage collection algorithm operates in Stop-the-World mode, which means the program execution p...

Mark and Sweep Garbage Collection Algorithm

1061 views 57 likes 2022-04-29

Garbage Collection has to be one of the most interesting topics of discussion out there. In the previous videos, we took...

How to pick a garbage collector?

948 views 47 likes 2022-04-22

"Best" Garbage Collector is a myth. If you are building your Garbage Collector or trying to pick the best for your use ...


Be a better engineer

A set of courses designed to make you a better engineer and excel at your career; no-fluff, pure engineering.


System Design Masterclass

A masterclass that helps you become great at designing scalable, fault-tolerant, and highly available systems.

Enrolled by 700+ learners

Details →

Designing Microservices

A free course to help you understand Microservices and their high-level patterns in depth.

Enrolled by 17+ learners

Details →

GitHub Outage Dissections

A free course to help you learn core engineering from outages that happened at GitHub.

Enrolled by 67+ learners

Details →

Hash Table Internals

A free course to help you learn core engineering from outages that happened at GitHub.

Enrolled by 25+ learners

Details →

BitTorrent Internals

A free course to help you understand the algorithms and strategies that power P2P networks and BitTorrent.

Enrolled by 42+ learners

Details →

Topics I talk about

Being a passionate engineer, I love to talk about a wide range of topics, but these are my personal favourites.




Arpit's Newsletter read by 17000+ engineers

🔥 Thrice a week, in your inbox, an essay about system design, distributed systems, microservices, programming languages internals, or a deep dive on some super-clever algorithm, or just a few tips on building highly scalable distributed systems.



  • v12.4.4
  • © Arpit Bhayani, 2022

Powered by this tech stack.