How to pick a garbage collector?



948 views Garbage Collection



“Best” Garbage Collector is a myth.

If you are ever writing your Garbage Collector or trying to pick one for your workload, there are 7 metrics that you should be evaluating a GC algorithm on.

No one garbage collector can be the best across all 7 properties and it was found that any GC algorithm is at least 15% better than other algorithms on at least one of the characteristics. Hence it all boils down to the needs of your specific workload to pick one over the other. The key characteristics are

Safety

A garbage collector is safe when it never reclaims the space of a LIVE object and always cleans up only the dead objects.

Although this looks like an obvious requirement, some GC algorithms claim space of LIVE objects just to gain that extra ounce of performance.

Throughput

A garbage collector should be as little time cleaning up the garbage as possible; this way it would ensure that the CPU is spent on doing actual work and not just cleaning up the mess.

Most garbage collectors hence run small cycles frequently and a major cycle does deep cleaning once a while. This way they maximize the overall throughput and ensure we spend more time doing actual work.

Completeness

A garbage collector is said to be complete when it eventually reclaims all the garbage from the heap.

It is not desirable to do a complete clean-up every time the GC is executed, but eventually, a GC should guarantee that the garbage is cleaned up ensuring zero memory leaks.

Pause Time

Some garbage collectors pause the program execution during the cleanup and this induces a “pause”. Long pauses affect the throughput of the system and may lead to unpredictable outcomes; so a GC is designed and tuned to minimize the pause time.

The garbage collector needs to pause the execution because it needs to either run defragmentation where the heap objects are shuffled freeing up larger contiguous memory segments.

Space overhead

Garbage collectors require auxiliary data structures to track objects efficiently and the memory required to do so is pure overhead. An efficient GC should have this space overhead as low as possible allowing sufficient memory for the program execution.

Language Specific Optimizations

Most GC algorithms are generic but when bundled with the programing language the GC can exploit the language patterns and object allocation nuances. So, it is important to pick the GC that can leverage these details and make its execution as efficient as possible.

For example, in some programming languages, GC runs in constant time by exploiting how objects are allocated on the heap.

Scalability

Most GC are efficient in cleaning up a small chunk of memory, but a scalable GC would run efficiently even on a server with large RAM. Similarly, a GC should be able to leverage multiple CPU cores, if available, to speed up the execution.


Arpit Bhayani

Arpit's Newsletter

CS newsletter for the curious engineers

❤️ by 17000+ readers

If you like what you read subscribe you can always subscribe to my newsletter and get the post delivered straight to your inbox. I write essays on various engineering topics and share it through my weekly newsletter.




Other essays that you might like


Why caching would not speed up Mark-and-Sweep GC?

759 views 46 likes 2022-05-20

So, caching doesn't always work! What would happen if we apply caching to speed up our Mark and Sweep Gargabge collecto...

Tricolor Abstraction to build concurrent Garbage Collectors

796 views 41 likes 2022-05-06

A basic Mark-and-Sweep garbage collection algorithm operates in Stop-the-World mode, which means the program execution p...

Mark and Sweep Garbage Collection Algorithm

1061 views 57 likes 2022-04-29

Garbage Collection has to be one of the most interesting topics of discussion out there. In the previous videos, we took...

How to pick a garbage collector?

948 views 47 likes 2022-04-22

"Best" Garbage Collector is a myth. If you are building your Garbage Collector or trying to pick the best for your use ...


Be a better engineer

A set of courses designed to make you a better engineer and excel at your career; no-fluff, pure engineering.


System Design Masterclass

A masterclass that helps you become great at designing scalable, fault-tolerant, and highly available systems.

Enrolled by 700+ learners

Details →

Designing Microservices

A free course to help you understand Microservices and their high-level patterns in depth.

Enrolled by 17+ learners

Details →

GitHub Outage Dissections

A free course to help you learn core engineering from outages that happened at GitHub.

Enrolled by 67+ learners

Details →

Hash Table Internals

A free course to help you learn core engineering from outages that happened at GitHub.

Enrolled by 25+ learners

Details →

BitTorrent Internals

A free course to help you understand the algorithms and strategies that power P2P networks and BitTorrent.

Enrolled by 42+ learners

Details →

Topics I talk about

Being a passionate engineer, I love to talk about a wide range of topics, but these are my personal favourites.




Arpit's Newsletter read by 17000+ engineers

🔥 Thrice a week, in your inbox, an essay about system design, distributed systems, microservices, programming languages internals, or a deep dive on some super-clever algorithm, or just a few tips on building highly scalable distributed systems.



  • v12.4.4
  • © Arpit Bhayani, 2022

Powered by this tech stack.