Mark and Sweep Garbage Collection Algorithm



1061 views Garbage Collection



The Mark-and-Sweep garbage collection algorithm is one of the most common and widely adopted garbage collection algorithms out there. It leverages the power of Graph data structure along with DFS to power the cleanup.

Almost all programming languages support allocating objects in heap and referring them via pointers to another object. This reference creates a small graph of all the references linked to the one root node.

The root node could be a global variable (object) or something allocated on the thread stack, and anything and everything referenced from that becomes the child nodes and the process continues.

Indirect Collection Algorithm

The mark-and-Sweep garbage collection algorithm is an indirect collection algorithm, which means it does not have any direct information about the garbage, instead, it identifies the garbage by eliminating everything LIVE.

When is the GC triggered?

The garbage collection is triggered when the runtime environment is unable to allocate any new object on the heap. When the language tries to fire new and it is unable to allocate space, it first triggers a quick garbage collection and then tries to re-allocate; if not successful again it throws an OutOfMemory exception, and in most cases, it is a fatal error.

Mark-and-Sweep Algorithm

Although the algorithm is primarily split into two phases Mark and Sweep; to build a better understanding we split it into 4.

Prepare the root list

The first step of this algorithm is to extract and prepare the root list, and the roots could be global variables or variables referenced in the thread stack. These become the seed objects on which we then trigger a DFS.

Mark the roots and proceed

Once we identified all the roots, we mark all the roots as LIVE and proceed with the Mark phase. A super optimization that some implementations apply is to invoke the Mark phase for every root as it would help us keep the stack smaller.

Mark

The mark phase is just a Depth First Search traversal on one root node and the idea is to mark every node as LIVE that is reachable from the root node. Once all the nodes are marked, we can conclude that the nodes that remain unmarked are garbage; and the ones to be cleaned up.

Sweep

The nodes left unmarked are the garbage and hence in the sweep phase, the GC iterates through all the objects and frees the unmarked objects, and resets the marked object to prepare them for the next cycle.


Arpit Bhayani

Arpit's Newsletter

CS newsletter for the curious engineers

❤️ by 17000+ readers

If you like what you read subscribe you can always subscribe to my newsletter and get the post delivered straight to your inbox. I write essays on various engineering topics and share it through my weekly newsletter.




Other essays that you might like


Why caching would not speed up Mark-and-Sweep GC?

759 views 46 likes 2022-05-20

So, caching doesn't always work! What would happen if we apply caching to speed up our Mark and Sweep Gargabge collecto...

Tricolor Abstraction to build concurrent Garbage Collectors

796 views 41 likes 2022-05-06

A basic Mark-and-Sweep garbage collection algorithm operates in Stop-the-World mode, which means the program execution p...

Mark and Sweep Garbage Collection Algorithm

1061 views 57 likes 2022-04-29

Garbage Collection has to be one of the most interesting topics of discussion out there. In the previous videos, we took...

How to pick a garbage collector?

948 views 47 likes 2022-04-22

"Best" Garbage Collector is a myth. If you are building your Garbage Collector or trying to pick the best for your use ...


Be a better engineer

A set of courses designed to make you a better engineer and excel at your career; no-fluff, pure engineering.


System Design Masterclass

A masterclass that helps you become great at designing scalable, fault-tolerant, and highly available systems.

Enrolled by 700+ learners

Details →

Designing Microservices

A free course to help you understand Microservices and their high-level patterns in depth.

Enrolled by 17+ learners

Details →

GitHub Outage Dissections

A free course to help you learn core engineering from outages that happened at GitHub.

Enrolled by 67+ learners

Details →

Hash Table Internals

A free course to help you learn core engineering from outages that happened at GitHub.

Enrolled by 25+ learners

Details →

BitTorrent Internals

A free course to help you understand the algorithms and strategies that power P2P networks and BitTorrent.

Enrolled by 42+ learners

Details →

Topics I talk about

Being a passionate engineer, I love to talk about a wide range of topics, but these are my personal favourites.




Arpit's Newsletter read by 17000+ engineers

🔥 Thrice a week, in your inbox, an essay about system design, distributed systems, microservices, programming languages internals, or a deep dive on some super-clever algorithm, or just a few tips on building highly scalable distributed systems.



  • v12.4.4
  • © Arpit Bhayani, 2022

Powered by this tech stack.