I changed the Python print statement to understand how it works internally

Printing a variable has to be the most common thing we do in Python, but how does it work internally? In this video we go through the source code of the Python language to understand how the print function really works, and to ensure that our understanding is correct, we modify the print function to print object metadata as well.

The intention of this video is to show you how easy it is to make changes to a massive codebase and contribute to open source. It does feel overwhelming, but when we take baby steps, things become really simple. This is the third video of the series, so I recommend checking out the other ones, although there is no hard dependency between this video and the others. So let's jump right into it.

Here I have my CPython codebase set up. When I run make, it builds a binary locally that I can run, and that python binary is built from this very source code. The build is complete; when I run ./python, it starts the Python interpreter built from this very source code. Whenever I want to print something, I can just pass that thing in and it gets printed.

Now, in this massive codebase, I don't know where the print function is implemented. So how do I find it? If I just do a brute-force search for print, expecting to find the print function, I see more than 5,400 results. That's insane; I cannot go through all of them, so there has to be a better way.

What I typically do is this: in Python, when we call help on a particular function name, it prints the help string, or docstring, of that function. When I do this for print, I see that the print function takes args, which is variable-length (you can pass it as a tuple), a separator, an end, a file, and a flush, and then it says: prints the values to a stream, or to stdout by default. The documentation that this is generating has to be
part of the source code; otherwise, how would it get printed? So I'll grab this text and search for it. Those 5,442 results have now reduced to just two: one is a .c file and one is a .h file. The implementation has to be in the .c file. There you can see the same text that was in the documentation. It may not match word for word, but you have at least narrowed down to the place that matters.

Beneath that we see builtin_print_impl, which is the implementation of the print function. The documentation lists args, sep, end, file, and flush, and here we see args, sep, end, file, and flush. This has to be the function.

Now that we have converged on the function, let's understand what the print function actually does, and then we'll see whether that is exactly what the C code in CPython implements. The print function prints whatever we pass to it: variable one, variable two, variable three, and so on. These are the args. After them you can pass optional kwargs. The separator says how the values should be separated when they are printed. Let me give you a quick example. Say I do print(1, 2, 3): it prints 1 space 2 space 3. But I can change this and say that my separator has to be a hyphen, not a space, so it prints 1-2-3. Similarly, by default the output ends with a \n, but I can say end it with "done". Then I see 1 2 3done, and the prompt starts right after it because I didn't end with a \n. So you can see where end is applied and where the separator goes. Similarly, file is where the output is written: by default it's stdout, but you can pass in any file object and print writes to that file.
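The sep, end, and file behaviour described above can be reproduced with a short sketch. It writes to an in-memory io.StringIO stream (an assumption made here so the output can be inspected; the video prints to the terminal instead).

```python
import io

# Capture output in an in-memory text stream so the result can be inspected.
buf = io.StringIO()

print(1, 2, 3, file=buf)               # default sep=" " and end="\n"
print(1, 2, 3, sep="-", file=buf)      # custom separator between the values
print(1, 2, 3, end="done", file=buf)   # custom end, so no trailing newline

captured = buf.getvalue()
print(repr(captured))  # '1 2 3\n1-2-3\n1 2 3done'
```

Passing any object with a write method as file works the same way, which is why the C implementation cannot simply call printf.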
Okay, now that we understand how print is used, let's see this exact same thing in the source code, and then we'll also modify the source code.

As the function starts executing, it first checks whether file is None. None is different from NULL: C has NULL, Python has None, and None in Python is also an object, called Py_None. So if file == Py_None, which means no file was passed, what do we do? We use stdout, which is what we saw earlier. We don't understand a lot of the things around that line, but we get the idea of what's happening here: file is set to stdout, which is what we want. Then, if the separator is None, which means no separator was passed, sep is set to NULL; otherwise it is set to something.

Then we iterate. Where did it go? Okay, here. The args that were passed were *args, which means they form a tuple, so I would iterate through the tuple. Iterating through a tuple is: for i = 0; i < length of the tuple; i++. Here it is i = 0; i < PyTuple_GET_SIZE(args), which should be getting me the size of the tuple; i++. Inside the loop, when i is greater than 0: if sep is NULL, a string containing a space, which is the default separator, is written to the file; if sep is not NULL, that separator is written. The file is the one that was passed, or stdout otherwise, which we already handled at the start. Then there is a check on err, something like that. I don't know exactly what it is, but it looks like error handling: if some error happened while writing to the file, it is captured and NULL is returned. Once this if i > 0 block is done, we are writing: PyFile_WriteObject with PyTuple_GET_ITEM(args, i). This
looks like we are getting an item from the tuple, the ith item from args, and writing that object to the file. There is also some Py_PRINT_RAW flag, which should mean printing a raw string or something. We don't know, but that's okay; we don't need to know everything. So we get the object and print it here, followed by an error check.

Once this iteration is complete, we check end: if end is NULL, we write \n to the file, which is exactly the default end; otherwise we write end. Apart from that, if flush is passed, we flush. But you get the idea of what's happening: the exact flow that we thought of is what is implemented in C.

Now we know where the object gets written. Whichever object we pass is written right here: PyFile_WriteObject, with PyTuple_GET_ITEM(args, i), writes it to this file in raw format. So let's do this: apart from printing the object, let's add some metadata to it, just to ensure that our understanding is correct.

We need to know what exists before we decide what to print. So I take this tuple item. In Python everything is a PyObject, so I store it in PyObject *obj, and I pass obj to the write call here, which means my existing flow does not break; it runs as before. Now, what exists in this object? The object contains two things: a reference count, ob_refcnt, and ob_type. The ref count looks like the reference counting that is used for garbage collection: how many variables, or how many places, reference this object is stored here. And ob_type should be the type of the object, so that depending on it the corresponding __str__ method gets invoked. It should be
that. So let's say we want to print in this format: first an opening angular bracket, the type of the object, a colon, the reference count of the object, then a closing angular bracket and a space, after which I print the actual object. We are making this change just so that we understand how to go about it.

Given that this is what we want to print, when the print function is invoked on an object I need to get this information. We know that obj->ob_type gives the type, but we want to print the name of the type. ob_type is a PyTypeObject, so PyTypeObject must have something. PyTypeObject is a struct _typeobject; I click on that and see const char *tp_name. tp_name looks like the name of the type. The comment also says it is for printing, in the format module.name. So this has to be the name of the type, and I add tp_name after the type. This gives me the name of the type of the object we are printing. The other thing is obj->ob_refcount or something. It would not autocomplete because there is a syntax error at the moment; we'll sort that out.

So that is what we want to print. Now, inside the print function we cannot just write printf. We have to adhere to the way other things are written, because printf would print to stdout, but what if the user passed in a file? Then we want to write to that file. So we cannot use raw printf. We know the function that Python uses to write: it's PyFile_WriteObject, so I will use the same function. But what do we want to write? The object is anyway getting printed over here; before
that, we want to print the angular-bracket part, which means we want to print the metadata. The metadata has to be a string object, so we write metadata = and build a string. A string in Python is Unicode, so all the string functions are PyUnicode functions. We want to construct a Unicode object from something, so let's go through the functions that exist. Append? No, we don't want to append. There are a lot of functions. We want to create a PyUnicode from something, so there has to be a From-something function: PyUnicode_FromEncodedObject, FromFormat, FromObject, FromOrdinal, FromString, FromFormat. This one looks like it builds from a string format, so let me click on it and see what the function does. It takes a const char *format and produces the string in that specific format. This is the one we need.

In quotes, we pass the angular brackets with %s, because we want to print the type name first, then a colon, then %d for the reference count, followed by a space. Then I pass in ob_type->tp_name and ob_refcount. Let it autocomplete... it's ob_refcnt.

Okay, so we created the object. What does PyUnicode_FromFormat return? It returns PyObject *, so we use PyObject * as the type here, making it very close to what we wrote for obj. Then PyFile_WriteObject writes it, and we copy-paste the same error check that was done over here. We didn't know the internals of Python; we just looked at the code around us to figure out what we could do, and then looked up what we needed.

So now PyUnicode_FromFormat returns a PyObject, metadata. We first write the metadata, which prints angular bracket, %s, colon, %d, angular bracket, followed by a space, and then we write the actual object. Now, to ensure that our understanding is correct, we'll
build the binary again and then print, to see whether what we intended is really getting printed. The build will complete in a few seconds; it's about to complete. The step about imports is typically the final one.

Finally, I open the shell. Let me create a dictionary d which contains a key a with value 1. Now I print this dictionary: print(d). Bingo! What do we see? Angular brackets with dict:3, a space, and then the string version of the dictionary. This clearly shows that the print function we modified is the one getting invoked, and we are now printing the type of the object along with the reference count, because those were the only two things we could find in the PyObject struct. We see tp_name getting printed and the reference count getting printed.

Now we can pass anything to this. Let's say I print a string: it prints str:-1. The type is str, which makes sense, and the string itself gets printed after that. But -1 is a funny thing. It means the reference count is -1, and I'm not sure what that stands for; we'll figure it out over time.

But in general, we know what a reference count is: the number of references, or the number of places from which this object is referenced. It is a very famous garbage collection technique, reference-counting-based garbage collection. So let's check whether that is really what the ref count stores. I create two lists, l1 and l2. I know that for my dictionary d the ref count is 3, which means it is referenced from three places. I don't know which three, because we have just
initialized it; maybe Python references it somewhere internally. We don't know, but let's say it's 3. If I do l1.append(d), the ref count should increase. I print d: it became 4, because now it is also referenced from the list l1. Now I do l2.append(d) and print d: the ref count became 5, because there are now five references: l1, l2, and the three default ones. So we saw that this reference count is indeed the reference count used for garbage collection. By following where the ref count changes, we can actually understand how garbage collection works in Python. That's the beauty of it: when a language is open source, you can do so much with it, understand so many intricate details, and have fun along the way.

So we touched upon quite a few things in this one. To summarize: we saw how to use documentation to figure out where the source code is. We understood what Python's print does, then followed the source code to understand how it does that. Then we modified the source code to add meta information. For the meta information, we figured out there is something called the type and a ref count: the type's tp_name gave us the name of the type, which we used in printing, and the ref count gave us the number of references this object has from other places. And just for the sake of it, we added that same dictionary object to multiple lists to see the reference count increase.

We are not touching upon garbage collection in this one, but you get the idea. These were very simple changes, but they helped us uncover a bit of garbage collection, and now we can trace where garbage collection happens. We got comfortable with what PyObjects are, how to create them,
how we print them, and how the flow is structured. And that's fascinating. This is a sign of a very good open source codebase: even beginners like us, who have no idea what this thing is, can still, by intuition and by following the code that exists around them, make good enough changes to an existing codebase.

I highly recommend doing this hands-on on your machine; it's really easy. That's all I wanted to cover. This was the third video in the CPython Internals series, where I'm trying to show how easy it is to make changes to gigantic open source codebases and, in the process, learn a few internals of Python. I hope you found it interesting. That's it for this one; I'll see you in the next one. Thanks again.
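The flow walked through in the video (default file, separator between items, a metadata prefix before each object, then end) can be modeled in pure Python. This is only a sketch, not the C patch from the video: print_with_metadata is a hypothetical helper name, type(obj).__name__ stands in for tp_name, and sys.getrefcount stands in for reading ob_refcnt. Because sys.getrefcount also counts the temporary references created by the call itself, the numbers will differ from what the modified interpreter prints.

```python
import io
import sys


def print_with_metadata(*args, sep=None, end=None, file=None):
    """Pure-Python model (hypothetical helper) of the modified print flow."""
    if file is None:
        file = sys.stdout  # no file passed: fall back to stdout
    for i, obj in enumerate(args):
        if i > 0:
            # Between items, write the default space or the given separator.
            file.write(" " if sep is None else sep)
        # Metadata prefix "<type:refcount> ", then the object itself.
        # sys.getrefcount includes temporary references made by this call,
        # so the number will not match what the C patch prints.
        file.write("<%s:%d> " % (type(obj).__name__, sys.getrefcount(obj)))
        file.write(str(obj))
    file.write("\n" if end is None else end)


d = {"a": 1}
buf = io.StringIO()
print_with_metadata(d, file=buf)
out = buf.getvalue()
print(out, end="")  # shape: <dict:N> {'a': 1}

# Appending the same dict to two lists adds two references to it.
l1, l2 = [], []
before = sys.getrefcount(d)
l1.append(d)
l2.append(d)
delta = sys.getrefcount(d) - before
print(delta)
```

The second half repeats the list experiment from the video without rebuilding the interpreter: only the change in the count is compared, since the absolute value depends on the interpreter version and on how the object is being held at that moment.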
