Arpit's Newsletter read by 56000+ engineers
Weekly essays on real-world system design, distributed systems, or a deep dive into some super-clever algorithm.
Write an engineering article with a concrete timeline in Markdown format from the following transcript.
Open source codebases are overwhelming, especially the massive ones like CPython. The best way to build an understanding of one is by making simple changes. In this video we change the Python language and add a new statement: nuke. This statement abruptly kills the runtime. It's the simplest yet significant enough change that we can make to the Python language. The intention of this video is to make you all understand how easy it is to make simple changes to massive codebases and contribute to open source. This is the second video of the series, so I would recommend you check out the other ones as well. Let's jump right into it.

Here I have set up CPython locally, so if I run ./python it starts the Python shell, the Python runtime, from the source code I have locally. If I just want to stop the runtime, I can type exit(), invoking the exit function, which stops my Python runtime. Then I can run echo $?, which returns 0, the exit status code. That was a graceful shutdown. What we want is nuke: when we type nuke, we want to kill the runtime immediately. This is a simple change, but we have to go through a lot of files to understand what we are doing. When I type nuke, we have to kill the runtime, and I'll use exit code 45, because the atom bombs were dropped in the year 1945.

So let's see how we should approach making these changes. Rather than me directly saying, hey, these are the changes you need to make, let's understand the thought process behind them. What are we trying to do? We are trying to add a new statement called nuke. nuke does not take any argument. So let's find out which
other Python statement exists that is just a one-word statement, not a function call, and does not take any argument. One statement that came to my mind was break, because when we are in a for loop and we write break, it breaks out of the loop. Which means the statement we want to implement would be very similar to how break is implemented. So wherever we see break handled, we just add our code for nuke next to it. Let's see how it goes.

First of all, we have to add support for the word nuke in the Python language. Every language has a grammar, and this language has one too. You can find it in a .gram file in the Grammar folder, and this is the PEG grammar written for the language. Even I don't know much about it, but to make simple changes you don't need to. That's the beauty of it: so long as you are able to narrow down where you need to make the changes, that's good enough. So wherever break is, we add our statement next to it. And while we are adding it, we have to go a few lines up and a few lines down to understand whether we are even adding it in the right place.

So let's find where break is implemented. On line number 123 I can see a break and a continue, and I can also see a pass. Oh yeah, I forgot about pass; pass is also a statement that takes no arguments. break is there, pass is there, continue is there. return followed by some argument is there too, but we will not go into that. break and continue are exactly what we want. So: copy, paste, and instead of continue I put nuke. And here it is, _PyAST_Continue. Sorry, I don't know what _PyAST_Continue does. I know AST stands for abstract syntax tree. Even if you don't know
that, that's fine, but you can very clearly see that break has _PyAST_Break and continue has _PyAST_Continue, so nuke would have _PyAST_Nuke. So we do a depth-first search: wherever _PyAST_Break is implemented, you go and implement the same thing for nuke, and whatever new name turns up there, you implement that as well. So I look for _PyAST_Break everywhere in my codebase. Ctrl+Shift+F is your best friend here.

We found this one, the grammar, where we made the nuke change: number one. The second place we found is over here, so I'll just do a copy-paste. I can see a lot of function declarations written here, and I add _PyAST_Nuke. Do I need to change the arguments? Every function looks like it has the same set of arguments, so I'll keep them as is.

The next match is in parser.c. You can see a _PyAST_Break here, but this file is gigantic, and it is a parser. Typically you don't change the parser of a language by hand; the parser is generated, like with Lex and Yacc, which you folks might know. We define the language in the grammar, and from this grammar the parser file is generated. So you do not make any changes here for anything related to break, and we don't have to either. We'll not touch parser.c; we will auto-generate it in a couple of minutes.

Another place we find it is Python-ast.c, and because we declared a function _PyAST_Nuke, this is where it has to be defined. This is where break is, so this is the definition of the function. I do a copy-paste and make it _PyAST_Nuke. I go through the function body to see what it is doing and whether there is anything specific to break. I can see a statement, some malloc, and Break_kind. Oh, break is over here; let me change it to Nuke_kind. I don't know what this is doing.
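One way to see why break, continue and pass make good templates is to parse them with the ast module of an unmodified Python. The sketch below is not from the video; it uses only standard-library calls and shows that these three statements become AST nodes with no fields, while return carries one. The helper name node_fields is made up for illustration, and the idea that a nuke node would look like the first three is an assumption based on the walkthrough.

```python
import ast

# A small program containing the one-word statements discussed above,
# plus a return for contrast.
SOURCE = """
while True:
    pass
    continue
    break

def f():
    return 1
"""


def node_fields(source):
    """Map each simple-statement node class name to its declared fields."""
    wanted = (ast.Pass, ast.Continue, ast.Break, ast.Return)
    found = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, wanted):
            found[type(node).__name__] = type(node)._fields
    return found


fields = node_fields(SOURCE)
for name in sorted(fields):
    # Pass, Continue and Break declare no fields; Return declares "value".
    print(name, fields[name])
```

Statements whose node has an empty field tuple need no arguments handled anywhere in the pipeline, which is what makes copying the break code path a reasonable starting point.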
I'm just making those changes, because it looks like this is how it finds out what kind of statement we are passing. I'll put it after continue, just for cleanliness. So I added _PyAST_Nuke, and I added this new name. Now, depth-first search: I added this new thing, Nuke_kind, which is similar to Break_kind, so let me search for Break_kind, B-r-e-a-k underscore kind, with a case-sensitive search. I find five or six references. I'll go through all of them and see whether each one requires a change.

Here I can see Pass_kind, Break_kind, Continue_kind. I'll add my own kind, Nuke_kind. Those are 26 and 27, so I'll give this one 28. First change done.

The second is over here. Here we have Pass_kind, Break_kind, Continue_kind, but instead of just blindly adding a new kind, let's see what this function does. You cannot just blindly add it everywhere, because break has one use case and nuke has a different one; you need to understand where you are adding it. So we scroll up and see what it is doing. If it is really long we can just skip it, but we can see some calls being made in astfold_stmt. We still don't understand what it is trying to do, but it looks like for some kinds something gets invoked. That's okay. What we do see is that pass doesn't do anything, break doesn't do anything, and continue doesn't do anything, so we add nuke and it will also not do anything. We'll try, test, and see what happens.

The next file is ast.c. Here you find Break_kind and Continue_kind, so let me add a case for Nuke_kind. But look at what we just changed: ret = 1 was part of the Continue_kind case, and now we have added Nuke_kind after Continue_kind. Let's see what this is actually doing, because what if this ret = 1 had some significance? So let's see
what this function does. If you scroll a couple of lines up, you can see a lot of validations happening: non-empty sequence for something, import has some validations, non-empty sequence again, and so on. So this looks like a function that takes care of validation for each kind of statement. Because pass, break and continue do not accept any argument, and neither does nuke, it's okay to just return 1, which would mean it's an okay statement. Further down we see a check for ret less than zero and an unexpected-statement error, something around that. So this looks like a good change for us.

The next thing is the compile.c file, which has Break_kind with a call to compiler_break. We copy-paste after continue: where it says Break_kind I change it to Nuke_kind, and where it says compiler_break I put compiler_nuke. Which means we now have to implement this function, because it's a function call and the function has to be implemented somewhere. Right next to compiler_break we would implement compiler_nuke. We'll come back to that.

Then we come to the next file, Python-ast, where you see _PyAST_Break. This is the part we already implemented; we stumbled upon it again. This is the _PyAST_Break that contains Break_kind, which we changed to _PyAST_Nuke with Nuke_kind. We added that block, remember? This is where we started, and we circled back. So this change is taken care of.

The next place where Break_kind exists is this one. I don't know what it does, but it looks like it is creating some generic object. Copy, paste, and I change Break_kind to Nuke_kind. And now what we see is state, arrow, Break_type. Oh, looks like we have to add a new type. We added Nuke_kind; I don't know what this type is, and you don't really need to know in order to add support for it. Over time you build that intuition, over time you build that skill set to
do it. Even I don't know what that is, so I'll just write Nuke_type. The next thing is to find the references of Break_type and see where we need to make changes, but let's first close everything around Break_kind. So we made this change: we added Nuke_kind and Nuke_type here.

Now we go to the symbol table and see whether we need to change anything. For Pass_kind we do nothing, for Break_kind we do nothing, for Continue_kind we do nothing. We'll add the same thing for Nuke_kind: nothing to do. This looks like the place where the tree is being visited. Let's just make the change and see if it works, because it's okay to make mistakes.

The thing we added in the previous file was Nuke_type, which is similar to Break_type. Now let's look at the references of Break_type and make changes there. I see a Break_type reference here. I copy-paste, and I can see that this file is arranged lexicographically, alphabetically, so I go down to N, past the N-o entries, and add Nuke_type there. So this file is done.

Now we come to Python-ast, where I can see a Py_CLEAR of Break_type. I don't know what this does, but it looks like every single type has an entry here, so I'll add one entry for ours. Now you can see that even if I don't know what each one of these is doing, I can just make referential changes everywhere and sort it out. So I've added the Py_CLEAR.

Here I find another Break_type, where it is doing state->Break_type something, then state->Break_type again, and then continue follows. I'll copy-paste these two lines and add them after continue, like we have done everywhere. I put nuke here, nuke here, nuke here and nuke here. Even I don't know what
I'm doing, but let's try. We'll first make all the changes, then run and see what happens. I made the changes here, Break_type to Nuke_type; this change is done. We come over here and see Nuke_type and Nuke_kind. This is where it all started; Nuke_type was already added here, because this is where we discovered that there is something called a type. We added that.

Now this is another place where we see Break_type. I'm not able to understand what this is, so let's go a few lines up and a few lines down to see what it is actually doing. I can see Break_type here with some instance checking, an is-instance check, and then this is where the function gets invoked: _PyAST_Break. The _PyAST_Nuke we added would be invoked from here. So let me just do a blind copy-paste. We see break implemented here, continue implemented here, and that's the end of it. But before we add ours, let's look at the next line. It says: error, expected some sort of statement but got percent-something. I don't know what that is, but it tells us that this is where we have to make an entry, because if we don't add it, the new type won't be recognized and it might throw an error. So let me add Nuke_type to the is-instance check here, and the call to _PyAST_Nuke. This is where the _PyAST_Nuke we defined earlier gets invoked. Looks like the pieces are coming together. This change is done.

The last change: break is added over here, so I go to continue and add my object reference after it. I put my new type here and change it to Nuke. We made a bunch of changes.

One thing we have to remember is that compiler_break we just saw: we have to add compiler_nuke somewhere. That was the file where, again DFS, we went down that path, made all the changes, came back, and we
recalled: hey, this was one change that was remaining. So let me do this compiler_nuke. We don't know where to implement it, so we take the help of compiler_break and find two references to it. I can see a function, so I do a blind copy-paste and put it next to compiler_continue, because we have always been putting break, continue, and then nuke. I name it compiler_nuke. It takes some compiler object and some location object. I have no idea what those are, but that's okay. And this is the function that our call will reach. I don't know what all of this does, but all we want is to kill everything. To kill a C process we have exit: we write exit, and we can pass it an exit code. So here I'll pass 45.

So we completed our DFS. We went through all the references of Break_kind and added Nuke_kind next to them, we went through all the references of Break_type and added Nuke_type next to them, and we added the functions wherever we found a reference to one. Looks like we have done it all. I'm not sure, but let's try to build it.

To build, we fire the make command. But one thing to know is that whenever we change the grammar, the grammar's parser has to be regenerated. Remember, we did not change anything in parser.c, and I said it is auto-generated. In CPython, to regenerate the parser whenever we make any change in the grammar, we have to run make regen-pegen. I found this through a Google search: I searched for how to change the CPython grammar and found out that whenever you change anything in the .gram file you have to run this command. When I run this command,
it generates the parser for the new grammar I've changed. This parser is what showed up in the search results as a place to change, but it is auto-generated, so we don't have to change anything there by hand. If I open parser.c now, I should see the nuke statement, and see, I do: here is the nuke token, and here I can see _PyAST_Nuke being invoked. This file was auto-generated; we never made changes here, but the generator picked up our grammar and generated this file for us.

So we regenerated the parser of our language, and now we run make, because we made changes to a lot of core files. A good open source project typically has very sophisticated make rules: depending on the files you changed, it does a partial compilation of only those files and just links against the existing output. The files you didn't change are not compiled again; their object files already exist, and only the linking happens. Now, because we changed some very foundational files, the grammar, which everything uses, and those AST files, which everything uses, the build will take time, typically three to four minutes. So let's wait for it to complete, and once it completes we will execute it and see if our changes work.

The compilation is complete. It took roughly six minutes to wrap up. The binary is created, with no syntax errors, which means all the changes we made were at least complete in the sense that a binary got built. Now let's run our own interpreter and see if it has the changes. I ran it, and let me first do an exit() to check that we didn't break the existing function. We did not; it works just fine. Now let me type in
nuke, hit Enter, and the program abruptly terminated. What did we write as the exit code? 45. So if I run echo $?, I get 45. And this is how you make changes to a massive codebase without knowing a thing about it. Just basic things here and there: apply first principles, figure out what the most similar existing thing is, and then start making your changes alongside it. We knew we had to add a new command called nuke, we found a similar command, break, or rather one having the same syntax, and then we made changes everywhere break existed, right next to it. That's exactly what we did.

This also gives us a very nice idea of what all is in there. Although we didn't understand what the AST does or how it stores stuff, it at least made us more familiar with the codebase: there is some AST file, some tree structure is created, in some places there is a kind, in others a type, and so on. That's how you evolve in an open source codebase as massive as this. Smaller open source codebases are fine, but such massive ones have a lot of things already abstracted, so this helps you navigate through them.

I hope you found it interesting, I hope you found it amusing. This is one series I'm really pumped about: going deeper into CPython internals and helping you all understand how to navigate massive codebases, make some changes here and there, and understand how things work, while helping you understand how to contribute to open source codebases. That's it for this one. I'll see you in the next one. Thanks.
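The exit-status check from the walkthrough (exit() giving 0, nuke giving 45) can be reproduced on a stock interpreter without rebuilding CPython. The sketch below is an illustration, not the video's code: it uses os._exit(45) as an assumed stand-in for the nuke statement, since both end the process immediately with status 45. The helper exit_status and the two snippet constants are hypothetical names.

```python
import subprocess
import sys


def exit_status(code):
    """Run a snippet in a fresh interpreter and return (status, stdout)."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout


# Graceful shutdown: registered atexit handlers run and the status is 0.
GRACEFUL = (
    "import atexit, sys\n"
    "atexit.register(lambda: print('cleanup ran'))\n"
    "sys.exit()\n"
)

# Abrupt shutdown: os._exit skips interpreter cleanup.
# Assumption: this stands in for the nuke statement built in the video.
ABRUPT = (
    "import atexit, os\n"
    "atexit.register(lambda: print('cleanup ran'))\n"
    "os._exit(45)\n"
)

graceful_status, graceful_out = exit_status(GRACEFUL)
abrupt_status, abrupt_out = exit_status(ABRUPT)
print(graceful_status, repr(graceful_out))
print(abrupt_status, repr(abrupt_out))
```

The returncode values are what echo $? would report in the shell. The abrupt case produces no output because the cleanup handler never gets a chance to run.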
Here's the video ⤵