Published on 17th Jun 2021
2 min read
January 2022 enrollments are closed and the course commences on 8th of January, 2022. For future cohorts Join the waitlist →
The only way to infinitely scale your system is by making it distributed, which means adding more servers to serve your requests, more nodes to perform computations in parallel, and more nodes to store your partitioned data. But while building such a complex system, we tend to assume a few things to be true, which, in reality, are definitely not true.
These mistaken beliefs were documented by L Peter Deutsch and others at Sun Microsystems, and it describes a set of false assumptions that programmers new to distributed applications invariably make.
No. The network is not reliable. There are packet drops, connection interruptions, and data corruptions when they are transferred over the wire. In addition, there are network outages, router restarts, and switch failures to make the matter worse. Such an unreliable network has to be considered while designing a robust Distributed System.
Network latency is real, and we should not assume that everything happens instantaneously. For every 10 meters of fiber optic wire, we add 3 nanoseconds to the network latency. Now imagine your data moving across the transatlantic communications cable. This is why we keep components closer wherever possible and have to handle out-of-order messages.
The bandwidth is not infinite; neither of your machine, or the server, or the wire over which the communication is happening. Hence we should always measure the number of packets (bytes) of data transferred in and out of your systems. When unregulated, this results in a massive bottleneck, and if untracked, it becomes near impossible to spot them.
We put our system in a terrible shape when we assume that the data flowing across the network is secure. Many malicious users are constantly trying to sniff every packet over the wire and de-code what is being communicated. So, ensure that your data is encrypted when at rest and also in transit.
Network topology changes due to software or hardware failures. When the topology changes, you might see a sudden deviation in latency and packet transfer times. So, these metrics need to be monitored for any anomalous behavior, and our systems would be ready to embrace this change.
There is one internet, and everyone is competing for the same resources (optic cables and other communication channels). So, when building a super-critical Distributed system, you need to know which path your packets are following to avoid high-traffic competing and congested areas.
There is a hidden cost of hardware, software, and maintenance that we all bear when using a distributed system. For example, if we use a public cloud-like AWS, then the data transfer cost is real. This cost looks near zero from a bird's eye view, but it becomes significant when operating at scale.
The network is not homogeneous, and your packets travel to all sorts of communication channels like optic cables, 4G bands, 3G bands, and even 2G bands before reaching the user's device. This is also true when the packets move within your VPC through different types of connecting wires and network cards. When there is a lot of heterogeneity in the network, it becomes harder to find the bottleneck; hence having a setup that gives us enough transparency is the key to a good Distributed System design.