I am building a real-time notification system as my final-semester project. The end product is a simple API that third parties can use to push real-time notifications to client applications running on multiple platforms, including browsers and devices. Ideally, if your device can connect to the internet, it should be able to receive notifications. The non-functional requirements are pretty ambitious, with scalability, availability and latency guarantees being the most important. Maybe a little too ambitious for someone like me, but I will learn a lot in the process, and it's more fun when it's challenging.
To address the scalability issues, we'll have to distribute the system. A single node will not scale beyond a point, and we need to distribute the load over nodes effectively to serve clients while maintaining the quality-of-service guarantees. With distribution comes a whole lot of other problems like latency, coordination and handling failures. We have a theorem that officially states that you cannot have your cake and eat it too; it's called the CAP theorem. It says that you cannot have a system that is Consistent, Available and Partition tolerant at the same time (partition tolerance here refers to the ability of a system to continue operating when the network between nodes drops or delays messages). You can fully have any two but not all three, though in practice each comes in varying degrees. In a way this is good, because you do not go chasing the holy grail when you know it doesn't exist.
CAP Twelve Years Later: How the "Rules" Have Changed is a good read for some sensible advice on applying the CAP theorem to real-world scenarios. Reading Notes on Distributed Systems for Young Bloods by Jeff Hodges is highly recommended for anyone starting to work on distributed systems. All the common assumptions you may have when building a distributed system are wrong, so wrong that they are sort of standardized now under the title Fallacies of Distributed Computing. A more detailed article is available here. I will be using concepts from the Amazon Dynamo paper to make the system highly available. This one is big and will get a blog post of its own.
At its core, the notification service I am building is a Publish/Subscribe system with soft real-time guarantees. Right now I am clueless about how to deal with latency guarantees, but I hope to figure it out soon. WebSockets will be used to push to browsers, and for the messaging protocol I hear MQTT (a lightweight publish/subscribe protocol that runs on top of TCP, rather than a transport-layer protocol itself) fits to a T. More on this later.
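To make the Publish/Subscribe idea a bit more concrete, here is a minimal sketch in Python. It is not the project's code: it assumes a single process with in-memory subscribers, and the Broker class and its method names are hypothetical, made up for illustration. A real system would add network transport, persistence and distribution across nodes.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# A handler receives (topic, message). Hypothetical type, for illustration only.
Handler = Callable[[str, str], None]


class Broker:
    """In-memory, single-process topic broker (illustrative sketch)."""

    def __init__(self) -> None:
        # Maps each topic name to the handlers subscribed to it.
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a handler to be called for every message on the topic."""
        self._subscribers[topic].append(handler)

    def unsubscribe(self, topic: str, handler: Handler) -> None:
        """Remove a handler from the topic, if it is registered."""
        handlers = self._subscribers.get(topic, [])
        if handler in handlers:
            handlers.remove(handler)

    def publish(self, topic: str, message: str) -> int:
        """Deliver the message to all current subscribers of the topic.

        Returns the number of handlers that were called.
        """
        # Copy the list so handlers may subscribe/unsubscribe while we iterate.
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(topic, message)
        return len(handlers)


if __name__ == "__main__":
    broker = Broker()
    broker.subscribe("alerts", lambda topic, msg: print(f"[{topic}] {msg}"))
    broker.publish("alerts", "disk almost full")
```

The publisher never needs to know who the subscribers are; it only names a topic. That decoupling is what makes it possible to later spread subscribers over many nodes.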
I will be updating my blog as the project progresses, documenting all the design decisions and the rationale behind them. The code will be available online. This is not something new, and I am aware of tools that already do this. While I will try to improve on what already exists, the point is, as mentioned earlier, to learn something new and build what will be considered good software. I have never attended a distributed systems theory class, so I am reading up on that as well. Vector clocks are really cool, if you ask me.
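For anyone who has not met vector clocks yet, here is a small Python sketch of the idea. It is only an illustration under simple assumptions: clocks are plain dictionaries from node name to counter, a missing entry counts as zero, and the function names are my own for this example rather than anything from a library or from the project.

```python
from typing import Dict

# A vector clock maps a node name to the number of events seen from that node.
VectorClock = Dict[str, int]


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of the clock with the given node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Combine two clocks by taking the per-node maximum."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def happened_before(a: VectorClock, b: VectorClock) -> bool:
    """True if a is <= b on every node and strictly smaller on at least one."""
    nodes = a.keys() | b.keys()
    no_greater = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    some_smaller = any(a.get(n, 0) < b.get(n, 0) for n in nodes)
    return no_greater and some_smaller


def concurrent(a: VectorClock, b: VectorClock) -> bool:
    """True if each clock is ahead of the other on at least one node."""
    nodes = a.keys() | b.keys()
    a_ahead = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_ahead = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    return a_ahead and b_ahead


if __name__ == "__main__":
    base = increment({}, "n1")        # an event on node n1
    left = increment(base, "n2")      # n2 updates after seeing base
    right = increment(base, "n3")     # n3 updates after seeing base, unaware of n2
    print(happened_before(base, left))
    print(concurrent(left, right))
    print(merge(left, right))
```

When two clocks are concurrent, neither update saw the other, so the system has a conflict to resolve instead of silently picking a winner.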
If you think this is too big to be accomplished by a single student programmer in a semester's worth of effort, I wouldn't deny that. But when you pick the right tools for a job, things become a lot easier and you can afford to be ambitious. Erlang for the win! Feel free to comment on what I am building, as I may be unaware of a lot of things I should know. I could use all your suggestions and advice to make things better.