Do you love watching stand-up comedy? Or maybe cute cat videos? Or the latest Bollywood songs and trailers? Whatever your interests, you probably use YouTube to watch videos, and you might even run a popular YouTube channel yourself. Either way, YouTube is an integral part of your life. The numbers bear this out: more than 400 hours of video are uploaded to YouTube every minute, and approximately 1 billion hours of YouTube videos are watched every day. With 1.9 billion users, YouTube is the 2nd most popular social media platform in the world.
That is an insane amount of data to store and manage. So the natural question is: "How do they do it?" How does YouTube store and retrieve its content? How does it know which video to recommend to you next, or what you want to watch? The answer to these questions lies in the complex database management systems behind YouTube. So let's try to understand them now.
YouTube is the go-to platform for watching and sharing videos, so it has to manage a huge volume of video content every day. It does this by using MySQL along with various other database management systems in different places to keep YouTube up and running for end users.
Most YouTube data is stored in Google Modular Data Centres, which makes sense given that Google bought YouTube in 2006. A modular data centre is portable and can be placed wherever storage capacity is required. YouTube mainly uses 5 or 6 Google data centres, along with its own content distribution network (CDN), to make sure data is constantly available to end users.
The more popular videos are moved to the CDN, which replicates them in various locations. This means users can access them much faster, with fewer network hops. Less popular videos, on the other hand, are kept on YouTube's servers, where they are fetched on demand. There is also no hard and fast rule that a video is stored in the data centre closest to the region it was uploaded from. For example, if you upload a video to YouTube from India, it may be stored in a data centre in the UK. YouTube also makes use of cloud storage in addition to all these methods.
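The popular-versus-long-tail split can be pictured with a small Python sketch. It assumes a hypothetical `POPULAR_VIEW_THRESHOLD` of 10,000 recent views, and it uses plain dictionaries as stand-ins for a CDN edge cache and for origin storage. YouTube's real placement logic is not described here, so this only illustrates the general idea of popularity-based tiering, not how YouTube actually decides.

```python
from dataclasses import dataclass

# Hypothetical threshold: videos with at least this many recent views are
# treated as "popular" and copied to the CDN edge cache.
POPULAR_VIEW_THRESHOLD = 10_000


@dataclass
class Video:
    video_id: str
    recent_views: int


def storage_tier(video: Video, threshold: int = POPULAR_VIEW_THRESHOLD) -> str:
    """Return 'cdn' for popular videos and 'origin' for the long tail."""
    return "cdn" if video.recent_views >= threshold else "origin"


def serve(video: Video, edge_cache: dict, origin_store: dict) -> bytes:
    """Serve from the edge cache if possible, else fetch from origin on demand."""
    if video.video_id in edge_cache:
        return edge_cache[video.video_id]
    data = origin_store[video.video_id]
    # Only popular videos are replicated to the edge; the rest are fetched
    # from origin storage every time they are requested.
    if storage_tier(video) == "cdn":
        edge_cache[video.video_id] = data
    return data


# Usage: one popular and one rarely watched video.
origin = {"cat123": b"<cat video bytes>", "talk456": b"<lecture bytes>"}
edge = {}
serve(Video("cat123", 50_000), edge, origin)
serve(Video("talk456", 12), edge, origin)
print(sorted(edge))  # ['cat123']
```

Only the popular video ends up in the edge cache, which is the trade-off described above: edge storage is spent on the content most likely to be requested again, while the long tail stays in cheaper central storage.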
Originally, MySQL was used to store most of YouTube's data, from the videos themselves to metadata such as users, tags, and descriptions. The VARBINARY data type was used, which allowed binary data such as videos and thumbnail images to be stored as well. However, MySQL on its own offers little scope for scalability, which is a very important factor for an ever-expanding company like YouTube. Since YouTube cannot let go of MySQL completely, it uses Vitess in conjunction with MySQL. Vitess is a database clustering system that combines many of the important features of MySQL with the scalability that is a trademark of NoSQL databases. Vitess helps by consolidating YouTube's queries into smaller batches that are much easier to handle and execute. It also creates backups and scales as much as required.
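To make the scaling idea more concrete, here is a minimal Python sketch of key-range sharding, the general technique Vitess uses to spread MySQL data across many servers. The four-shard layout, the `keyspace_id`, `shard_for` and `group_by_shard` helpers, and the use of SHA-256 are all hypothetical. Real Vitess deployments map rows to shards through vindexes defined in a VSchema, and this is not Vitess's code or API. The sketch shows how a row's sharding key becomes a keyspace ID, how that ID falls into exactly one hex-named key range, and how one large lookup can be split into smaller per-shard batches.

```python
import hashlib
from collections import defaultdict

# Hypothetical layout: four shards covering the keyspace ID range, named by
# hex boundaries in the style of Vitess key ranges ("-40" is everything below
# 0x40..., "c0-" is 0xc0... and above).
SHARDS = ["-40", "40-80", "80-c0", "c0-"]


def keyspace_id(user_id: int) -> bytes:
    """Map a sharding key to an 8-byte keyspace ID (illustrative hash only)."""
    return hashlib.sha256(str(user_id).encode()).digest()[:8]


def shard_for(kid: bytes, shards=SHARDS) -> str:
    """Return the shard whose [start, end) range contains the keyspace ID."""
    for name in shards:
        start_hex, end_hex = name.split("-")
        start = bytes.fromhex(start_hex) if start_hex else b""
        end = bytes.fromhex(end_hex) if end_hex else None
        # Byte strings compare lexicographically, so b"\x3f..." < b"\x40".
        if kid >= start and (end is None or kid < end):
            return name
    raise ValueError("keyspace ID not covered by any shard")


def group_by_shard(user_ids):
    """Split one large lookup into smaller per-shard batches."""
    batches = defaultdict(list)
    for uid in user_ids:
        batches[shard_for(keyspace_id(uid))].append(uid)
    return dict(batches)


# Usage: route a lookup for ten users to the shards that hold them.
print(shard_for(bytes.fromhex("8000000000000000")))  # 80-c0
print(group_by_shard(range(10)))
```

Because every keyspace ID lands in exactly one range, each per-shard batch can be sent to a separate MySQL server, and capacity can grow by splitting a range (for example, "-40" into "-20" and "20-40") rather than by making a single database bigger.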