Learning SQL with Stanford's Online Course
Update Apr 20th: The course has recently been made available again on edx.org.
SQL was developed in the 1970s, but is still a workhorse in many modern data analysis pipelines. There are countless SQL introductions and tutorials available on the internet, but it took me a while to find the one that works best for me.
While broadening my programming skills beyond R, I went through a number of online SQL courses. I picked up the basics quickly, but I realized that many tutorials spend little time on them because, well, the three to five key commands can be explained quickly.
I went through the SQL tutorial by W3Schools and Kaggle’s BigQuery course, but I was not very satisfied with either. The W3Schools tutorial relied on rather simple “fill in the blank” examples that did not really test actual skills. The Kaggle course was at the opposite end of the spectrum: its example databases were sometimes explained only superficially, and they were so large and complicated that it was difficult to verify one’s solution to an exercise before submitting it (and to find one’s error if the answer was wrong).
However, while going through these courses, I came across praise for another SQL course, offered by Stanford University, in a “Towards Data Science” article. Stanford’s SQL mini-course is by far the best online resource that I could find on the topic. It consists of nine videos (5 to 25 minutes each) in which Prof. Jennifer Widom first provides some background on the topic and then explores it more deeply in live coding examples. She often demonstrates how to rewrite a query in a different way, which is an excellent illustration of SQL’s features. The course certainly does not cover every SQL detail (for instance, it omits common table expressions), but it covers the basics exceptionally well. Moreover, the level of difficulty is perfect in my opinion: the database used throughout the course is small and straightforward to understand, but the questions are tricky and force the learner to really think about the basic concepts. (In other words, it’s easy to verify solutions, and the main challenge is the proper use of SQL, not the peculiarities of the database.)
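To give a flavour of what rewriting a query in a different way can look like, here is a minimal sketch using Python's built-in sqlite3 module. The tables, column names, and rows are invented for illustration and are not taken from the course. It asks the same question (which students have submitted at least one application?) three ways: with a subquery, with a join, and with a common table expression.

```python
import sqlite3

# Hypothetical toy schema and data, made up for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE application (student_id INTEGER, college TEXT);
    INSERT INTO student VALUES (1, 'Amy'), (2, 'Bob'), (3, 'Cleo');
    INSERT INTO application VALUES (1, 'Stanford'), (1, 'MIT'), (3, 'Stanford');
""")

queries = {
    # Version 1: filter with a subquery in the WHERE clause.
    "subquery": """
        SELECT name FROM student
        WHERE id IN (SELECT student_id FROM application)
        ORDER BY name
    """,
    # Version 2: join the tables; DISTINCT collapses students with several
    # applications (it would also merge two students sharing a name,
    # which does not occur in this toy data).
    "join": """
        SELECT DISTINCT s.name
        FROM student s JOIN application a ON a.student_id = s.id
        ORDER BY s.name
    """,
    # Version 3: name the intermediate result with a common table expression.
    "cte": """
        WITH applicants AS (SELECT DISTINCT student_id FROM application)
        SELECT s.name
        FROM student s JOIN applicants ap ON ap.student_id = s.id
        ORDER BY s.name
    """,
}

# Run every version and collect its rows as a list of tuples.
results = {label: conn.execute(sql).fetchall() for label, sql in queries.items()}

for label, rows in results.items():
    print(label, rows)
```

On this small data set all three versions return the same rows, which is exactly why a small, easy-to-inspect database is helpful for checking one's own solutions.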
Stanford used to offer the course via its “Lagunita” self-learning platform, which was retired in March 2020. To ensure that some key materials from the course remain available, I have recently created a GitHub repository with links to the main course content. The course videos were hosted on YouTube from the beginning; the links can be found in the repository. The repository also provides a copy of the course exercises and their solutions in two versions: one that includes the SQL code, and one that shows only the result set, so that learners can verify their answers without seeing the code that leads to the solution.
I have recently added my solutions to the last exercises, so the repository now contains all exercises and solutions. If you want to learn SQL, I highly recommend checking out the course material.