Using SQL for Data Science

Why SQL?

How is SQL (pronounced see–kwell) relevant to you as an aspiring data scientist? And what is SQL anyway?

SQL, or Structured Query Language, is a special-purpose programming language for managing data held in relational database management systems. Almost all structured data is stored in such databases, so, if you want to play with data, chances are you’ll want to know some SQL.

Some of what you can do with SQL—data insertion, queries, updating and deleting, schema creation and modification, and data access control—you can also accomplish with R, Python, or even Excel, but writing your own SQL code is more efficient and can yield more easily reproducible scripts. Learning SQL can also give you a leg up on folks transitioning, often with limited database experience, into the data science industry from academia.

Six Handy SQL Features for Data Scientists

Here, adapted from a list on the Yhat blog, are “some awesome things you can do with SQL.”

  • Generate queries from a query: Basic string concatenation makes it easy to generate en masse queries that use data in a database to fetch data found in another system.
  • Handle dates: “Fantastic date functions” exist to meet all your formatting and type conversion needs.
  • Text mining: Yhat recommends going as far as you can with SQL’s built-in string functions before turning to a scripting language.
  • Find the median: Since there’s no built-in aggregate function for median, Yhat provides the code.
  • Load data into your database with the \COPY command.
  • Generate sequences: Use the generate_series function to create ranges of dates and times and to handle time series and funnels.

Interested in a different career? Check out our other bootcamp guides below:

Learn SQL Online

Resources for learning SQL abound, ranging from text-based to interactive and providing entry points for learners with varying levels of programming and database experience. Here’s a sampling of the online offerings:

  • Intro to SQL: Querying and managing data from Khan Academy covers SQL basics, more advanced SQL queries, relational queries in SQL, and modifying databases with SQL.
  • SQL School from Mode Analytics is an interactive tutorial geared toward aspiring analysts with no SQL experience.
  • SQLCourse.com allows you to practice what you learn in its tutorial using an online SQL interpreter.
  • SQL Tutorial from Tutorials Point gives beginners a quick start with SQL.
  • Learn SQL the Hard Way is written for people with no prior database, programming, or SQL knowledge, and promises to teach you “80% of SQL you probably need to use it effectively.”

Happy querying!

Last updated: June 2020