From SQL to Pandas: Your Translation Guide!

5 Common SQL Queries Translated to Pandas Functions.

Ashley Biddle
3 min readApr 13, 2023
Photo by Romain Vignes on Unsplash

While SQL is the go-to language for querying data in relational databases and Pandas is a popular Python library for data manipulation, both SQL and Pandas are similar in many ways. A basic understanding of SQL can make it easier to learn Pandas. Use your SQL knowledge to learn data manipulation in Python with Pandas.

1. Selecting columns

The first query we will look at is selecting columns. In SQL, we use the SELECT keyword to select specific columns from a table. For example, to select the column “name” and “age” from the table “people”, we would use the following SQL query:

SELECT name, age FROM people;

In Pandas, we can select columns using the square bracket notation. For example, to select the same columns from a Pandas DataFrame, we would use the following code:

df[[‘name’, ‘age’]]

2. Filtering rows based on a condition

The next query we will look at is filtering rows based on a condition. In SQL, we use the WHERE keyword to filter rows based on a condition. For example, to select all rows from the table “people” where the age is greater than 30, we would use the following SQL query:

SELECT * FROM people WHERE age > 30;

In Pandas, we can filter rows based on a condition using boolean indexing. For example, to select all rows from a Pandas DataFrame where the age is greater than 30, we would use the following code:

df[df[‘age’] > 30]

3. Grouping and aggregating data

The third query we will look at is grouping and aggregating data. In SQL, we use the GROUP BY keyword to group data by one or more columns and the aggregate functions to perform calculations on the grouped data. For example, to count the number of people with the same age in the table “people”, we would use the following SQL query:

SELECT age, COUNT(*) FROM people GROUP BY age;

In Pandas, we can group data using the groupby() function and perform calculations using the aggregate() function. For example, to count the number of people with the same age in a Pandas DataFrame, we would use the following code:

df.groupby(‘age’)[‘age’].count()

4. Joining tables

The fourth query we will look at is joining tables. In SQL, we use the JOIN keyword to combine data from two or more tables based on a common column. For example, to join the “people” and “addresses” tables on the “id” column, we would use the following SQL query:

SELECT * FROM people JOIN addresses ON people.id = addresses.id;

In Pandas, we can join tables using the merge() function. For example, to join two Pandas DataFrames on the “id” column, we would use the following code:

pd.merge(people, addresses, on=’id’)

5. Sorting data

The final query we will look at is sorting data. In SQL, we use the ORDER BY keyword to sort data based on one or more columns. For example, to sort the “people” table by age in descending order, we would use the following SQL query:

SELECT * FROM people ORDER BY age DESC;

In Pandas, we can sort data using the sort_values() function. For example, to sort a Pandas DataFrame by age in descending order, we would use the following code:

df.sort_values(‘age’, ascending=False)

Summary

SQL and Pandas are both powerful tools for working with data, and a basic understanding of SQL can make it easier to learn Pandas. By understanding the five common SQL queries and their equivalent Pandas functions, you can easily translate between the two languages and perform data analysis tasks efficiently.

Want to Learn More?

Click here to learn advanced SQL techniques that will take your queries to the next level!

--

--

Ashley Biddle

Ashley is a highly skilled Data Arcitect with a passion for building and maintaining complex data ecosystems.