PyMySQL — Connecting Python and SQL for Data Science
Easily access MySQL databases and execute SQL queries in Python
SQL and Python are indispensable tools for data practitioners to work effectively with data.
A common workflow is to first retrieve data from relational databases with SQL queries, and then manipulate and analyze it in Python with libraries such as pandas.
But did you know that these two seemingly isolated tasks can be combined into a single Python script to easily deliver the same outcome?
In this article, we explore the concepts and implementation of PyMySQL for connecting to and interacting with MySQL databases from Python.
(1) What is PyMySQL, and Why Use It?
PyMySQL is a pure-Python MySQL client library: it is written entirely in Python and provides an API through which we can access MySQL relational databases.
According to its documentation, PyMySQL is based on PEP 249, the Python Database API Specification v2.0, which was written to encourage consistency among the Python modules used for database access.
The key reason for using PyMySQL is that it offers a convenient way to interact directly with MySQL databases by embedding SQL statements in our Python scripts.
It also means we do not need a separate RDBMS interface to run the SQL queries that our downstream Python operations depend on.
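Because PyMySQL follows PEP 249, working with it revolves around the same objects as other DB-API drivers: a connection, a cursor, execute() and the fetch methods. The sketch below shows that pattern using Python's built-in sqlite3 module as a stand-in for PyMySQL, so it runs without a MySQL server; the products table, its rows and the helper fetch_expensive_products are made up for illustration. With PyMySQL, the connection would instead be opened with server details, for example pymysql.connect(host=..., user=..., password=..., database=...), and query parameters would use %s placeholders instead of sqlite3's ?.

```python
import sqlite3  # stdlib DB-API 2.0 driver, standing in for PyMySQL here


def fetch_expensive_products(conn, min_price):
    """Return (name, price) rows priced at or above min_price (hypothetical helper).

    PEP 249 pattern: get a cursor, run a parameterized query,
    fetch the results, and always close the cursor.
    """
    cur = conn.cursor()
    try:
        # sqlite3 uses "?" placeholders; PyMySQL would use "%s" instead.
        cur.execute(
            "SELECT name, price FROM products WHERE price >= ? ORDER BY price",
            (min_price,),
        )
        return cur.fetchall()
    finally:
        cur.close()


def main():
    # Throwaway in-memory database with invented example rows.
    conn = sqlite3.connect(":memory:")
    try:
        cur = conn.cursor()
        cur.execute("CREATE TABLE products (name TEXT, price REAL)")
        cur.executemany(
            "INSERT INTO products (name, price) VALUES (?, ?)",
            [("pen", 1.5), ("notebook", 4.0), ("backpack", 35.0)],
        )
        conn.commit()  # PEP 249 asks drivers to start with autocommit off
        cur.close()
        print(fetch_expensive_products(conn, 4.0))  # [('notebook', 4.0), ('backpack', 35.0)]
    finally:
        conn.close()


if __name__ == "__main__":
    main()
```

The try/finally blocks make sure the cursor and connection are closed even if a query fails. sqlite3 cursors do not support the with-statement, which is why try/finally is used here; PyMySQL's documentation uses "with connection.cursor() as cursor:" for the same purpose.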
(2) Initial Setup
(i) Installation
We can install PyMySQL using pip by running “pip install PyMySQL” from a terminal.
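Once the package is installed, a quick sanity check is to ask the interpreter which version of the PyMySQL distribution it can see. This is a small sketch using the standard-library importlib.metadata module (Python 3.8+); the helper name installed_version is our own and is not part of PyMySQL.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent.

    Hypothetical helper for checking an installation.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_version("PyMySQL")
    if v is None:
        print("PyMySQL is not installed in this environment")
    else:
        print(f"PyMySQL {v} is installed")
```

If the check reports the package as missing even though pip succeeded, pip most likely installed it for a different Python interpreter; running “python -m pip install PyMySQL” targets the interpreter you invoke.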
Introduction to SQL in Python for Data Scientists
The data scientist’s guide for using SQL in a Python environment.
This article provides an overview of the basic SQL statements for data scientists, and explains how a SQL engine can be instantiated in Python and used for querying data from a database.
As a data scientist using Python, you often need to get your data from a relational database hosted either on a local server or in the cloud (e.g., on AWS). There are many ways to approach this. For example, you can query your data in Oracle, save the result as a .csv file, and then import it into Python. However, the most efficient way is to use SQL directly in Python. Combining SQL with pandas gives you many options to query, process, and use the data for your project in Python.
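To illustrate querying from Python without the CSV round trip, the sketch below runs an aggregate query and turns the result rows into dictionaries keyed by column name, using the cursor.description attribute defined by PEP 249. It assumes Python's built-in sqlite3 with an in-memory database and an invented sales table; against a real server you would use that database's driver instead. The helper query_to_records is hypothetical. If pandas is available, pandas.DataFrame(records) turns the result into a DataFrame for further analysis.

```python
import sqlite3


def query_to_records(conn, sql, params=()):
    """Run a query and return its rows as dicts keyed by column name (hypothetical helper)."""
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        # cursor.description holds one 7-item sequence per result column;
        # the first item of each is the column name (PEP 249).
        columns = [col[0] for col in cur.description]
        return [dict(zip(columns, row)) for row in cur.fetchall()]
    finally:
        cur.close()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Invented sales data, for illustration only.
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales (region, amount) VALUES (?, ?)",
        [("east", 120.0), ("west", 80.0), ("east", 30.0)],
    )
    records = query_to_records(
        conn,
        "SELECT region, SUM(amount) AS total FROM sales "
        "GROUP BY region ORDER BY region",
    )
    print(records)  # [{'region': 'east', 'total': 150.0}, {'region': 'west', 'total': 80.0}]
    conn.close()
```

Letting the database do the filtering and aggregation keeps the transferred result small, while Python receives the rows directly as native objects ready for further processing.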
First Things First! What is SQL?
SQL (Structured Query Language) is a programming language used for managing or querying data stored in a relational database management system (RDBMS). SQL has been the dominant language for handling structured data in which the entities in the database (e.g., tables) are related to one another, as they are in a relational database. There are other options for handling such data, but SQL has been the most popular and widely used language in the industry.
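The two jobs mentioned here, managing data and querying it, map onto different kinds of SQL statements. The sketch below runs a few of each through Python's built-in sqlite3 module, an assumption made so the example needs no database server; the employees table, its rows and the run_demo function are invented for illustration, and the exact SQL dialect can differ slightly between systems such as SQLite and MySQL.

```python
import sqlite3


def run_demo():
    """Manage a small table, then query it (hypothetical example data)."""
    conn = sqlite3.connect(":memory:")
    # Managing data: define a table, then add, change and remove rows.
    conn.execute(
        "CREATE TABLE employees ("
        " id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER)"
    )
    conn.executemany(
        "INSERT INTO employees (name, dept, salary) VALUES (?, ?, ?)",
        [("Ana", "data", 70000), ("Ben", "data", 65000), ("Cy", "sales", 50000)],
    )
    conn.execute("UPDATE employees SET salary = salary + 5000 WHERE dept = 'sales'")
    conn.execute("DELETE FROM employees WHERE name = 'Ben'")
    conn.commit()

    # Querying data: SELECT reads rows without modifying them.
    rows = conn.execute(
        "SELECT name, dept, salary FROM employees ORDER BY name"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run_demo():
        print(row)  # ('Ana', 'data', 70000) then ('Cy', 'sales', 55000)
```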
How is “SQL” pronounced?
SQL was developed at IBM in the early 1970s and was originally called “SEQUEL” (Structured English Query Language). Later, the name was changed to SQL (Structured Query Language) because of a trademark issue. However, the pronunciation “see-qu-el” (/ˈsiːkwəl/) stayed with the language, and it is the pronunciation adopted by most practitioners.
[Pro tip: when you go to an interview, make sure you pronounce it “see-qu-el”, if you want the job!]
What Does a Relational Database Look Like?
A relational database is a collection of data items with pre-defined relationships between them. These items are organized as a set of tables with columns and rows. Tables are used…
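To make the idea of tables with pre-defined relationships concrete, here is a small sketch with two invented tables, customers and orders, where each order row points to a customer through a foreign key and a JOIN follows that relationship. It uses Python's built-in sqlite3 module as a stand-in for a full RDBMS; the helper names build_shop and totals_per_customer are hypothetical. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3


def build_shop():
    """Create two related tables in an in-memory database (invented data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE orders ("
        " id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL REFERENCES customers(id),"
        " total INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO customers (id, name) VALUES (?, ?)", [(1, "Ada"), (2, "Linus")]
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
        [(1, 30), (1, 12), (2, 99)],
    )
    conn.commit()
    return conn


def totals_per_customer(conn):
    """Follow the customer_id relationship with a JOIN and sum each customer's orders."""
    return conn.execute(
        "SELECT c.name, SUM(o.total) AS spent "
        "FROM customers AS c JOIN orders AS o ON o.customer_id = c.id "
        "GROUP BY c.id ORDER BY c.name"
    ).fetchall()


if __name__ == "__main__":
    conn = build_shop()
    print(totals_per_customer(conn))  # [('Ada', 42), ('Linus', 99)]
    try:
        # Rejected: customer 3 does not exist, so this row would break the relationship.
        conn.execute("INSERT INTO orders (customer_id, total) VALUES (3, 5)")
    except sqlite3.IntegrityError as exc:
        print("rejected:", exc)
    conn.close()
```

Storing each customer once and referencing it by id from orders is what lets the database keep the two tables consistent and join them on demand.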