SQL Part 3: Primary Key, JOIN, GROUP BY, ORDER BY
Use these sqlite3 commands to format your output readably.
At the start of your session, issue the following two commands — these will format your sqlite3 output so it is clearer, and add columns headers.
sqlite> .mode column sqlite> .headers on
Now output is clearly lined up with column heads displayed:
sqlite> SELECT * FROM revenue; # company state price # ---------- ---------- ---------- # Haddad's PA 239.5 # Westfield NJ 53.9 # The Store NJ 211.5 # Hipster's NY 11.98 # Dothraki F NY 5.98 # Awful's PA 23.95 # The Clothi NY 115.2
However, note that each column is only 10 characters wide. It is possible to change these widths although not usually necessary.
Primary Key (PK) in a Table
The PK is defined in the CREATE TABLE definition.
A primary key is a unique value that can be used to identify a table row. These are usually id numbers, such as a product id, county id, domain ip, etc. Since it identifies a unique row in a table, the primary key in a database cannot be duplicated — an error will occur if this is attempted.
Here’s a table description in SQLite for a table that has a «instructor_id» primary key:
CREATE TABLE instructors ( instructor_id INT PRIMARY KEY, password TEXT, first_name TEXT, last_name TEXT );
JOINing Tables on a Primary Key
Two tables may have info keyed to the same primary key — these can be joined into one table.
Relational database designs attempt to separate data into individual tables, in order to avoid repetition. For example, consider one table that holds data for instructors at a school (in which one instructor appears per row) and another that holds records of a instructor’s teaching a class (in which the same instructor may appear in multiple rows).
Here is a CREATE TABLE description for tables instructors and instructor_classes . instructors contains:
sqlite3> .schema instructors CREATE TABLE instructors ( instructor_id INT PRIMARY KEY, password TEXT, first_name TEXT, last_name TEXT ); sqlite3> .schema instructor_classes CREATE TABLE instructor_classes ( instructor_id INT, class_name TEXT, day TEXT );
Select all rows from both tables:
sqlite3> SELECT * from instructors instructor_id password first_name last_name ------------- ---------- ---------- ---------- 1 pass1 David Blaikie 2 pass2 Joe Wilson 3 xxyx Jenny Warner 4 yyyy Xavier Yellen sqlite> SELECT * from instructor_classes instructor_id class_name day ------------- ------------ ---------- 1 Intro Python Thursday 1 Advanced Pyt Monday 2 php Monday 2 js Tuesday 3 sql Wednesday 3 mongodb Thursday 99 Golang Saturday
Why is instructor_classes data separated from instructors data? If we combined all of this data into one table, there would be repetition — we’d see the instructor’s name repeated on all the rows that indicate the instructor’s class assignments. So it makes sense to separate the data that has a «one-to-one» relationship of instructors to the data for each instructor (as in the instructors table) from the data that has a «many-to-one» relationship of the instructor to the data for each instructor (as in the instructor_classes table). But there are times where we will want to see all of this data shown together in a single result set — we may see repetition, but we won’t be storing repetition. We can create these combined result sets using database joins .
LEFT JOIN
all rows from «left» table, and matching rows in right table
A left join includes primary keys from the «left» table (this means the table mentioned in the FROM statement) and will include only those rows in right table that share those same keys.
sqlite3> SELECT * FROM instructors LEFT JOIN instructor_classes on instructors.instructor_id = instructor_classes.instructor_id; instructor_id password first_name last_name instructor_id class_name day ------------- ---------- ---------- ---------- ------------- --------------- ---------- 1 pass1 David Blaikie 1 Advanced Python Monday 1 pass1 David Blaikie 1 Intro Python Thursday 2 pass2 Joe Wilson 2 js Tuesday 2 pass2 Joe Wilson 2 php Monday 3 xxyx Jenny Warner 3 mongodb Thursday 3 xxyx Jenny Warner 3 sql Wednesday 4 yyyy Xavier Yellen
Note the missing data on the right half of the last line. The right table instructor_classes had no data for instructor id 4 .
RIGHT JOIN
all rows from the «right» table, and matching rows in the left table
A right join includes primary keys from the «right» table (this means the table mentioned in the JOIN clause) and will include only those rose in the left table that share the same keys as those in the right.
Unfortunately, SQLite does not support RIGHT JOIN (although many other databases do). The workaround is to use a LEFT JOIN and reverse the table names.
sqlite3> SELECT * FROM instructor_classes LEFT JOIN instructors ON instructors.instructor_id = instructor_classes.instructor_id; instructor_id class_name day instructor_id password first_name last_name ------------- ------------ ---------- ------------- ---------- ---------- ---------- 1 Intro Python Thursday 1 pass1 David Blaikie 1 Advanced Pyt Monday 1 pass1 David Blaikie 2 php Monday 2 pass2 Joe Wilson 2 js Tuesday 2 pass2 Joe Wilson 3 sql Wednesday 3 xxyx Jenny Warner 3 mongodb Thursday 3 xxyx Jenny Warner 99 Golang Saturday
Now only rows that appear in instructor_classes appear in this table, and data not found in instructors is missing (In this case, Golang has no instructor and it is given the default id 99).
INNER JOIN and OUTER JOIN
Select only PKs common to both tables, or all PKs for all tables
INNER JOIN : rows common to both tables
An inner join includes only those rows that have primary key values that are common to both tables:
sqlite3> SELECT * from instructor_classes INNER JOIN instructors ON instructors.instructor_id = instructor_classes.instructor_id; instructor_id class_name day instructor_id password first_name last_name ------------- ------------ ---------- ------------- ---------- ---------- ---------- 1 Intro Python Thursday 1 pass1 David Blaikie 1 Advanced Pyt Monday 1 pass1 David Blaikie 2 php Monday 2 pass2 Joe Wilson 2 js Tuesday 2 pass2 Joe Wilson 3 sql Wednesday 3 xxyx Jenny Warner 3 mongodb Thursday 3 xxyx Jenny Warner
Rows are joined where both instructors and instructor_classes have data. OUTER JOIN : all rows from both tables
An outer join includes all rows from both tables, regardless of whether a PK id appears in the other table. Here’s what the query would be if sqlite3 supported outer joins:
SELECT * from instructor_classes OUTER JOIN instructors ON instructors.instructor_id = instructor_classes.instructor_id;
unfortunately, OUTER JOIN is not currently supported in sqlite3. In these cases it’s probably best to use another approach, i.e. built-in Python or pandas merge() (to come).
Aggregating data with GROUP BY
«Aggregation» means counting, summing or otherwise summarizing multiple values based on a common key.
Consider summing up a count of voters by their political affiliation (2m Democrats, 2m Republicans, .3m Independents), a sum of revenue of companies by their sector (manufacturing, services, etc.), or an average GPA by household income. All of these require taking into account the individual values of multiple rows and compiling some sort of summary value based on those values.
Here is a sample that we’ll play with:
sqlite3> SELECT date, name, rev FROM companyrev; date name rev ---------- ----------- ---------- 2019-01-03 Alpha Corp. 10 2019-01-05 Alpha Corp. 20 2019-01-03 Beta Corp. 5 2019-01-07 Beta Corp. 7 2019-01-09 Beta Corp. 3
If we wish to sum up values by company, we can say it easily:
sqlite3> SELECT name, sum(rev) FROM companyrev GROUP BY name; name sum(rev) ----------- ---------- Alpha Corp. 30 Beta Corp. 15
If we wish to count the number entries for each company, we can say it just as easily:
sqlite3> SELECT name, count(name) FROM companyrev GROUP BY name; name count(name) ----------- ----------- Alpha Corp. 2 Beta Corp. 3
Sorting a Result Set with ORDER BY
This is SQL’s way of sorting results.
The ORDER BY clause indicates a single column, or multiple columns, by which we should order our results:
sqlite3> SELECT name, rev FROM companyrev ORDER BY rev; name rev ---------- ---------- Beta Corp. 3 Beta Corp. 5 Beta Corp. 7 Alpha Corp 10 Alpha Corp 20
Python SQLite — Order By
While fetching data using SELECT query, you will get the records in the same order in which you have inserted them.
You can sort the results in desired order (ascending or descending) using the Order By clause. By default, this clause sorts results in ascending order, if you need to arrange them in descending order you need to use “DESC” explicitly.
Syntax
Following is the syntax of the ORDER BY clause in SQLite.
SELECT column-list FROM table_name [WHERE condition] [ORDER BY column1, column2, .. columnN] [ASC | DESC];
Example
Assume we have created a table with name CRICKETERS using the following query −
sqlite> CREATE TABLE CRICKETERS ( First_Name VARCHAR(255), Last_Name VARCHAR(255), Age int, Place_Of_Birth VARCHAR(255), Country VARCHAR(255) ); sqlite>
And if we have inserted 5 records in to it using INSERT statements as −
sqlite> insert into CRICKETERS values('Shikhar', 'Dhawan', 33, 'Delhi', 'India'); sqlite> insert into CRICKETERS values('Jonathan', 'Trott', 38, 'CapeTown', 'SouthAfrica'); sqlite> insert into CRICKETERS values('Kumara', 'Sangakkara', 41, 'Matale', 'Srilanka'); sqlite> insert into CRICKETERS values('Virat', 'Kohli', 30, 'Delhi', 'India'); sqlite> insert into CRICKETERS values('Rohit', 'Sharma', 32, 'Nagpur', 'India'); sqlite>
Following SELECT statement retrieves the rows of the CRICKETERS table in the ascending order of their age −
sqlite> SELECT * FROM CRICKETERS ORDER BY AGE; First_Name Last_Name Age Place_Of_B Country ---------- ---------- ---- ---------- ------------- Virat Kohli 30 Delhi India Rohit Sharma 32 Nagpur India Shikhar Dhawan 33 Delhi India Jonathan Trott 38 CapeTown SouthAfrica Kumara Sangakkara 41 Matale Srilanka sqlite>
You can use more than one column to sort the records of a table. Following SELECT statements sorts the records of the CRICKETERS table based on the columns AGE and FIRST_NAME.
sqlite> SELECT * FROM CRICKETERS ORDER BY AGE, FIRST_NAME; First_Name Last_Name Age Place_Of_B Country ---------- ---------- ---- ---------- ------------- Virat Kohli 30 Delhi India Rohit Sharma 32 Nagpur India Shikhar Dhawan 33 Delhi India Jonathan Trott 38 CapeTown SouthAfrica Kumara Sangakkara 41 Matale Srilanka sqlite>
By default, the ORDER BY clause sorts the records of a table in ascending order you can arrange the results in descending order using DESC as −
sqlite> SELECT * FROM CRICKETERS ORDER BY AGE DESC; First_Name Last_Name Age Place_Of_B Country ---------- ---------- ---- ---------- ------------- Kumara Sangakkara 41 Matale Srilanka Jonathan Trott 38 CapeTown SouthAfrica Shikhar Dhawan 33 Delhi India Rohit Sharma 32 Nagpur India Virat Kohli 30 Delhi India sqlite>
ORDER BY clause using python
To retrieve contents of a table in specific order, invoke the execute() method on the cursor object and, pass the SELECT statement along with ORDER BY clause, as a parameter to it.
Example
In the following example we are creating a table with name and Employee, populating it, and retrieving its records back in the (ascending) order of their age, using the ORDER BY clause.
import psycopg2 #establishing the connection conn = psycopg2.connect( database="mydb", user='postgres', password='password', host='127.0.0.1', port= '5432' ) #Setting auto commit false conn.autocommit = True #Creating a cursor object using the cursor() method cursor = conn.cursor() #Doping EMPLOYEE table if already exists. cursor.execute("DROP TABLE IF EXISTS EMPLOYEE") #Creating a table sql = '''CREATE TABLE EMPLOYEE( FIRST_NAME CHAR(20) NOT NULL, LAST_NAME CHAR(20), AGE INT, SEX CHAR(1), INCOME INT, CONTACT INT )''' cursor.execute(sql) #Populating the table #Populating the table cursor.execute('''INSERT INTO EMPLOYEE (FIRST_NAME, LAST_NAME, AGE, SEX, INCOME) VALUES ('Ramya', 'Rama priya', 27, 'F', 9000), ('Vinay', 'Battacharya', 20, 'M', 6000), ('Sharukh', 'Sheik', 25, 'M', 8300), ('Sarmista', 'Sharma', 26, 'F', 10000), ('Tripthi', 'Mishra', 24, 'F', 6000)''') conn.commit() #Retrieving specific records using the ORDER BY clause cursor.execute("SELECT * from EMPLOYEE ORDER BY AGE") print(cursor.fetchall()) #Commit your changes in the database conn.commit() #Closing the connection conn.close()
Output
[('Vinay', 'Battacharya', 20, 'M', 6000, None), ('Tripthi', 'Mishra', 24, 'F', 6000, None), ('Sharukh', 'Sheik', 25, 'M', 8300, None), ('Sarmista', 'Sharma', 26, 'F', 10000, None), ('Ramya', 'Rama priya', 27, 'F', 9000, None)]