Basics of SQL
Basics and Background
SQL: Structured Query Language
Relational Database:
- usually in tables
- rows are called records
- columns are certain types of data. Data types of rows are specified:
- INTEGER
- TEXT
- DATE
- REAL, real numbers
- NULL
- …
RDBMS: Relational Database Management System, most RDBMS use SQL as the query language. SQLite is one of the RDBMS.
- SQLite: open source and minimal
- MySQL: powerful and popular, also open source, controlled by Oracle, not really scalable.
- PostgreSQL: open source, even slower than MySQL.
- OracleDB: not open source
- SQL Server: from Microsoft, not open source, and only on windows.
SQL
Statements for Manipulation
Create table
CREATE TABLE table_name ( column_1 data_type, column_2 data_type, column_3 data_type );
Constraints can be included when creating tables. I am using the example from CodeAcademy.
CREATE TABLE awards ( --id should be integer, and is the primary key id INTEGER PRIMARY KEY, --recipient should be text and can not be null recipient TEXT NOT NULL, --default values for a column award_name TEXT DEFAULT "Grammy" );
As mentioned in the basics section, primary keys should be unique to identify the specific row.
Insert new row
INSERT INTO table_name (column_1, column_2, column_3) VALUES (some_value_1, some_value_2, some_value_3);
Update some values
-- Specify the table UPDATE table_name -- choose column to be updated SET column_1 = some_other_value_1 -- specify row location WHERE column_2 = some_specific_value_to_locate_the_row;
Add new columns
-- speficy table ALTER table_name -- add column and specify data type, here I use TEXT ADD COLUMN column_4 TEXT
Delete rows
DELETE FROM celebs -- I use column_4 as an example -- Delete every row if column_4 has NULL values WHERE column_4 IS NULL;
Statements for Queries
Select from table; select returns result set which is a new table.
-- Select out everything from table SELECT * FROM table_name; -- Select out the values of a specific column SELECT column_1 FROM table_name;
Select specific columns
SELECT column_1, column_2 FROM table_name;
AS
keyword: allows you to select the column and return it as the specified new name of the column; the database is NOT modified.SELECT column_1 AS 'A NEW NAME' FROM table_name;
Select and show only the distinct values of the column
SELECT DISTINCT column_1 FROM table_name;
WHERE
key: using operators such as=, !=, >, <, >=, <=
to filter resultsSELECT * FROM table_name WHERE column_1 = 0;
LIKE
key: patern specified likeAA_B
where_
is for a single character.SELECT * FROM table_name WHERE column_1 LIKE `AA_B`
Wildcards:
_
,%
for 0 or more characters.BETWEEN
,AND
,OR
:SELECT * FROM movies WHERE name BETWEEN 'D%' AND 'G%';
SORT BY
: Can be eitherDESC
orASC
and goes after whereSELECT * FROM movies WHERE year > 2014 ORDER BY name DESC;
LIMIT
SELECT * FROM movies ORDER BY imdb_rating DESC LIMIT 3;
CASE
:SELECT name, CASE WHEN genre = 'romance' THEN 'fun' WHEN genre = 'comedy' THEN 'fun' ELSE 'serious' END FROM movies;
## Aggregate
1. `COUNT`:
```SQL
SELECT COUNT(*)
FROM tabe_name;
SUM
:SELECT SUM(column_1) FROM table_name;
MAX
andMIN
:SELECT MAX(column_1) FROM table_name;
AVG
: averageSELECT AVG(column_1) FROM table_name;
ROUND
: round to specified decimals--round the price to integers SELECT name, ROUND(price,0) FROM fake_apps; --round the price to integers --even with other keys as arguments SELECT name, ROUND(AVG(price),0) FROM fake_apps;
GROUP BY
: group by column valuesSELECT price, COUNT(*) FROM fake_apps WHERE downloads > 20000 GROUP BY price;
or
SELECT category, SUM(downloads) FROM fake_apps GROUP BY category;
References can be used in
GROUP BY
SELECT category, SUM(downloads) FROM fake_apps GROUP BY 1; --1 here is identical to category
HAVING
: The problem withWHERE
is that it goes beforeGROUP BY
. What if we need to filter the groups?SELECT price, ROUND(AVG(downloads)) FROM fake_apps GROUP BY price HAVING COUNT(price) > 9;
Multiple Tables
The normalization is explained in DB Normalization.
JOIN
: Join tables with specified columnSELECT * FROM orders JOIN subscriptions ON orders.subscription_id = subscriptions.subscription_id WHERE description = 'Fashion Magazine';
Inner Join: only join the rows that have common values on the specified join columns.
SELECT COUNT(*) FROM newspaper; --Output 60 SELECT COUNT(*) FROM online; --Output 65 SELECT COUNT(*) FROM newspaper JOIN online ON online.id = newspaper.id; --Output 50 <= 60 or 65
Left Join: simply plug all the right table onto left tables, where the values of the specified column match. The number of rows will be the number of rows for the left table.
SELECT * FROM newspaper LEFT JOIN online ON newspaper.id=online.id WHERE online.id IS NULL;
Cross join: combine all the information
SELECT month, COUNT(*) as subscribers FROM months CROSS JOIN newspaper WHERE months.month > newspaper.start_month AND months.month < newspaper.end_month GROUP BY months.month;
UNION
: stack tablesSELECT * FROM newspaper UNION SELECT * FROM online;
WITH
: create a result with alias
wiki/computation/basics-of-sql
:L Ma (2018). 'Basics of SQL', Datumorphism, 11 April. Available at: https://datumorphism.leima.is/wiki/computation/basics-of-sql/.