Basics of SQL

Adding a new field to data:

  1. Relational: requires a new column
  2. Non-Relational: just add the field to one single document, thus can be easily decentralized.

Basics and Background

  1. SQL: Structured Query Language

  2. Relational Database:

    1. usually in tables
    2. rows are called records
    3. columns are certain types of data. Data types of rows are specified:
      1. INTEGER
      2. TEXT
      3. DATE
      4. REAL, real numbers
      5. NULL
  3. RDBMS: Relational Database Management System, most RDBMS use SQL as the query language. SQLite is one of the RDBMS.

    1. SQLite: open source and minimal
    2. MySQL: powerful and popular, also open source, controlled by Oracle, not really scalable.
    3. PostgreSQL: open source, even slower than MySQL.
    4. OracleDB: not open source
    5. SQL Server: from Microsoft, not open source, and only on windows.

SQL

Semicolon in SQL is a statement terminator. Just use it.

Capitalization Strategy

Capitalization Strategy

For readablilty

  1. Capitalize clauses
  2. Capitalize table names etc

On stackoverflow: stackoverflow

Statements for Manipulation

  1. Create table

    
       CREATE TABLE table_name (
           column_1 data_type,
           column_2 data_type,
           column_3 data_type
         );
    

    Constraints can be included when creating tables. I am using the example from CodeAcademy.

    
       CREATE TABLE awards (
       --id should be integer, and is the primary key
          id INTEGER PRIMARY KEY,
       --recipient should be text and can not be null
          recipient TEXT NOT NULL,
       --default values for a column
          award_name TEXT DEFAULT "Grammy"
       );
    

    As mentioned in the basics section, primary keys should be unique to identify the specific row.

  2. Insert new row

       INSERT INTO table_name (column_1, column_2, column_3)
       VALUES (some_value_1, some_value_2, some_value_3);
    
  3. Update some values

       -- Specify the table
       UPDATE table_name
       -- choose column to be updated
       SET column_1 = some_other_value_1
       -- specify row location
       WHERE column_2 = some_specific_value_to_locate_the_row;
    
  4. Add new columns

       -- speficy table
       ALTER table_name
       -- add column and specify data type, here I use TEXT
       ADD COLUMN column_4 TEXT
    
  5. Delete rows

       DELETE FROM celebs
       -- I use column_4 as an example
       -- Delete every row if column_4 has NULL values
       WHERE column_4 IS NULL;
    

Statements for Queries

  1. Select from table; select returns result set which is a new table.

       -- Select out everything from table
       SELECT * FROM table_name;
       -- Select out the values of a specific column
       SELECT column_1 FROM table_name;
    
  2. Select specific columns

       SELECT column_1, column_2
       FROM table_name;
    
  3. AS keyword: allows you to select the column and return it as the specified new name of the column; the database is NOT modified.

       SELECT column_1 AS 'A NEW NAME'
       FROM table_name;
    
  4. Select and show only the distinct values of the column

       SELECT DISTINCT column_1
       FROM table_name;
    
  5. WHERE key: using operators such as =, !=, >, <, >=, <= to filter results

       SELECT * FROM table_name
       WHERE column_1 = 0;
    
  6. LIKE key: patern specified like AA_B where _ is for a single character.

       SELECT * FROM table_name
       WHERE column_1 LIKE `AA_B`
    

    Wildcards: _, % for 0 or more characters.

  7. BETWEEN, AND, OR:

       SELECT *
       FROM movies
       WHERE name BETWEEN 'D%' AND 'G%';
    
  8. SORT BY: Can be either DESC or ASC and goes after where

       SELECT * FROM movies
       WHERE year > 2014
       ORDER BY name DESC;
    
  9. LIMIT

       SELECT *
       FROM movies
       ORDER BY imdb_rating DESC
       LIMIT 3;
    
  10. CASE:

       SELECT name,
        CASE
          WHEN genre = 'romance' THEN 'fun'
          WHEN genre = 'comedy' THEN 'fun'
          ELSE 'serious'
        END
       FROM movies;
    



## Aggregate



1. `COUNT`:

```SQL
   SELECT COUNT(*)
   FROM tabe_name;
  1. SUM:

       SELECT SUM(column_1)
       FROM table_name;
    
  2. MAX and MIN:

       SELECT MAX(column_1)
       FROM table_name;
    
  3. AVG: average

       SELECT AVG(column_1)
       FROM table_name;
    
  4. ROUND: round to specified decimals

       --round the price to integers
       SELECT name, ROUND(price,0)
       FROM fake_apps;
    
       --round the price to integers
       --even with other keys as arguments
       SELECT name, ROUND(AVG(price),0)
       FROM fake_apps;
    
  5. GROUP BY: group by column values

       SELECT price, COUNT(*)
       FROM fake_apps
       WHERE downloads > 20000
       GROUP BY price;
    

    or

       SELECT category, SUM(downloads)
       FROM fake_apps
       GROUP BY category;
    

    References can be used in GROUP BY

       SELECT category, SUM(downloads)
       FROM fake_apps
       GROUP BY 1;
       --1 here is identical to category
    
  6. HAVING: The problem with WHERE is that it goes before GROUP BY. What if we need to filter the groups?

       SELECT price, ROUND(AVG(downloads))
       FROM fake_apps
       GROUP BY price
       HAVING COUNT(price) > 9;
    

Multiple Tables

The normalization is explained in DB Normalization.

  1. JOIN: Join tables with specified column

       SELECT *
       FROM orders
       JOIN subscriptions
       ON orders.subscription_id = subscriptions.subscription_id
       WHERE description = 'Fashion Magazine';
    
  2. Inner Join: only join the rows that have common values on the specified join columns.

       SELECT COUNT(*)
       FROM newspaper;
       --Output 60
    
       SELECT COUNT(*)
       FROM online;
       --Output 65
    
       SELECT COUNT(*)
       FROM newspaper
       JOIN online
       ON online.id = newspaper.id;
       --Output 50 <= 60 or 65
    
  3. Left Join: simply plug all the right table onto left tables, where the values of the specified column match. The number of rows will be the number of rows for the left table.

       SELECT *
       FROM newspaper
       LEFT JOIN online
       ON newspaper.id=online.id
       WHERE online.id IS NULL;
    
  4. Cross join: combine all the information

       SELECT month,
       COUNT(*) as subscribers
       FROM months
       CROSS JOIN newspaper
       WHERE months.month > newspaper.start_month AND months.month < newspaper.end_month
       GROUP BY months.month;
    
  5. UNION: stack tables

       SELECT *
       FROM newspaper
       UNION
       SELECT *
       FROM online;
    
  6. WITH: create a result with alias

Planted: by ;

L Ma (2018). 'Basics of SQL', Datumorphism, 11 April. Available at: https://datumorphism.leima.is/wiki/computation/basics-of-sql/.