Basics of Database

#Database #Basics

NoSQL

NoSQL = Not only SQL. The four main types of NoSQL databases are

Key-value store: Amazon Dynamo, memcached, Amazon SimpleDB
Column-orient store: Google BigTable, Cassandra
Graph database: Neo4j, VertexDB
Document database: [[MongoDB]] Basics of MongoDB MongoDB is a document based database
Object database: ZODB

Database Operations

Relations

Union: $A\cup B$
Intersection: $A\cap B$
$A - B$
Cartesian Product: $A \times B$

Query

Union in database: will combine the data with matching common columns.

NaturalJoin and EquiJoin:

 --EquiJoin where we specify what condition is used to join
 SELECT * FROM table1 JOIN table2 ON (table1.id = table2.id)
 --NaturalJoin where conditions are chosen by the database automatically
 SELECT * FROM table1 NATURAL JOIN table2

Procedural language: derived from structural programming, focuses on breaking tasks into variables, data structures, and subroutines.

SQL

[[SQL]] Basics of SQL Essential knowledge of programming is one of the core skills of a data scientist.

Nice Article about Index

B-tree: nodes have subnodes; Read Wikipedia for more.
Index architectures:
1. Clustered indices: sort data based on the key column, thus no row locator is needed.
2. Non-clustered indices: store both column and row locator so that sorting is not needed.
“If the table has a clustered index, or the index is on an indexed view, the row locator is the clustered index key for the row.” from HackerRank.
Leaf vs Non-leaf:
1. Leaf level pages (nodes): the end of the search through B-tree;
2. Non-leaf pages (nodes): root and intermediate
Fill factor: the leaf level pages are not always filled with data. You can specify some pages to be reserved for the future growth of data. The value is from 1 to 100 percent. Server-side default is usually 0 which means that all leaf pages are filled.

Composite index:

 CREATE INDEX index_name
 ON table_name(column1, column2);

OLAP

OLAP Explained: OLAP Cubes explained the operations such as SLICE (slice by one dimension so that we get an N-1 subset of the data), DICE (slice more than two dimensions), DRILL DOWN/UP (navigate among levels of data), ROLL-UP (aggregate, consolidate, involves all the data relations for one or more dimensions), PIVOT (rotation). {: .notes–info}

OLAP: Online analytical processing, with a core of OLAP cube (multidimensional cube, hypercube).
Aggregations: speed up time.
Both star schema and snowflake schema are the source of the cube metadata for OLAP.

About how those operations works, please find the quiz at OLAP Operation Types @ HackerRank

An example is

 --Aggregation number of rows
 n1*n2*n3,
 --Rollup number of rows
 n1*n2*n3+n1+n1*n2+1
 --Cube number of rows
 n1*n2*n3+n1*n2+n2*n3+n1*n3+n1+n2+n3+1

Relational Calculus

Operators

Operators:

∀ = forall : https://en.wikipedia.org/wiki/Turned_A
| = such that : https://en.wikipedia.org/wiki/Set-builder_notation
∧ = logical conjunction (and) : https://en.wikipedia.org/wiki/List_of_logic_symbols
→ = is a function

Stolen from HackerRank: https://www.hackerrank.com/challenges/databases-relational-calculus/forum/comments/325506

Database Key

Primary key: unique values for each row of data; can not contain null values.

Normalization

Dependency

Dependency means a column that determines others.

Partial dependency means that we have some column that depends on only some of the columns but has nothing to do with some other columns. In this case we can see some kind of redundancy. Hence in 2NF we remove partial dependency.

Transitive dependency is that if x determines y, y determines z, then x determines z, which is the transitive dependency.

NF: Normal Form,e.g., 1NF Rules, 2NF Rules, 3NF Rules. Refer to this article. BCNF (Boyce-Codd normal form) please refer to wiki. 4NF is also explained well in wiki.

studytonight

   1NF = can not be broken down into more elementary tables : Single value on each field; no repeating groups (for example table with multiple columns of products which are in principle the same is NOT 1NF)

   2NF = 1NF + no partial dependency: non-key columns depends on primary keys.

   3NF = 2NF + Transitive functional dependency of non-prime attribute on any super key should be removed.

   BCNF = A 3NF table that does not have multiple overlapping candidate keys is guaranteed to be in BCNF

Superkey: set of attributes within a table whose values can be used to uniquely identify a tuple
Candidate key: column(s) to identify unique records
Nonkey Dependency: simply as the name indicates