DB C2 Relational Model
A relational database consists of a collection of tables, each of which is assigned a unique name.
Relational Model
- The relational model is very simple and elegant.
- A relational database: a collection of one or more relations based on the relational model.
Major advantages:
- straightforward data representation
- the ease with which complex queries can be expressed.
SQL is the most widely used language for creating, manipulating, and querying relational database.
c.f
relationship and relation
A relation is a table with rows and columns; the mathematical concept, referred to as a table.
A relationship is an association among several entities.
Entity set and relationship set Relation real world table, tuple, row machine world
Structure
Basic Structure
Formally given sets , a $ r$ is a subset of / Cartesian product .
A relation is a set of n-tuples (), where each .
Attribute Types
Each attribute of a relation has a name.
domain of the attribute: The set of allowed values for each attribute.
Attribute values are (normally) required to be atomic i.e
indivisible.
e.g
multivalued/composite attribute values are not atomic.
The special value null
null
is a member of every domain.
null
value causes complications in the definition of many operations.
Relation
relation schema
Describes the structure of the relation. Assume are attributes, formally express a relation schema
is a relation on the relation schema R.
relation instance
Corresponds to the snapshot of the data in the relation at a given instant in time. The current values / relation instance of a relation are specified by a table.
An element t
of r
is a tuple
, represented by a row
in a table
. then t[name] denotes the value of t
on the name
attribute.
variable | relation |
---|---|
variable type | relation schema |
variable value | relation instance |
Properties
- The order of tuples is irrelevant.
- No duplicated tuples in a relation.
- Attribute values are atomic.
Key
Key is a subset of a relation, .
superkey (超码)
superkey: a set of one or more attributes that the values for K are sufficient to identify a unique tuple of each possible relation r(R) .
Superkey is not distinct.
e.g
both {ID} and {ID, name} are superkeys.
candidate key (候选码)
candidate key: The minimal superkey.
primary key (主码)
primary key: a candidate key chosen by the database designer as the principal means of identifying tuples within a relation.
Primary key is usually marked by underline.
A key is a property of the entire relation, rather than
of the individual tuples.Any two individual tuples in the relation are prohibited from having the same value on the key attributes at the same time. The designation of a key represents a constraint in the real-world enterprise being modeled. Thus,
primary keys
are also referred to as primary key constraints.
Foreign Key (外码)
A foreign-key constraint from attribute(s) of relation to the primary key
of relation states that on any database instance, the value of for each tuple in must also be the value of for some tuple in .
Attribute set is called a foreign key from , the referencing relation of the foreign-key constraint
, referencing , the referenced relation.
- Primary key and foreign key are integrated constraints.
Schema Diagrams
A database schema
, along with primary key and foreign-key constraints
, can be depicted by schema diagrams.
Relational Query Languages
query language: a language in which a user requests information from the database. Usually on a level higher than that of a standard programming language.
Query languages can be categorized as imperative, functional, or declarative.
imperative query language
-
The user instructs the system to perform a specific sequence of operations on the database to compute the desired result.
-
Usually have a notion of state variables.
functional query language
- The computation is expressed as the evaluation of functions that may operate on data in the database or on the results of other functions.
- Functions are side-effect free.
- Functions do not update the program state.
declarative query language
-
User describes the desired information without giving a specific sequence of steps or function calls .
-
The desired information is typically described using some form of mathematical logic.
It is the database system’s job to figure out how to obtain the desired information.
Pure languages
-
Relational Algebra — the basis of SQL
-
Tuple Relational Calculus (元组关系演算)
-
Domain Relational Calculus — (域关系演算) QBE
Query languages used in practice, such as the SQL query language, include elements of the imperative, functional, and declarative approaches.
Fundamental Relational-Algebra Operations
The operators take one or two relations as inputs, and return a new relation as a result.
Select
selection predicate
is a formula in propositional calculus consisting of terms
like
connected by \and,\or,\not, where op
is relational operator in .
The selection conditions need to aim at the attribute values of the same tuple, when we conduct section operation.
Project
is attribute name and is a relation name. The result is defined as the relation of k columns obtained by erasing the columns that are not listed.
NOTE
Duplicate rows removed from result, since relations are sets.
Union
NOTE
Must be taken between compatible relations:
r and s must have the same arity .
The attribute domains must be compatible.
Set difference
NOTE
Must be taken between compatible relations.
Cartesian product
NOTE
We always assume that attributes of and are disjoint, i.e.
.
Under not disjoint circumstances, then renaming for attributes must be used.
Rename
returns under the name . And
returns under the name , with all attributes renamed . Renaming allows
- to name, and to refer to the results of relational-algebra expressions procedurally.
- to refer to a relation by more than one name.
Banking Example
Relations
relation | schema |
---|---|
branch | (branchName, branchCity, assets) |
customer | (customerName, customer-street, customerCity) |
account | (accountNumber, branchName, balance) |
loan | (loanNumber, branchName, amount) |
depositor | (customerName, accountNumber) |
borrower | (customerName, loanNumber) |
Find all
loans
of over $1200.Find the
loan number
for each loan of an amount greater than $1200.Find the
names of all customers
who have a loan, or an account, or both, from the bank.Find the
names of all customers
who at least have a loan and an account at bank.Find the
names of all customers
who have a loan at the Perryridge branch.Find the
names of all customers
who have loans at the Perryridge branch but do not have an account at any branch of the bank.Find the largest
account balance
.(Self-comparison)
Additional Relational-Algebra Operations
Additional Operations do not add any power to the relational algebra, but simplify common queries.
Set intersection
NOTE
- Must be taken between compatible relations.
Natural join
r \Join s = = \Pi_{r.A_{n_1}, r.A_{n_2},\cdots,r.A_{n_k},s.A_{m_1}, \cdots,s.A_{m_j},}(\sigma_{r.A_{l_1} = s.A_{r_1}\and \cdots r.A_{l_p} = s.A_{r_p}} (r x s))
Let r and s be relations on schemas R and S respectively. Then
is a relation on schema obtained as follows:
- Consider each pair of tuples from r and from s.
- If and have the same value on each of the attributes in , add a tuple t to the result s.t. t has the same value as on r and on s.
Theta Join
Division
Let r and s be relations on schemas R and S respectively, where and . Then is a relation on the schema .
Note
- encloses the result of
- The union of the tuple(s) t of and all the tuples in s is covered by r ,
i.e
is the largest relation satisfying .
Assignment
The assignment operation provides a convenient way to express complex queries.
Note
Assignment must always be made to a temporary relation variable.
Extended Relational-Algebra Operations
Generalized Projection
Generalized Projection extends the projection operation by allowing arithmetic functions to be used in the projection list.
- : relational-algebra expression
- : arithmetic expressions involving constants and attributes in the schema of .
Aggregate Functions
Aggregation function takes a collection of values and returns a single value as a result.
- avg: average value
- min/max: minimum/maximum value
- sum: sum of values
- count: number of values
- : relational-algebra expression
- : a list of attributes on which to group (can be empty)
- : an aggregate function to apply on grouped by .
- : an attribute name.
Note
Result of aggregation does not have a name.
- rename operation;
- permit renaming as part of aggregate operation For convenience.
Modification of the Database
ref
P108-114
The content of the database may be modified using Deletion, Insertion and Updating. All these operations are expressed using the assignment operator.
Deletion
A delete request is expressed in the same way as a query, except the selected tuples are removed from the database instead of displaying tuples.
-
Only whole tuples rather values on only particular attributes can be deleted.
-
by relational-algebra,
- by SQL,
Insertion
To insert data into a relation, we either Specify a tuple to be inserted or Write a query whose result is a set of tuples to be inserted.
-
by relational-algebra,
-
by SQL,
Updating
A update is to change a value in a tuple without changing all values in the tuple.
-
by relational-algebra, using the generalized projection operator,
:
- attribute of r, if the attribute is not updated
- an expression involving only constants and the attributes of r, to give the attributes new values, if the attribute is to be updated.
-
by SQL,
Note
注意update的顺序,不当的顺序可能会造成重复更新以致错误。