DB C2 Relational Model

A relational database consists of a collection of tables, each of which is assigned a unique name.

Relational Model

  • The relational model is very simple and elegant.
  • A relational database: a collection of one or more relations based on the relational model.

Major advantages:

  • straightforward data representation
  • the ease with which complex queries can be expressed.

SQL is the most widely used language for creating, manipulating, and querying relational database.

c.f relationship and relation

  • A relation is a table with rows and columns; the mathematical concept, referred to as a table.

  • A relationship is an association among several entities.

Entity set and relationship set Relation
real world table, tuple, row
machine world

Structure

Basic Structure

Formally given sets D1,D2,,Dn(Di=aijj=1k)D_1, D_2, …, D_n (D_i = a_{ij} | j=1…k), a relationrelation $ r$ is a subset of D1×D2××DnD_1 \times D_2 \times … \times D_n / Cartesian product DiD_i .

A relation is a set of n-tuples (a1j,a2j,,anja_{1j}, a_{2j}, …, a_{nj}), where each aijDi,i=1,2,,na_{ij} \in D_i , i = 1,2,\cdots,n.

Attribute Types

Each attribute of a relation has a name.

domain of the attribute: The set of allowed values for each attribute.

Attribute values are (normally) required to be atomic i.e indivisible.

e.g multivalued/composite attribute values are not atomic.

The special value null

null is a member of every domain.
null value causes complications in the definition of many operations.

Relation

relation schema

Describes the structure of the relation. Assume A1,A2,,AnA_1,A_2,\cdots,A_n​ are attributes, formally express a relation schema

R=(A1,A2,,An)R = (A_1,A_2,\cdots,A_n)

r(R)r(R) is a relation on the relation schema R.

relation instance

Corresponds to the snapshot of the data in the relation at a given instant in time. The current values / relation instance of a relation are specified by a table.

An element t of r is a tuple, represented by a row in a table. then t[name] denotes the value of t on the name attribute.

variablerelation
variable typerelation schema
variable valuerelation instance

Properties

  • The order of tuples is irrelevant.
  • No duplicated tuples in a relation.
  • Attribute values are atomic.

Key

Key is a subset of a relation, KRK \sube R.

superkey (超码)

superkey: a set of one or more attributes that the values for K are sufficient to identify a unique tuple of each possible relation r(R) .

Superkey is not distinct.

e.g both {ID} and {ID, name} are superkeys.

candidate key (候选码)

candidate key: The minimal superkey.

primary key (主码)

primary key: a candidate key chosen by the database designer as the principal means of identifying tuples within a relation.

  • Primary key is usually marked by underline.

  • A key is a property of the entire relation, rather than
    of the individual tuples.

  • Any two individual tuples in the relation are prohibited from having the same value on the key attributes at the same time. The designation of a key represents a constraint in the real-world enterprise being modeled. Thus, primary keys are also referred to as primary key constraints.

Foreign Key (外码)

A foreign-key constraint from attribute(s) AA of relation r1r_1 to the primary key BB of relation r2r_2 states that on any database instance, the value of AA for each tuple in r1r_1 must also be the value of BB for some tuple in r2r_2.

Attribute set AA is called a foreign key from r1r_1, the referencing relation of the foreign-key constraint, referencing r2r_2, the referenced relation.

  • Primary key and foreign key are integrated constraints.

Schema Diagrams

A database schema, along with primary key and foreign-key constraints, can be depicted by schema diagrams.

Relational Query Languages

query language: a language in which a user requests information from the database. Usually on a level higher than that of a standard programming language.

Query languages can be categorized as imperative, functional, or declarative.

imperative query language

  • The user instructs the system to perform a specific sequence of operations on the database to compute the desired result.

  • Usually have a notion of state variables.

functional query language

  • The computation is expressed as the evaluation of functions that may operate on data in the database or on the results of other functions.
  • Functions are side-effect free.
  • Functions do not update the program state.

declarative query language

  • User describes the desired information without giving a specific sequence of steps or function calls .

  • The desired information is typically described using some form of mathematical logic.

    It is the database system’s job to figure out how to obtain the desired information.

Pure languages

  • Relational Algebra — the basis of SQL

  • Tuple Relational Calculus (元组关系演算)

  • Domain Relational Calculus — (域关系演算) QBE

Query languages used in practice, such as the SQL query language, include elements of the imperative, functional, and declarative approaches.

Fundamental Relational-Algebra Operations

The operators take one or two relations as inputs, and return a new relation as a result.

Select

σρ(r)={ttrρ(t)}\sigma_\rho(r) = \{t | t \in r \wedge \rho(t)\}

selection predicateρ\rho is a formula in propositional calculus consisting of termslike

<attribute>op<attribute>or<constant><attribute> op <attribute> or <constant>

connected by \and,\or,\not, where op is relational operator in =,<,>,,,=,<,>,\leq,\geq,\neq.

The selection conditions need to aim at the attribute values of the same tuple, when we conduct section operation.

Project

ΠA1,A2,,Ak(r)\Pi_{A_1,A_2,\cdots,A_k}(r)

AiA_i is attribute name and rr is a relation name. The result is defined as the relation of k columns obtained by erasing the columns that are not listed.

NOTEDuplicate rows removed from result, since relations are sets.

Union

rs={ttrts}r\cup s = \{t|t \in r \vee t \in s\}

NOTEMust be taken between compatible relations:

  • r and s must have the same arity .

  • The attribute domains must be compatible.

Set difference

rs={ttrts}r- s = \{t|t \in r \wedge t \notin s\}

NOTEMust be taken between compatible relations.

Cartesian product

r×s={{tq}trqs}r\times s = \{\left\{tq\right\}|t \in r\wedge q \in s\}

NOTEWe always assume that attributes of r(R)r(R) and s(S)s(S) are disjoint, i.e. RS=ϕR\cap S = \phi.

Under not disjoint circumstances, then renaming for attributes must be used.

Rename

ρX(E)\rho_X(E)

returns EE under the name XX. And

ρX(A1,A2,,An)(E)\rho_{X(A_1,A_2,\cdots,A_n)}(E)

returns EE under the name XX, with all attributes renamed A1,A2,,AnA_1,A_2,\cdots,A_n. Renaming allows

  • to name, and to refer to the results of relational-algebra expressions procedurally.
  • to refer to a relation by more than one name.

Banking Example

Relations

relationschema
branch(branchName, branchCity, assets)
customer(customerName, customer-street, customerCity)
account(accountNumber, branchName, balance)
loan(loanNumber, branchName, amount)
depositor(customerName, accountNumber)
borrower(customerName, loanNumber)
  1. Find all loans of over $1200.

    σamount>1200(loan)\sigma_{amount > 1200}(loan)

  2. Find the loan number for each loan of an amount greater than $1200.

    ΠloanNumber(σamount>1200(loan))\Pi_{loanNumber}(\sigma_{amount>1200}(loan))

  3. Find the names of all customers who have a loan, or an account, or both, from the bank.

    ΠcustomerName(borrower)ΠcustomerName(depositor)\Pi_{customerName}(borrower)\cup\Pi_{customerName}(depositor)

  4. Find the names of all customers who at least have a loan and an account at bank.

    ΠcustomerName(borrower)ΠcustomerName(depositor)\Pi_{customerName}(borrower)\cap\Pi_{customerName}(depositor)

  5. Find the names of all customers who have a loan at the Perryridge branch.

    ΠcustomerName(σborrower.loanNumber=loan.loanNumber(borrower×(σbranchName=Perryridge(loan))))\Pi_{customerName}(\sigma_{borrower.loanNumber = loan.loanNumber}(borrower\times (\sigma_{branchName = 'Perryridge'}(loan))))

  6. Find the names of all customers who have loans at the Perryridge branch but do not have an account at any branch of the bank.

    ΠcustomerName(σborrower.loanNumber=loan.loanNumber(borrower×(σbranchName=Perryridge(loan))))ΠcustomerName(depositor)\Pi_{customerName}(\sigma_{borrower.loanNumber = loan.loanNumber}(borrower\times (\sigma_{branchName = 'Perryridge'}(loan)))) - \Pi_{customerName}(depositor)

  7. Find the largest account balance.(Self-comparison)

    Πbalance(account)Πaccount.balance(σaccount.balance<d.balance(account×ρd(account)))\Pi_{balance}(account) – \Pi_{account.balance}(\sigma_{account.balance<d.balance}(account \times \rho_d(account)))

Additional Relational-Algebra Operations

Additional Operations do not add any power to the relational algebra, but simplify common queries.

Set intersection

rs={ttrandts}r \cap s = \{t | t \in r and t \in s\}

NOTE

  1. Must be taken between compatible relations.
  2. rs=r(rs)r \cap s = r - (r - s)

Natural join

r \Join s = = \Pi_{r.A_{n_1}, r.A_{n_2},\cdots,r.A_{n_k},s.A_{m_1}, \cdots,s.A_{m_j},}(\sigma_{r.A_{l_1} = s.A_{r_1}\and \cdots r.A_{l_p} = s.A_{r_p}} (r x s))

Let r and s be relations on schemas R and S respectively. Then
rsr\Join s is a relation on schema RSR\cup S obtained as follows:

  • Consider each pair of tuples trt_r from r and tst_s from s.
  • If trt_r and tst_s have the same value on each of the attributes in RSR \cap S, add a tuple t to the result s.t. t has the same value as trt_r on r and tst_s on s.

Theta Join

rθs=σθ(r×s)r \Join_{\theta} s = \sigma_\theta(r\times s)

Division

r÷sr \div s

Let r and s be relations on schemas R and S respectively, where R=(A1,,Am,B1,,Bn)R = (A_1,\cdots,A_m,B_1,\cdots,B_n) and S=(B1,,Bn)S = (B_1,\cdots,B_n). Then r÷sr \div s is a relation on the schema RS=(A1,,Am)R - S = (A_1,\cdots,A_m).

r÷s={ttΠRS(r)us,tur}=ΠRS(r)ΠRS((ΠRS(r)×s)ΠRS,S(r))\begin{align} r \div s &= \left\{t|t \in \Pi_{R-S}(r) \wedge \forall u\in s,tu \in r \right\} \\ &= \Pi_{R-S}(r) – \Pi_{R-S}((\Pi_{R-S}(r)\times s) – \Pi_{R-S,S}(r)) \end{align}

Note

  1. ΠRS(r)\Pi_{R-S}(r) encloses the result of r÷sr\div s
  2. The union of the tuple(s) t of r÷sr \div s and all the tuples in s is covered by r ,i.e q=r÷sq = r \div s is the largest relation satisfying q×srq\times s\sube r.

Assignment

The assignment operation \leftarrow provides a convenient way to express complex queries.

Note Assignment must always be made to a temporary relation variable.

Extended Relational-Algebra Operations

Generalized Projection

Generalized Projection extends the projection operation by allowing arithmetic functions to be used in the projection list.

ΠF1,F2,,Fn(E)\Pi_{F_1,F_2,\cdots,F_n}(E)

  1. EE : relational-algebra expression
  2. FiF_i: arithmetic expressions involving constants and attributes in the schema of EE.

Aggregate Functions

Aggregation function takes a collection of values and returns a single value as a result.

  • avg: average value
  • min/max: minimum/maximum value
  • sum: sum of values
  • count: number of values

G1,G2,,GngF1(A1),F2(A2),,Fn(An)(E)_{G_1, G_2,\cdots, G_n}\large g_{F_1(A_1), F_2(A_2),\cdots , F_n(A_n)}(E)

  1. EE: relational-algebra expression
  2. G1,G2,,GnG_1, G_2,\cdots, G_n: a list of attributes on which to group (can be empty)
  3. FiF_i: an aggregate function to apply on AiA_i grouped by GiG_i.
  4. AiA_i: an attribute name.

Note Result of aggregation does not have a name.

  • rename operation;
  • permit renaming as part of aggregate operation For convenience.

Modification of the Database

ref P108-114

The content of the database may be modified using Deletion, Insertion and Updating. All these operations are expressed using the assignment operator.

Deletion

A delete request is expressed in the same way as a query, except the selected tuples are removed from the database instead of displaying tuples.

  • Only whole tuples rather values on only particular attributes can be deleted.

  • by relational-algebra,

rrEr \leftarrow r - E

  • by SQL,

    delete from rwhere P;\begin{align*} &\rm delete\ from\ \it r\\&\rm where\ \it P; \end{align*}

Insertion

To insert data into a relation, we either Specify a tuple to be inserted or Write a query whose result is a set of tuples to be inserted.

  • by relational-algebra,

    rrEr \leftarrow r \cup E

  • by SQL,

    insert into rvalues(CS437,DatabaseSystems,Comp.Sci.,4);\begin{align*} \rm insert\ &\rm into\ \it r \\ &\rm values ('CS-437', 'Database Systems', 'Comp. Sci.', 4); \end{align*}

Updating

A update is to change a value in a tuple without changing all values in the tuple.

  • by relational-algebra, using the generalized projection operator,

    rΠF1,,Fi(r)r \leftarrow \Pi_{F_1,\cdots,F_i}(r)

    FiF_i:

    1. ithi^{th} attribute of r, if the ithi^{th} attribute is not updated
    2. an expression involving only constants and the attributes of r, to give the attributes new values, if the attribute is to be updated.
  • by SQL,

    update rset salary=salary1.05;\begin{align*} &\rm update\ \it r \\ &\rm set\ \it salary= salary * 1.05; \end{align*}

Note

注意update的顺序,不当的顺序可能会造成重复更新以致错误。


DB C2 Relational Model
http://example.com/2023/03/06/DB-2/
Author
Tekhne Chen
Posted on
March 6, 2023
Licensed under