DB C1 Intro into DB

Course Info

Labs

5 labs and lab reports.

实验 完成要求 报告
数据库 独立 独立
图书管理系统 独立 独立
MINI SQL 可组队 独立(侧重于自己完成的部分)

Examination and Grading Policy

Close book test, but allowing taking 1 A4 page note.

Assignments & Exercises 10
Course quiz 2 ×\times 5
Experiments and Projs 30
Final 50

Assignments

  • DDL: 6.19 23:55
  • course FTP: 10.214.131.93, user: csdbs, password: csdbs
  • Submission: Paper-based + some programming; at the end of class on the due date.

Database System

A database-management system (DBMS) is database and a set of programs to access,update and manage data in DB; a large, complex software system designed to store, manage, and facilitate access to databases.

database:
a collection of interrelated data containing information relevant to an enterprise;
usually valuable, large, integrated and accessed by multiple users and applications;
A large collection of integrated and persistent data.
A collection of information that exists over a long period of time.

Purpose of DBS

Database Applications

Data processing and management are the most important fields of computer applications.

Before Database System: File Process System supported by a conventional OS

  • New application programs must be written if necessary, and new data files are created as required.

  • But over a long period of time, data files may be in different formats.

  • Data files are independent each other.

  • 7 Drawbacks:

    1. Data redundancy and inconsistency: duplication in different files in different formats

    2. Difficulty in accessing data

    3. Data isolation: multiple files and formats, hard to retrieve, share

    4. Integrity problems: adding or changing constraints needs programing

    5. No atomicity of updates

      e.g. Failures may leave database in an inconsistent state with partial updates carried out.

    6. concurrent access control

      e.g.Uncontrolled concurrent access can lead to inconsistency like reading and updating simultaneously.

    7. Security problems

Why DB?

  • Modeling and design of databases: Get/Abstract data models from the real world, and then translate them into the forms suitable for the target DBMS — tables, views.

  • Programming:

  • Use database — queries and update of data.

  • SQL = “intergalactic data-speak”.

Serval access approaches:

  • by using interactive tools provided by DBMS
  • Invoke ODBC/JDBC by using development tools
  • DBMS implementation

Characteristics

  • Efficiency and scalability in data access.
  • Reduced application development time.
  • (physical and logical) Data independence . (见后)
  • Data integrity(一致) and security.
  • Concurrent access(并发) and robustness (i.e. recovery系统崩溃后回滚到正确状态).

DBMS Marketplace

Relational DBMS

Oracle, Sybase, IBM DB2,Microsoft SQL-Server

object-oriented DB

Open source

  • MySQL: the most popular open source database for small system on web sites, a key part of LAMP.
  • PostgreSQL: a highly scalable, open source object-relational database management system
  • OpenGauss
  • PolarDB

View of Data

Levels of Data Abstraction

  • Physical level: describes how a record/tuple is stored.

  • Logical level: describes data stored in database, and the relationships among the data on upper level.

  • View level: application programs hide details of data types. Note that views can also hide information for security purposes.

Schemas and Instances

Schema: the structure of the database on different level.

Analogous to type information of a variable in a program.

  • Physical schema: database structure design at the physical level.
  • Logical schema: database structure design at the logical level.
  • Subschema: schema at view level .

Instance: the actual content of the database at a particular point in time.

Analogous to the value of a variable .

Physical Independence vs. Logical Independence

definition of independence: Ability to modify a schema definition at one level without affecting a schema definition at a higher level.

  • Physical data independence: the ability to modify the physical schema without changing the logical schema.

    NOTE Applications depend on the logical schema and are insulated from how data is structured and stored.

    Physical Independence is One of the most important benefits of using a DBMS!

  • Logical data independenceprotect application programs from changes in logical structure of data.

    Application programs Programs that are used to interact with the database in this fashion.

    NOTE It’s hard to achieve Logical data independence as the application programs are heavily dependent on the logical structure of data.

Data Models

Data model: a collection of conceptual tools for describing data structure, relationships, semantics(语义), constraints.

Different level of data abstraction needs different data model to describe:

  • Entity-Relationship model Relational model
    Conceptual design Logical design

    Other models like Object-oriented model, Semi-structured data models (XML), network model, hierarchical model, etc.

Database Language

Data Definition Language (DDL)

  • Specifies a database scheme as a set of definitions of relational schema.

  • Specifies storage structure, access methods, and consistency constraints.

  • DDL statements are compiled, resulting in a set of tables stored in a special file called data dictionary.

    Data dictionary contains metadata about Database schema, Integrity constraints( Primary Key(主码,可唯一标识实体的编码), Referential integrity) and Authorization.

Data Manipulation Language (DML)

functions:

  • Retrieve data from the database
  • Insert / delete / update data in the database

DML also known as query language.

classes:

  • Procedural DML – user specifies what data is required and how to get those data.
    e.g. C, Pascal, Java, etc.
  • Nonprocedural DML – user specifies what data is required without specifying how to get those data.
    e.g. SQL, Prolog, etc.

Data Control Language (DCL)

SQL (Structured Query Language )

SQL = DDL+ DML+DCL ,the most widely used non-procedural query language. It’s Set-based, declarative, and procedural extensions are offered by different database systems.

usage:

  1. Use SQL directly in the interactive environment.
    e.g. Query Analyzer of SQL Server; Sql Plus and Work Sheet of Oracle;mysql command line client of MySQL, etc.

  2. Use SQL by host language through ODBC / JDBC.

    • The Open Database Connectivity (ODBC) standard defines application program interfaces for use with C and several other languages.

    • The Java Database Connectivity (JDBC) standard defines a corresponding interface for the Java language.

  3. Use SQL by host language with embed-SQL.

Database Design

Steps

Step Contents Explanation notes
1 Requirement analysis data, applications, and operations needed
2 Conceptual database design A high-level description of data, constraints Entity-Relationship (E-R) model or a similar high level data model.
3 Logical database design Converting the conceptual design into a DB schema
4(理论) Schema refinement Normalization of relations Check relational schema for redundancies and related anomalies.
5(设计) Physical database design Indexing, query, clustering, and DB tuning
6 Create and initialize the database Load initial data and test.
7 Security design Identify different user groups and their roles.

Entity-Relationship (E-R) Model of real world

Entities (objects): described by a set of attributes.

Relationships: between entities.

Database design in E-R model is usually converted to design in the relational model .

E-R diagram:

Relational Model

Transfer E-R diagrams into relational schema.

Database Users and Administrators

Database Users

Users are differentiated by the way they expect to interact with the system.

  • Naive users: invoke one of the permanent application programs that have been written previously by a high level language.

    e.g. people accessing database over the web, bank tellers, clerical staff.

  • Professional users:

    • Application programmers : interact with system via SQL calls.
    • Sophisticated users : form requests in a database query language.
      e.g. Online Analytical Processing (OLAP), Data mining.
    • Specialized users: write specialized database applications that do not fit into the traditional data processing framework.
      e.g. CAD, Expert System (ES), KDB.

Database Administrators (DBA)

A special user having central control over database and programs accessing those data(DB).

characteristics:

  • the highest privilege for the database.
  • coordinates all the activities of the database system.
  • controls all users authority to the database.
  • good understanding of the enterprise’s information resources and requirements.

duties:

  • Definition of storage structure, access method and schema
  • Modification of schema and physical organization
  • Granting of authorization for data access
  • Routing maintenance
    • Monitoring performance and responding to changes in requirements
    • Security for the database (e.g. periodically backup database, recovery when failure)

Transaction Management

Concurrent use/access is important, but causes problems/conflict.

A transaction: a collection of operations that performs a single logical function in a database application.

  • requirements(ACID): atomicity, consistence, isolation, durability.
  • Transaction-management: The component ensures that the database remains in a consistent (or correct) state, although system failures and transaction failures.
    e.g. power failures and operating system crashes
  • Concurrency-control manager : controls the interaction among the concurrent transactions.

Database Architecture

Storage Manager

A program module that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system.

tasks:

  • Interaction with the file manager
  • Efficient storing, retrieving and updating of data

contents:

  • Transaction manager

  • Authorization and integrity manger

  • File manager (interaction with the file system to process data files, data dictionary, and index files)

  • Buffer manager

Query Processor

Query Processor includes DDL interpreter, DML compiler, and query processing, accomplishing tasks of Parsing and translation, Optimization and Evaluation.

What does Optimizer do?

There are alternative ways of evaluating a given query: Equivalent expressions but different algorithms for each operation. Cost difference between a good and a bad way of evaluating a query can be enormous.

Estimate the cost of operations and take the execution plan is important.

  • Depends critically on statistical information about relations which the database must maintain
  • Need to estimate statistics for intermediate results to compute cost of complex expressions

Overall System Structure

Application Architecture

The architecture of a database system is greatly influenced by the underlying computer system on which the database is running.

  • Centralized
  • Client-server
  • Parallel (multi-processor)
  • Distributed

History of Database Systems

1950s and early 1960s

Data processing using magnetic tapes(only sequential access) for storage, punched cards for input.

Late 1960s and 1970s

Hard disks allowed direct access to data.

Network and hierarchical data models in widespread use.

Ted Codd defines the relational data model

High-performance (for the era) transaction processing

1980s

Research relational prototypes evolve into commercial systems, SQL becomes industrial standard.

Parallel and distributed database systems.

Object-oriented database systems .

1990s

Large decision support and data-mining applications.

Large multi-terabyte data warehouses.

Emergence of Web commerce

Early 2000s:

XML and XQuery standards

Automated database administration

Later 2000s

Giant data storage systems


DB C1 Intro into DB
http://example.com/2023/02/27/DB-1/
Author
Tekhne Chen
Posted on
February 27, 2023
Licensed under