DB C1 Intro into DB
Course Info
Labs
5 labs and lab reports.
实验 | 完成要求 | 报告 |
---|---|---|
数据库 | 独立 | 独立 |
图书管理系统 | 独立 | 独立 |
MINI SQL | 可组队 | 独立(侧重于自己完成的部分) |
Examination and Grading Policy
Close book test, but allowing taking 1 A4 page note.
Assignments & Exercises | 10 |
---|---|
Course quiz | 2 5 |
Experiments and Projs | 30 |
Final | 50 |
Assignments
- DDL: 6.19 23:55
- course FTP: 10.214.131.93, user:
csdbs
, password:csdbs
- Submission: Paper-based + some programming; at the end of class on the due date.
Database System
A database-management system (DBMS) is database and a set of programs to access,update and manage data in DB; a large, complex software system designed to store, manage, and facilitate access to databases.
database:
a collection of interrelated data containing information relevant to an enterprise;
usually valuable, large, integrated and accessed by multiple users and applications;
A large collection of integrated and persistent data.
A collection of information that exists over a long period of time.
Purpose of DBS
Database Applications
Data processing and management are the most important fields of computer applications.
Before Database System:
File Process System supported by a conventional OS
New application programs must be written if necessary, and new data files are created as required.
But over a long period of time, data files may be in different formats.
Data files are independent each other.
7 Drawbacks:
Data redundancy and inconsistency: duplication in different files in different formats
Difficulty in accessing data
Data isolation: multiple files and formats, hard to retrieve, share
Integrity problems: adding or changing constraints needs programing
No atomicity of updates
e.g.
Failures may leave database in an inconsistent state with partial updates carried out.concurrent access control
e.g.
Uncontrolled concurrent access can lead to inconsistency like reading and updating simultaneously.Security problems
Why DB?
-
Modeling and design of databases: Get/Abstract data models from the real world, and then translate them into the forms suitable for the target DBMS — tables, views.
-
Programming:
-
Use database — queries and update of data.
-
SQL = “intergalactic data-speak”.
Serval access approaches:
- by using interactive tools provided by DBMS
- Invoke ODBC/JDBC by using development tools
- DBMS implementation
Characteristics
- Efficiency and scalability in data access.
- Reduced application development time.
- (physical and logical) Data independence . (见后)
- Data integrity(一致) and security.
- Concurrent access(并发) and robustness (i.e. recovery系统崩溃后回滚到正确状态).
DBMS Marketplace
Relational DBMS
Oracle, Sybase, IBM DB2,Microsoft SQL-Server
object-oriented DB
Open source
- MySQL: the most popular open source database for small system on web sites, a key part of LAMP.
- PostgreSQL: a highly scalable, open source object-relational database management system
- OpenGauss
- PolarDB
View of Data
Levels of Data Abstraction
-
Physical level: describes how a record/tuple is stored.
-
Logical level: describes data stored in database, and the relationships among the data on upper level.
-
View level: application programs hide details of data types. Note that views can also hide information for security purposes.
Schemas and Instances
Schema: the structure of the database on different level.
Analogous to type information of a variable in a program.
- Physical schema: database structure design at the physical level.
- Logical schema: database structure design at the logical level.
- Subschema: schema at view level .
Instance: the actual content of the database at a particular point in time.
Analogous to the value of a variable .
Physical Independence vs. Logical Independence
definition of independence: Ability to modify a schema definition at one level without affecting a schema definition at a higher level.
-
Physical data independence: the ability to modify the physical schema without changing the logical schema.
NOTE
Applications depend on the logical schema and are insulated from how data is structured and stored.Physical Independence is One of the most important benefits of using a DBMS!
-
Logical data independence: protect application programs from changes in logical structure of data.
Application programs
Programs that are used to interact with the database in this fashion.NOTE
It’s hard to achieve Logical data independence as the application programs are heavily dependent on the logical structure of data.
Data Models
Data model: a collection of conceptual tools for describing data structure, relationships, semantics(语义), constraints.
Different level of data abstraction needs different data model to describe:
-
Entity-Relationship model Relational model Conceptual design Logical design Other models like Object-oriented model, Semi-structured data models (XML), network model, hierarchical model, etc.
Database Language
Data Definition Language (DDL)
-
Specifies a database scheme as a set of definitions of relational schema.
-
Specifies storage structure, access methods, and consistency constraints.
-
DDL statements are compiled, resulting in a set of tables stored in a special file called data dictionary.
Data dictionary
contains metadata about Database schema, Integrity constraints( Primary Key(主码,可唯一标识实体的编码), Referential integrity) and Authorization.
Data Manipulation Language (DML)
functions:
- Retrieve data from the database
- Insert / delete / update data in the database
DML also known as query language.
classes:
- Procedural DML – user specifies what data is required and how to get those data.
e.g.
C, Pascal, Java, etc. - Nonprocedural DML – user specifies what data is required without specifying how to get those data.
e.g.
SQL, Prolog, etc.
Data Control Language (DCL)
SQL (Structured Query Language )
SQL = DDL+ DML+DCL ,the most widely used non-procedural query language. It’s Set-based, declarative, and procedural extensions are offered by different database systems.
usage:
Use SQL directly in the interactive environment.
e.g.
Query Analyzer of SQL Server; Sql Plus and Work Sheet of Oracle;mysql command line client of MySQL, etc.Use SQL by host language through ODBC / JDBC.
The Open Database Connectivity (ODBC) standard defines application program interfaces for use with C and several other languages.
The Java Database Connectivity (JDBC) standard defines a corresponding interface for the Java language.
Use SQL by host language with embed-SQL.
Database Design
Steps
Step | Contents | Explanation | notes |
---|---|---|---|
1 | Requirement analysis | data, applications, and operations needed | |
2 | Conceptual database design | A high-level description of data, constraints | Entity-Relationship (E-R) model or a similar high level data model. |
3 | Logical database design | Converting the conceptual design into a DB schema | |
4(理论) | Schema refinement | Normalization of relations | Check relational schema for redundancies and related anomalies. |
5(设计) | Physical database design | Indexing, query, clustering, and DB tuning | |
6 | Create and initialize the database | Load initial data and test. | |
7 | Security design | Identify different user groups and their roles. |
Entity-Relationship (E-R) Model of real world
Entities (objects): described by a set of attributes.
Relationships: between entities.
Database design in E-R model is usually converted to design in the relational model .
E-R diagram:
Relational Model
Transfer E-R diagrams into relational schema.
Database Users and Administrators
Database Users
Users are differentiated by the way they expect to interact with the system.
-
Naive users: invoke one of the permanent application programs that have been written previously by a high level language.
e.g.
people accessing database over the web, bank tellers, clerical staff. -
Professional users:
- Application programmers : interact with system via SQL calls.
- Sophisticated users : form requests in a database query language.
e.g.
Online Analytical Processing (OLAP), Data mining. - Specialized users: write specialized database applications that do not fit into the traditional data processing framework.
e.g.
CAD, Expert System (ES), KDB.
Database Administrators (DBA)
A special user having central control over database and programs accessing those data(DB).
characteristics:
- the highest privilege for the database.
- coordinates all the activities of the database system.
- controls all users authority to the database.
- good understanding of the enterprise’s information resources and requirements.
duties:
- Definition of storage structure, access method and schema
- Modification of schema and physical organization
- Granting of authorization for data access
- Routing maintenance
- Monitoring performance and responding to changes in requirements
- Security for the database (e.g. periodically backup database, recovery when failure)
Transaction Management
Concurrent use/access is important, but causes problems/conflict.
A transaction: a collection of operations that performs a single logical function in a database application.
- requirements(ACID): atomicity, consistence, isolation, durability.
- Transaction-management: The component ensures that the database remains in a consistent (or correct) state, although system failures and transaction failures.
e.g.
power failures and operating system crashes - Concurrency-control manager : controls the interaction among the concurrent transactions.
Database Architecture
Storage Manager
A program module that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system.
tasks:
- Interaction with the file manager
- Efficient storing, retrieving and updating of data
contents:
-
Transaction manager
-
Authorization and integrity manger
-
File manager (interaction with the file system to process data files, data dictionary, and index files)
-
Buffer manager
Query Processor
Query Processor includes DDL interpreter, DML compiler, and query processing, accomplishing tasks of Parsing and translation, Optimization and Evaluation.
What does Optimizer do?
There are alternative ways of evaluating a given query: Equivalent expressions but different algorithms for each operation. Cost difference between a good and a bad way of evaluating a query can be enormous.
Estimate the cost of operations and take the execution plan is important.
- Depends critically on statistical information about relations which the database must maintain
- Need to estimate statistics for intermediate results to compute cost of complex expressions
Overall System Structure
Application Architecture
The architecture of a database system is greatly influenced by the underlying computer system on which the database is running.
- Centralized
- Client-server
- Parallel (multi-processor)
- Distributed
History of Database Systems
1950s and early 1960s
Data processing using magnetic tapes(only sequential access) for storage, punched cards for input.
Late 1960s and 1970s
Hard disks allowed direct access to data.
Network and hierarchical data models in widespread use.
Ted Codd defines the relational data model
High-performance (for the era) transaction processing
1980s
Research relational prototypes evolve into commercial systems, SQL becomes industrial standard.
Parallel and distributed database systems.
Object-oriented database systems .
1990s
Large decision support and data-mining applications.
Large multi-terabyte data warehouses.
Emergence of Web commerce
Early 2000s:
XML and XQuery standards
Automated database administration
Later 2000s
Giant data storage systems