DB C9 Query Processing

  • CPU can’t manipulate the data in disk directly.

  • The growth rate of the disk data transfer(read/write) speed is much slower than that of the disk size.

    In 20 years, disk size: 1000 times; read/write speed: 40 times

  • The growth rate of the disk seek speed is much slower than that of the disk data transfer speed.

Overview

Basic Steps in Query Processing

  1. Parsing and translation:

    • Parser checks syntax, verifies relations.
    • Translate the query into the internal form: extended relational algebra (ERA).
  2. Optimization:

    Why optimization?

    1. For a given SQL query, there may be many equivalent relational algebra expressions.
    2. Each relational algebra operation can be evaluated using one of several different algorithms. e.g linear search or scan
    • Find out various equivalent relational algebra expressions.
    • A sequence of primitive operations which specify detailed evaluation strategy is called a query-execution/evaluation plan.
    • Query Optimization: Among all equivalent evaluation plans choose the one with the lowest cost. Two factors are considered for cost:
      • Cost is dependent on the algorithms of the executions.
      • Cost is also estimated using statistical information from the database catalog.
  3. Evaluation:

    • The query-execution engine takes a query-evaluation plan, executes that plan, and returns the answers to the query.

      evaluation plan: Annotated expression specifying detailed evaluation strategy.

      defines exactly

      • what algorithm is used for each operation;
      • how the execution of the operations is coordinated.

Measures of Query Cost

Cost is generally measured as total elapsed time for answering query. Factors are:

  • Disk access

  • CPU

  • Network communication

  • ……

Consider these factors under actual circumstances.

Disk access

Typically disk access is the predominant cost, and is relatively easy to estimate.

disk access is measured by taking into account:

  • Number of seek operations performed

  • Number of blocks read ×\times average-block-read-cost

  • Number of blocks written ×\times average-block-write-cost

    Costwrite>CostreadCost_{write} > Cost_ {read} :

    Data is read back after being written to ensure that the write was successful

Simplified measure

We just use the ①number of block transfers from disk and ②the number of seeks as the cost measures.

  • tTt_T: time to transfer one block. (≈ 0.1ms)
  • tSt_S: time for one seek. (≈ 4ms)

Selection Operation

Basic algorithms

File scan

Search algorithms that locate and retrieve records satisfying a selection condition, do not use index.

Algorithm A1 (linear search).

*Sorting

Join Operation

*Other Operations

*Evaluation of Expressions


DB C9 Query Processing
http://example.com/2023/05/15/DB-09/
Author
Tekhne Chen
Posted on
May 15, 2023
Licensed under