CO C0,1 Introduction

Course Intro

Core Topics

  • How does hardware support high-level languages (HLLs)?
  • Arithmetic for Computers
  • Datapath and Control
  • Exploiting Memory Hierarchy
  • I/O, Networks, and Other Peripherals
  • Experiments

Course Calendar

Grading Policy

| Block | Contents | Percentage (%) |
|---|---|---|
| Theory | Homework + Quiz | 20 |
| Theory | Midterm test | 10 |
| Theory | Final exam | 40 |
| Lab | Lab assignments | 30 |

Computer Abstractions and Technology

Introduction

  • Progress in computer technology: Underpinned by Moore’s Law
  • Makes novel applications feasible: computers in automobiles, cell phones, the Human Genome Project, the World Wide Web, search engines, and more
  • Computers are pervasive

Present and Tomorrow

Classes of Computers

| Class | Characteristics |
|---|---|
| Personal computers (PCs) | General purpose, variety of software; subject to the cost/performance tradeoff |
| Server computers | Network based; high capacity, performance, and reliability; range from small servers to building-sized |
| Supercomputers | High-end scientific and engineering calculations; highest capability, but represent a small fraction of the overall computer market |
| Embedded computers | Hidden as components of systems; stringent power/performance/cost constraints |

Performance

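| Hardware or software component | How this component affects performance |
|---|---|
| Algorithm | Determines the number of source-level statements and the number of I/O operations executed |
| Programming language, compiler, and architecture | Determines the number of machine instructions for each source-level statement |
| Processor and memory system | Determines how fast instructions can be executed |
| I/O system (hardware and operating system) | Determines how fast I/O operations can be executed |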

Great ideas

  • Design for Moore’s Law:

    Integrated circuit resources double every 18 to 24 months.

  • Abstraction to Simplify Design

    Lower-level details are hidden from higher levels. The instruction set architecture (ISA) is the interface between the hardware and the lowest-level software.

  • Make the common case fast

  • Performance via Parallelism, Pipelining, and Prediction (guessing ahead to speed up execution)

  • Hierarchy of Memory

  • Dependability via Redundancy

    Since any physical device can fail, we make systems dependable by including redundant components that
    can take over when a failure occurs and that help detect failures.
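One common way to apply redundancy is triple modular redundancy (TMR): three copies compute the same result and a voter takes the majority. The sketch below is an illustration chosen here, not a scheme the notes prescribe; `majority_vote` and `tmr` are hypothetical helper names. It shows both halves of the idea above: the good copies "take over" by outvoting a faulty one, and any disagreement flags a detected failure.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 vote: each output bit is the value at least two copies agree on."""
    return (a & b) | (a & c) | (b & c)


def tmr(a: int, b: int, c: int) -> tuple[int, bool]:
    """Return the voted result and whether any copy disagreed (a detected failure)."""
    voted = majority_vote(a, b, c)
    failure_detected = not (a == b == c)
    return voted, failure_detected


# Three redundant units compute the same 8-bit value; unit b suffers a single bit flip.
good = 0b1011_0010            # 178
faulty = good ^ 0b0000_1000   # bit 3 flipped
print(tmr(good, faulty, good))   # (178, True): the fault is masked and detected
```

TMR masks any single faulty copy; it cannot out-vote two copies that fail the same way.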

Below Your Program

A simplified view of hardware and software as hierarchical layers:

| Layer | Description |
|---|---|
| Application software | Aimed at users |
| System software | Aimed at programmers; includes the operating system (virtual memory, file system, I/O device drivers), the compiler, and the assembler |
| Hardware | |
  • Computer language

    • Machine language: computers only understand electrical signals, so machine instructions are encoded as binary digits (bits).
    • Assembly language: a symbolic notation for machine instructions, translated into machine instructions by the assembler.
    • High-level programming language: programs can be independent of the hardware; they are translated into assembly language statements by the compiler.

    Going from a high-level language to the language of hardware is therefore a process of compiling and then assembling.
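To make the assembling step concrete, here is a minimal sketch that turns one symbolic instruction into the 32-bit word the hardware sees. It assumes the MIPS ISA (the notes do not name an ISA here) and assumes a compiler has already translated a C statement such as `a = b + c` into `add $t0, $s1, $s2`, with `a`, `b`, `c` held in `$t0`, `$s1`, `$s2`. The register table and `assemble_rtype` are hypothetical helpers that only handle the R-type `add` and `sub` instructions.

```python
# A few MIPS register numbers (hypothetical subset, not the full table).
REGS = {"$zero": 0, "$t0": 8, "$t1": 9, "$t2": 10, "$s0": 16, "$s1": 17, "$s2": 18}
# R-type function codes; the opcode field is 0 for all R-type instructions.
FUNCT = {"add": 0x20, "sub": 0x22}


def assemble_rtype(line: str) -> int:
    """Translate e.g. 'add $t0, $s1, $s2' (rd, rs, rt) into a 32-bit machine word."""
    mnemonic, operands = line.split(maxsplit=1)
    rd, rs, rt = (REGS[r.strip()] for r in operands.split(","))
    opcode, shamt = 0, 0
    # Field layout: op(6) | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6)
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | FUNCT[mnemonic]


word = assemble_rtype("add $t0, $s1, $s2")
print(f"0x{word:08x}")   # 0x02324020
print(f"{word:032b}")    # 00000010001100100100000000100000
```

A real assembler also handles labels, other instruction formats, and pseudo-instructions; this only shows the symbolic-to-binary mapping for one format.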

Computer Organization and Hardware System

Decomposability

Display

| Type | Description | Note |
|---|---|---|
| CRT (raster cathode ray tube) | Scans an image one line at a time. | Uses pixels and a bit map; the more bits per pixel, the more colors can be displayed. |
| LCD (liquid crystal display) | Thin and low-power. | The LCD is not itself the source of light; its liquid crystal molecules form a twisting helix that bends the light entering the display. Most LCDs use an active matrix, with a tiny transistor switch at each pixel, to control current precisely and make sharper images. |

pixel: The smallest individual picture element. Screens are composed of hundreds of thousands to millions of pixels, organized in a matrix.

The display principle
  • A color display might use 8 bits for each of the three colors (red, blue, and green), for 24 bits per pixel.

  • Hardware support for graphics consists mainly of a raster refresh buffer (frame buffer) to store bit map.

  • The image to be represented onscreen is stored in the frame buffer, and the bit pattern per pixel is read out to the graphics display at the refresh rate.

  • The goal of the bit map is to faithfully represent what is on the screen.
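The bit-map arithmetic above can be sketched in a few lines: how much frame buffer one screen needs, how many colors a pixel can show, and how 8 bits per color combine into one 24-bit pixel. The 1920 × 1080 resolution and the R-G-B packing order (red in the high byte) are assumptions for illustration; real displays may order or pad the components differently.

```python
def frame_buffer_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Bytes needed to hold one full bit map (one screen image) in the frame buffer."""
    return width * height * bits_per_pixel // 8


def distinct_colors(bits_per_pixel: int) -> int:
    """Number of different values (colors) a single pixel can take."""
    return 2 ** bits_per_pixel


def pack_rgb(r: int, g: int, b: int) -> int:
    """Pack three 8-bit components into one 24-bit pixel (assumed order: R high, B low)."""
    return (r << 16) | (g << 8) | b


print(frame_buffer_bytes(1920, 1080, 24))   # 6220800 bytes, about 6 MB per screen
print(distinct_colors(24))                  # 16777216
print(hex(pack_rgb(255, 128, 0)))           # 0xff8000
```

At a given refresh rate, the whole buffer is read out that many times per second, which is why frame-buffer size matters for display bandwidth.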

Motherboard

A thin, green, plastic board covered with dozens of small rectangles that contain integrated circuits (chips). It consists of three pieces:

  1. the piece connecting to the I/O devices
  2. memory
  3. processor

CPU / Processor

| Component | Description |
|---|---|
| Datapath | Performs arithmetic operations |
| Control | Commands the datapath, memory, and I/O devices according to the instructions of the program |
| Cache memory | Small, fast SRAM that acts as a buffer for the slower, larger main memory |

Memory

main/primary memory

  • Holds programs while they are running, along with the data they need.
  • Built from DRAM chips.

Main memory is volatile: it loses its contents when it loses power.

secondary memory

  • Stores programs and data between runs.
  • Typically flash memory in personal mobile devices (PMDs) and magnetic disks in servers.

It is nonvolatile: it retains data even in the absence of a power source.

Examples:

  • Magnetic disk (floppy disk, hard disk)
  • CD (optical compact disk)
  • Magnetic tape
  • Solid state memory

Network

Networks interconnect whole computers, allowing computer users to extend the power of computing by including communication.

Advantages

  • Communication: Information is exchanged between computers at high speeds.
  • Resource sharing: Computers on the network can share I/O devices.
  • Nonlocal access: Users need not be near the computer they are using.

Local area network (LAN) / Ethernet

A network designed to carry data within a geographically confined area. LANs are interconnected with switches that can also provide routing services and security.

Wide area network (WAN) / Internet

A network that extends over hundreds of kilometers and can span a continent. WANs are typically based on optical fibers and leased from telecommunication companies.

Wireless network: Wi-Fi, Bluetooth

All users in an immediate area share the airwaves.

Integrated Circuits

transistor: An on/off switch controlled by an electric signal.

integrated circuit (IC): A chip that combines dozens to millions of transistors.

Electronics technology continues to evolve:

  • Increased capacity and performance
  • Reduced cost

Manufacturing process

IC cost

A single microscopic flaw in the wafer itself or in one of the dozens of patterning steps can result in that area of the wafer failing.

These defects make it virtually impossible to manufacture a perfect wafer.

The simplest way to cope with imperfection is to place many independent components on a single wafer. The patterned wafer is then chopped up, or diced, into these components called dies / chips.

die (chip): The individual rectangular sections that are cut from a wafer.

yield: The percentage of good dies out of the total number of dies on the wafer.

$$CPD = \frac{CPW}{N_{DPW} \times \text{yield}}$$

$$N_{DPW} \approx \frac{A_w}{A_d}$$

$$\text{yield} = \frac{1}{\left(1 + N_{DPA} \times A_d / 2\right)^2}$$

  • $CPD$: cost per die
  • $CPW$: cost per wafer
  • $N_{DPW}$: dies per wafer
  • $A_w$ and $A_d$: wafer area and die area
  • $N_{DPA}$: defects per area

NOTE: Cost per die has a nonlinear relation to die area and defect rate.

  • Wafer cost and area are fixed
  • Defect rate determined by manufacturing process
  • Die area determined by architecture and circuit design
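The cost formulas above can be evaluated directly to see the nonlinearity. This is a sketch with made-up process parameters (a 300 mm wafer, $5,000 per wafer, 0.1 defects per cm²); none of these numbers come from the notes. It uses the same simplified $N_{DPW} \approx A_w / A_d$ approximation, which ignores partial dies at the wafer edge.

```python
import math


def dies_per_wafer(wafer_area: float, die_area: float) -> float:
    """N_DPW ~ A_w / A_d (simple approximation: ignores partial dies at the wafer edge)."""
    return wafer_area / die_area


def die_yield(defects_per_area: float, die_area: float) -> float:
    """yield = 1 / (1 + N_DPA * A_d / 2)^2."""
    return 1.0 / (1.0 + defects_per_area * die_area / 2.0) ** 2


def cost_per_die(wafer_cost: float, wafer_area: float, die_area: float,
                 defects_per_area: float) -> float:
    """CPD = CPW / (N_DPW * yield)."""
    good_dies = dies_per_wafer(wafer_area, die_area) * die_yield(defects_per_area, die_area)
    return wafer_cost / good_dies


# Hypothetical process parameters (not from the notes).
WAFER_AREA = math.pi * 15.0 ** 2   # 30 cm (300 mm) diameter wafer, in cm^2
WAFER_COST = 5000.0                # dollars per wafer
DEFECTS = 0.1                      # defects per cm^2

for die_area in (1.0, 2.0):        # die area in cm^2
    y = die_yield(DEFECTS, die_area)
    cpd = cost_per_die(WAFER_COST, WAFER_AREA, die_area, DEFECTS)
    print(f"die area {die_area} cm^2: yield {y:.3f}, cost per die ${cpd:.2f}")
```

With these numbers, doubling the die area halves the dies per wafer and also lowers the yield, so cost per die grows by a factor of about 2.2 rather than 2.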

Performance

response time (execution time): The total time required for the computer to complete a task, including disk accesses, memory accesses, I/O activities, operating system overhead, CPU execution time, and so on.

throughput (bandwidth): Another measure of performance: the number of tasks completed per unit time.

Personal mobile devices are more focused on response time (the main concern in most cases), while servers are more focused on throughput. The two therefore need different performance metrics and different sets of applications to benchmark.

Decreasing response time almost always improves throughput.

$$\text{Performance}_X = \frac{1}{\text{Execution Time}_X}$$

We use the phrase "X is n times faster than Y" (or, equivalently, "X is n times as fast as Y") to mean

$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution Time}_Y}{\text{Execution Time}_X} = n$$

$n$ is the relative performance.
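The definition above can be checked numerically. This sketch uses hypothetical run times (10 s on X, 15 s on Y for the same program); `performance` and `times_faster` are illustrative names, not from the notes.

```python
def performance(execution_time_s: float) -> float:
    """Performance_X = 1 / ExecutionTime_X."""
    return 1.0 / execution_time_s


def times_faster(time_x_s: float, time_y_s: float) -> float:
    """n in 'X is n times faster than Y' = Performance_X / Performance_Y."""
    return performance(time_x_s) / performance(time_y_s)


# Hypothetical measurements of the same program on two computers.
n = times_faster(time_x_s=10.0, time_y_s=15.0)
print(f"X is {n:.1f} times faster than Y")   # X is 1.5 times faster than Y
```

Because performance is the reciprocal of time, the ratio of performances equals the inverse ratio of execution times, 15 / 10.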

Measuring Execution Time
  • Elapsed time:
    Total response time, including all aspects such as processing, I/O, OS overhead, and idle time.
    It determines system performance.

  • CPU time: The actual time the CPU spends processing a given job, not counting time spent waiting for I/O or running other jobs.

    • user CPU time: The CPU time spent in the program itself.

    • system CPU time: The CPU time spent in the operating system performing tasks on behalf of the program.

    Different programs are affected differently by CPU performance and system performance.

    We will use the term system performance to refer to elapsed time on an unloaded system and CPU performance to refer to user CPU time.

  • CPU Clocking

    The operation of digital hardware is governed by a constant-rate clock.

    (clock) tick, (clock) cycle: The time for one clock period, usually of the processor clock, which runs at a constant rate.

    Clock period: duration of a clock cycle.

    Clock frequency (rate): cycles per second.
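The difference between elapsed time and user/system CPU time can be observed from inside a program. This sketch uses Python's standard `time.perf_counter()` (wall-clock) and `os.times()` (per-process user and system CPU time); `busy_work` is a hypothetical stand-in for real computation. The `sleep` counts toward elapsed time but adds almost no CPU time.

```python
import os
import time


def busy_work(n: int) -> int:
    """Pure computation, so it accumulates user CPU time."""
    return sum(i * i for i in range(n))


wall_start = time.perf_counter()   # wall-clock clock for elapsed time
cpu_start = os.times()             # user and system CPU time used so far by this process

busy_work(2_000_000)
time.sleep(0.2)                    # waiting: counts toward elapsed time, not CPU time

cpu_end = os.times()
elapsed = time.perf_counter() - wall_start
user = cpu_end.user - cpu_start.user
system = cpu_end.system - cpu_start.system

print(f"elapsed {elapsed:.3f} s, user CPU {user:.3f} s, system CPU {system:.3f} s")
```

The exact numbers depend on the machine and its load; on a loaded system, elapsed time grows further while CPU time stays roughly the same.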

CPU Performance and Factors

$$\begin{aligned} \text{CPU Time} &= \text{CPU Clock Cycles} \times \text{Clock Period}\\ &= \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}} \end{aligned}$$

Performance is improved by:

  • Reducing number of clock cycles
  • Increasing clock rate
  • However, the hardware designer must often trade off clock rate against cycle count
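The clock-rate versus cycle-count tradeoff can be worked through with the formula above. In this sketch, all numbers are hypothetical: computer A runs a program in 10 s at 2 GHz, and a redesign B raises the clock rate but needs 1.2 times as many cycles for the same program. The question is what clock rate B needs to finish in 6 s.

```python
def cpu_time(clock_cycles: float, clock_rate_hz: float) -> float:
    """CPU Time = CPU Clock Cycles / Clock Rate (equivalently cycles x clock period)."""
    return clock_cycles / clock_rate_hz


# Hypothetical computer A: runs a program in 10 s with a 2 GHz clock.
rate_a = 2e9
cycles_a = 10.0 * rate_a          # 20 billion cycles

# Hypothetical redesign B: faster clock, but 1.2x as many cycles for the same program.
cycles_b = 1.2 * cycles_a
target_time_b = 6.0
required_rate_b = cycles_b / target_time_b
print(f"B needs a {required_rate_b / 1e9:.1f} GHz clock")   # B needs a 4.0 GHz clock
```

Doubling the clock rate here buys less than a 2x speedup (10 s to 6 s) because the extra cycles eat part of the gain.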

Instruction Performance

Clock cycles per instruction (CPI): Average number of clock cycles per instruction for a program or program fragment.

$$\begin{aligned} \text{CPU Clock Cycles} &= \text{Instruction Count} \times \text{CPI}\\ &= \sum_{i=1}^{n} \text{CPI}_i \times \text{Instruction Count}_i \end{aligned}$$

$$\begin{aligned} \text{CPU Time} &= \text{Instruction Count} \times \text{CPI} \times \text{Clock Period}\\ &= \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}\\ &= \text{IC} \times \text{CPI} \times T_c \end{aligned}$$

  • Instruction count (IC) for a program: determined by the program, the ISA, and the compiler.

  • Average cycles per instruction (CPI): determined by the CPU hardware.
    If different instructions have different CPIs, the average CPI is affected by the instruction mix.

    Note that CPI and clock period influence each other.

$$\text{CPI} = \frac{\text{CPU Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}}$$
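The weighted-CPI formula shows why instruction count alone does not predict performance. This sketch assumes three hypothetical instruction classes A, B, C with CPIs 1, 2, 3, and two hypothetical compiled versions of the same code with different instruction mixes.

```python
# Hypothetical CPI for each instruction class (CPI_i in the formula).
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}


def clock_cycles(counts: dict[str, int]) -> int:
    """CPU Clock Cycles = sum over classes of CPI_i x InstructionCount_i."""
    return sum(CPI_BY_CLASS[cls] * n for cls, n in counts.items())


def average_cpi(counts: dict[str, int]) -> float:
    """Average CPI = CPU Clock Cycles / Instruction Count, i.e. weighted by the instruction mix."""
    return clock_cycles(counts) / sum(counts.values())


def cpu_time(counts: dict[str, int], clock_rate_hz: float) -> float:
    """CPU Time = IC x CPI / Clock Rate."""
    instruction_count = sum(counts.values())
    return instruction_count * average_cpi(counts) / clock_rate_hz


# Two hypothetical compiled versions of the same code.
seq1 = {"A": 2, "B": 1, "C": 2}
seq2 = {"A": 4, "B": 1, "C": 1}
for name, seq in (("seq1", seq1), ("seq2", seq2)):
    print(name, "IC =", sum(seq.values()), "cycles =", clock_cycles(seq), "CPI =", average_cpi(seq))
```

Here seq2 executes more instructions (6 vs. 5) but needs fewer cycles (9 vs. 10), because its mix leans on the cheaper class A.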

Performance depends on

  • Algorithm: affects IC, possibly CPI
  • Programming language: affects IC, CPI
  • Compiler: affects IC, CPI
  • Instruction set architecture (ISA): affects IC, CPI, $T_c$

Performance improvement
  • Power Wall:

    • Power efficiency decreases: the trend is to consume double the power with each doubling of operating frequency
    • Hot spots
    • Power leakage

    In CMOS (complementary metal-oxide-semiconductor) technology, the dynamic power is

    $$\text{Power} = \text{Capacitive Load} \times \text{Voltage}^2 \times \text{Frequency}$$

    However, we can't reduce the voltage further, and we can't remove more heat; together these cause the Power Wall.

    Relative Power

  • Memory Wall:

    There is a growing gap between processor speed and memory speed (the memory gap). L2 caches keep getting larger while contributing less and less to performance.

  • ILP Wall:

    The increasing difficulty of finding enough parallelism in the instruction stream of a single process to keep higher-performance processor cores busy.

    Little instruction-level parallelism (ILP) is left to exploit, because of power dissipation and the memory gap.
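For the Power Wall, the CMOS power equation is usually used as a ratio ("relative power"), so constant factors cancel. This sketch assumes a hypothetical new processor whose capacitive load, voltage, and frequency are each 85% of an older processor's; all values are normalized to the old design.

```python
def dynamic_power(capacitive_load: float, voltage: float, frequency: float) -> float:
    """CMOS dynamic power: Capacitive Load x Voltage^2 x Frequency."""
    return capacitive_load * voltage ** 2 * frequency


# Normalize the old processor to 1.0 for every factor.
p_old = dynamic_power(1.0, 1.0, 1.0)
# Hypothetical new processor: each factor reduced by 15%.
p_new = dynamic_power(0.85, 0.85, 0.85)
print(f"relative power = {p_new / p_old:.2f}")   # 0.85**4, about 0.52

# Doubling only the frequency doubles the power.
print(dynamic_power(1.0, 1.0, 2.0) / p_old)      # 2.0
```

Voltage enters squared, which is why lowering it was the main lever for power; once voltage cannot drop further, raising frequency raises power directly.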

Multiprocessors

  • Multicore microprocessor: More than one processor per chip
  • Requires explicitly parallel programming
    • Compare with instruction level parallelism
      • Hardware executes multiple instructions at once
      • Hidden from the programmer
    • Hard to do:
      • Programming for performance
      • Load balancing
      • Optimizing communication and synchronization
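Explicitly parallel programming means the programmer, not the hardware, splits the work, balances it across workers, and combines the results. This sketch uses Python's standard `concurrent.futures.ThreadPoolExecutor` only to show that structure; `split_evenly` and `parallel_sum_of_squares` are hypothetical names. Note that in CPython, threads do not run CPU-bound Python code simultaneously, so this illustrates the decomposition rather than a real speedup (a process pool would be needed for that).

```python
from concurrent.futures import ThreadPoolExecutor


def split_evenly(n_items: int, n_workers: int) -> list[range]:
    """Load balancing: give each worker a contiguous chunk whose size differs by at most 1."""
    base, extra = divmod(n_items, n_workers)
    chunks, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def partial_sum(chunk: range) -> int:
    """Work done independently by one worker."""
    return sum(i * i for i in chunk)


def parallel_sum_of_squares(n_items: int, n_workers: int = 4) -> int:
    chunks = split_evenly(n_items, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum, chunks))   # workers run on their own chunks
    return sum(partials)                                   # combine step: wait for all, then merge


print(parallel_sum_of_squares(1_000))   # 332833500
```

Even this toy version has the three concerns listed above: choosing the split (load balancing), waiting for all workers (synchronization), and gathering partial results (communication).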

Pitfalls

Amdahl’s Law

Improving one aspect of a computer does not yield a proportional improvement in overall performance.

$$T_{improve} = \frac{T_{affect}}{n} + T_{unaffect}$$

where $n$ is the improvement factor.
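Amdahl's Law is easy to tabulate. This sketch assumes a hypothetical program that runs for 100 s, 80 s of which is spent in the part being improved; the symbols match $T_{affect}$, $T_{unaffect}$, and $n$ above.

```python
def improved_time(t_affect: float, t_unaffect: float, n: float) -> float:
    """Amdahl's Law: only the affected part of the run time shrinks by the factor n."""
    return t_affect / n + t_unaffect


# Hypothetical program: 100 s total, 80 s of it in the part being improved.
T_AFFECT, T_UNAFFECT = 80.0, 20.0
for n in (2, 4, 10, 100, 1_000_000):
    t = improved_time(T_AFFECT, T_UNAFFECT, n)
    speedup = (T_AFFECT + T_UNAFFECT) / t
    print(f"n = {n:>7}: {t:6.2f} s, overall speedup {speedup:.2f}x")
```

No matter how large $n$ gets, the 20 s unaffected part remains, so the overall speedup approaches but never reaches 5x.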

Not Low Power at Idle

MIPS as a Performance Metric

MIPS (millions of instructions per second) doesn't account for:

  • Differences in ISAs between computers
  • Differences in complexity between instructions
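A small calculation shows how MIPS can rank machines the wrong way. This sketch uses the standard definition MIPS = Instruction Count / (Execution Time × 10^6) and two hypothetical machines running the same program at 4 GHz: A uses many simple instructions, B (a different ISA) uses fewer but more complex ones.

```python
def exec_time(ic: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU Time = IC x CPI / Clock Rate."""
    return ic * cpi / clock_rate_hz


def mips(ic: float, exec_time_s: float) -> float:
    """MIPS = Instruction Count / (Execution Time x 10^6)."""
    return ic / (exec_time_s * 1e6)


# Hypothetical machines running the same program, both with a 4 GHz clock.
machines = {
    "A": (10e9, 1.0, 4e9),   # many simple instructions
    "B": (6e9, 1.5, 4e9),    # fewer, more complex instructions (different ISA)
}
for name, (ic, cpi, rate) in machines.items():
    t = exec_time(ic, cpi, rate)
    print(f"{name}: time {t:.2f} s, MIPS {mips(ic, t):.0f}")
```

B finishes sooner (2.25 s vs. 2.50 s) yet has the lower MIPS rating, because each of its instructions does more work.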


http://example.com/2023/03/05/CO-0,1/
Author: Tekhne Chen
Posted on: March 5, 2023
Licensed under