CO C0,1 Introduction
Course Intro
Kernel
- How does Hardware support HLL?
- Arithmetic for Computers
- Datapath and Control
- Exploiting Memory Hierarchy
- I/O, Networks, and Other Peripherals
- Experiments
Course Calendar
Grading Policy
Block | Contents | Percentage |
---|---|---|
Theory | Homework + Quiz | 20 |
Theory | Midterm test | 10 |
Theory | Final | 40 |
Lab | Lab assignments | 30 |
Computer Abstractions and Technology
Introduction
- Progress in computer technology: Underpinned by Moore’s Law
- Makes novel applications feasible: computers in automobiles, cell phones, the Human Genome Project, the World Wide Web, search engines, ...
- Computers are pervasive
Present and Tomorrow
Classes of Computers
Personal computers(PC) | Server computers | Supercomputers | Embedded computers |
---|---|---|---|
General purpose, variety of software | Network based | High-end scientific and engineering calculations | Hidden as components of systems |
Subject to cost/performance tradeoff | High capacity, performance, reliability; range from small servers to building-sized | Highest capability but represent a small fraction of the overall computer market | Stringent power/performance/cost constraints |
Performance
Component | How it affects performance |
---|---|
Algorithm | Determines the number of source-level statements and I/O operations |
Programming language, compiler, and architecture | Determine the number of machine instructions for each statement |
Processor and memory system | Determine the speed of execution of instructions |
I/O system (hardware and operating system) | Determines the speed of execution of I/O operations |
Great ideas
-
Design for Moore’s Law:
Integrated circuit resources double every 18-24 months.
-
Abstraction to Simplify Design
Lower-level details are hidden from higher levels. The instruction set architecture (ISA) is the interface between the hardware and the lowest-level software.
-
Make the common case fast
-
Performance via Parallelism, Pipelining, and Prediction
-
Hierarchy of Memory
-
Dependability via Redundancy
Since any physical device can fail, we make systems dependable by including redundant components that can take over when a failure occurs and that help detect failures.
Below Your Program
A simplified view of Hardware and software as hierarchical layers:
Layer | Description |
---|---|
Application software | Aimed at users |
System software | Aimed at programmers; includes the operating system (virtual memory, file system, I/O device drivers), compiler, and assembler |
Hardware | |
-
Computer language
Language | Description |
---|---|
Machine language | Computers only understand electrical signals |
Assembly language | Symbolic notation, translated into machine instructions by the assembler |
High-level programming language | Programs can be independent of hardware. Translated into assembly language statements by the compiler |

Going from a high-level language to the language of hardware is a process of compiling and assembling.
Computer Organization and Hardware System
Decomposability
Display
Type | Description | Note |
---|---|---|
CRT (raster cathode ray tube) | Scans an image one line at a time. Pixels and the bit map. | The more bits per pixel, the more colors can be displayed |
LCD (liquid crystal display) | Thin and low-power. The LCD is not a source of light; its liquid crystals form a twisting helix that bends the light entering the display. | Uses an active matrix with a tiny transistor switch at each pixel to control current precisely and produce sharper images |
pixel
The smallest individual picture element. Screens are composed of hundreds of thousands to millions of pixels, organized in a matrix.
The display principle
A color display might use 8 bits for each of the three colors (red, blue, and green), for 24 bits per pixel.
Hardware support for graphics consists mainly of a raster refresh buffer (frame buffer) that stores the bit map.
The image to be represented onscreen is stored in the frame buffer, and the bit pattern per pixel is read out to the graphics display at the refresh rate.
The goal of the bit map is to faithfully represent what is on the screen.
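The arithmetic behind the frame buffer can be sketched in a few lines of Python. This is only an illustration: the function names, the 1920 x 1080 resolution, and the 60 Hz refresh rate are assumptions chosen for the example, not values from these notes. Only the 8 bits per color (24 bits per pixel) comes from the text above.

```python
def frame_buffer_bytes(width: int, height: int, bits_per_pixel: int = 24) -> int:
    """Bytes needed to store one full-screen bit map."""
    total_bits = width * height * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


def displayable_colors(bits_per_pixel: int = 24) -> int:
    """Number of distinct colors a pixel can take."""
    return 2 ** bits_per_pixel


def readout_bytes_per_second(width: int, height: int,
                             bits_per_pixel: int = 24,
                             refresh_hz: int = 60) -> int:
    """Bytes read out of the frame buffer each second at the refresh rate."""
    return frame_buffer_bytes(width, height, bits_per_pixel) * refresh_hz


if __name__ == "__main__":
    # Hypothetical display: 1920 x 1080 pixels, 24 bits per pixel, 60 Hz.
    print(frame_buffer_bytes(1920, 1080))        # 6220800
    print(displayable_colors(24))                # 16777216
    print(readout_bytes_per_second(1920, 1080))  # 373248000
```

More bits per pixel increase both the number of colors and the size of the frame buffer, which is why the two are traded off against each other.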
Motherboard
A thin, green, plastic board covered with dozens of small rectangles that contain integrated circuits (chips). It has three pieces:
- the piece connecting to the I/O devices
- memory
- processor
CPU / Processor
Component | Description |
---|---|
Datapath | performs arithmetic operations. |
Control | commands the datapath, memory, and I/O devices according to the instructions of the program |
Cache memory | Small fast SRAM, acts as a buffer |
Memory
main/primary memory
- Place to keep running programs and data needed.
- DRAM chips.
Main memory is volatile: it loses its contents when it loses power.
secondary memory
- store programs and data between runs.
- flash memory in PMDs and magnetic disks in servers.
Secondary memory is nonvolatile: it retains data even in the absence of a power source.
e.g.
- Magnetic disk(Floppy disk, Hard disk)
- CD (optical compact disk)
- Magnetic tape
- Solid state memory
Network
Networks interconnecting whole computers allow computer users to extend the power of computing by including communication.
Advantages
- Communication: Information is exchanged between computers at high speeds.
- Resource sharing: Computers on the network can share I/O devices.
- Nonlocal access: Users need not be near the computer they are using.
Local area network(LAN) / Ethernet
A network designed to carry data within a geographically confined area. LANs are interconnected with switches that can also provide routing services and security.
Wide area network (WAN) / Internet
A network extended over hundreds of kilometers that can span a continent. WANs are typically based on optical fibers and are leased from telecommunication companies.
Wireless network: WiFi, Bluetooth
All users in an immediate area share the airwaves.
Integrated Circuits
transistor
An on/off switch controlled by an electric signal.
IC (integrated circuit)
A chip that combines dozens to hundreds of transistors.
Electronics technology continues to evolve:
- Increased capacity and performance
- Reduced cost
manufacturing process
IC cost
A single microscopic flaw in the wafer itself or in one of the dozens of patterning steps can result in that area of the wafer failing.
These defects make it virtually impossible to manufacture a perfect wafer.
The simplest way to cope with imperfection is to place many independent components on a single wafer. The patterned wafer is then chopped up, or diced, into these components, called dies or chips.
die/chip
The individual rectangular sections that are cut from a wafer.
yield
The percentage of good dies from the total number of dies on the wafer.
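$$\text{Cost per die} = \frac{\text{Cost per wafer}}{\text{Dies per wafer} \times \text{Yield}}$$

$$\text{Dies per wafer} \approx \frac{\text{Wafer area}}{\text{Die area}}$$

$$\text{Yield} = \frac{1}{\left(1 + \text{Defects per area} \times \text{Die area} / 2\right)^2}$$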
NOTE
Nonlinear relation to area and defect rate.
- Wafer cost and area are fixed
- Defect rate determined by manufacturing process
- Die area determined by architecture and circuit design
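To see the nonlinear relation numerically, here is a minimal Python sketch of the usual textbook cost model: cost per die is cost per wafer divided by (dies per wafer times yield), dies per wafer is approximately wafer area over die area, and yield is 1 / (1 + defects per area x die area / 2)^2. The function names and all input numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def dies_per_wafer(wafer_area: float, die_area: float) -> float:
    """Approximate die count; ignores the unusable dies at the wafer edge."""
    return wafer_area / die_area


def die_yield(defects_per_area: float, die_area: float) -> float:
    """Fraction of good dies under the model stated in the lead-in."""
    return 1.0 / (1.0 + defects_per_area * die_area / 2.0) ** 2


def cost_per_die(cost_per_wafer: float, wafer_area: float,
                 die_area: float, defects_per_area: float) -> float:
    good_dies = dies_per_wafer(wafer_area, die_area) * die_yield(defects_per_area, die_area)
    return cost_per_wafer / good_dies


if __name__ == "__main__":
    # Hypothetical wafer: cost 1000, area 400 cm^2, 1 defect per cm^2.
    small = cost_per_die(1000.0, 400.0, die_area=2.0, defects_per_area=1.0)
    large = cost_per_die(1000.0, 400.0, die_area=4.0, defects_per_area=1.0)
    print(small, large)
```

With these made-up inputs, doubling the die area raises the cost per die by much more than a factor of two (about 20 versus about 90), because a larger die both reduces the die count and lowers the yield.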
Performance
response time / execution time : The total time required for the computer to complete a task, including disk accesses, memory accesses, I/O activities, operating system overhead, CPU execution time, and so on.
throughput / bandwidth : Another measure of performance, it is the number of tasks completed per unit time.
Focus on response time | Focus on throughput |
---|---|
Personal mobile devices | Servers |

In most cases, we need different performance metrics, as well as different sets of applications, to benchmark the two.
Decreasing response time almost always improves throughput.
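Performance is the inverse of execution time. Use the phrase "X is n times faster than Y" or "X is n times as fast as Y" to mean

$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$

This ratio is the relative performance.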
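As a small illustration, relative performance can be computed directly from two execution times, since performance is taken to be 1 / execution time. The function name and the timings below are assumptions for the example.

```python
def relative_performance(time_x: float, time_y: float) -> float:
    """Return n such that X is n times as fast as Y.

    Performance is 1 / execution time, so
    performance_x / performance_y == time_y / time_x.
    """
    return time_y / time_x


if __name__ == "__main__":
    # Hypothetical timings: X runs the program in 10 s, Y in 15 s.
    print(relative_performance(10.0, 15.0))  # 1.5
```

Here X is 1.5 times as fast as Y. A result below 1 would mean X is the slower machine.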
Measuring Execution Time
- Elapsed time: Total response time, including all aspects such as processing, I/O, OS overhead, and idle time. It determines system performance.
- CPU time: The actual time the CPU spends processing a given job, discounting I/O time and other jobs' shares.
  - User CPU time: The CPU time spent in the program itself.
  - System CPU time: The CPU time spent in the operating system performing tasks on behalf of the program.

Different programs are affected differently by CPU performance and system performance.
We will use the term system performance to refer to elapsed time on an unloaded system and CPU performance to refer to user CPU time.
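The difference between elapsed time and CPU time can be observed with Python's standard `time` module: `time.perf_counter()` measures wall-clock (elapsed) time, while `time.process_time()` returns the sum of user and system CPU time of the current process and does not count time spent sleeping. The helper names `measure` and `mixed_task`, and the use of `sleep` as a stand-in for I/O waiting, are assumptions made for this sketch.

```python
import time


def measure(task):
    """Run task() and return (elapsed_seconds, cpu_seconds)."""
    start_wall = time.perf_counter()
    start_cpu = time.process_time()
    task()
    elapsed = time.perf_counter() - start_wall
    cpu = time.process_time() - start_cpu
    return elapsed, cpu


def mixed_task():
    """Some computation followed by waiting (a stand-in for I/O)."""
    sum(i * i for i in range(200_000))  # CPU-bound work
    time.sleep(0.2)                     # waiting consumes no CPU time


if __name__ == "__main__":
    elapsed, cpu = measure(mixed_task)
    print(f"elapsed: {elapsed:.3f} s, CPU: {cpu:.3f} s")
```

The elapsed time includes the 0.2 s wait, while the CPU time covers only the computation. If user and system CPU time are needed separately, `os.times()` reports them as distinct fields.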
CPU Clocking
Operation of digital hardware governed by a constant-rate clock.
(clock) tick, (clock) cycle: The time for one clock period, usually of the processor clock running at a constant rate.
Clock period: duration of a clock cycle.
Clock frequency (rate): cycles per second.
CPU Performance and Factors
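$$\text{CPU time} = \text{CPU clock cycles} \times \text{Clock period} = \frac{\text{CPU clock cycles}}{\text{Clock rate}}$$

Performance is improved by:
- Reducing the number of clock cycles
- Increasing the clock rate

The hardware designer must often trade off clock rate against cycle count.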
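The trade-off between clock rate and cycle count can be worked through with the relation CPU time = clock cycles / clock rate. The sketch below uses a hypothetical scenario: a program takes 10 s on a 2 GHz machine, and a redesign needs 1.2 times as many cycles but must finish in 6 s. The function names and numbers are illustrative assumptions.

```python
def clock_period(clock_rate_hz: float) -> float:
    """Duration of one clock cycle in seconds (period = 1 / rate)."""
    return 1.0 / clock_rate_hz


def cpu_time(clock_cycles: float, clock_rate_hz: float) -> float:
    """CPU time in seconds = clock cycles / clock rate."""
    return clock_cycles / clock_rate_hz


def required_clock_rate(clock_cycles: float, target_seconds: float) -> float:
    """Clock rate needed to finish the given cycles in target_seconds."""
    return clock_cycles / target_seconds


if __name__ == "__main__":
    cycles_a = 10.0 * 2e9       # cycles used on the 2 GHz machine in 10 s
    cycles_b = 1.2 * cycles_a   # the redesign needs 20% more cycles
    rate_b = required_clock_rate(cycles_b, 6.0)
    print(f"{rate_b / 1e9:.1f} GHz")  # 4.0 GHz
```

In this made-up case the clock rate has to double to cut the run time from 10 s to 6 s, because part of the gain is spent on the extra cycles.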
Instruction Performance
Clock cycles per instruction (CPI)
Average number of clock cycles per instruction for a program or program fragment.
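- Instruction count (IC) for a program: determined by the program, ISA, and compiler.
- Average cycles per instruction (CPI): determined by CPU hardware.

$$\text{Clock cycles} = \text{IC} \times \text{CPI}$$

$$\text{CPU time} = \text{IC} \times \text{CPI} \times \text{Clock period} = \frac{\text{IC} \times \text{CPI}}{\text{Clock rate}}$$

If different instruction classes have different CPI, the average CPI is affected by the instruction mix:

$$\text{Clock cycles} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \text{IC}_i\right), \qquad \text{CPI} = \frac{\text{Clock cycles}}{\text{IC}}$$

Note that CPI and clock period influence each other.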
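A short Python sketch shows how the instruction mix changes the average CPI. It assumes three hypothetical instruction classes A, B, and C with CPIs of 1, 2, and 3, and two alternative code sequences; the names, the mixes, and the 1 GHz clock rate are illustrative only.

```python
def clock_cycles(mix: dict, cpi_by_class: dict) -> int:
    """Total cycles = sum over classes of (CPI_i * IC_i)."""
    return sum(cpi_by_class[cls] * count for cls, count in mix.items())


def average_cpi(mix: dict, cpi_by_class: dict) -> float:
    """Average CPI = total clock cycles / total instruction count."""
    return clock_cycles(mix, cpi_by_class) / sum(mix.values())


def cpu_time(instruction_count: int, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = IC * CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


if __name__ == "__main__":
    cpi_by_class = {"A": 1, "B": 2, "C": 3}   # hypothetical per-class CPI
    sequence_1 = {"A": 2, "B": 1, "C": 2}     # 5 instructions
    sequence_2 = {"A": 4, "B": 1, "C": 1}     # 6 instructions
    for name, mix in (("sequence 1", sequence_1), ("sequence 2", sequence_2)):
        ic = sum(mix.values())
        cpi = average_cpi(mix, cpi_by_class)
        print(name, ic, clock_cycles(mix, cpi_by_class), cpi,
              cpu_time(ic, cpi, 1e9))
```

Sequence 2 executes more instructions (6 versus 5) yet needs fewer cycles (9 versus 10), so instruction count alone does not determine performance.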
Performance depends on
- Algorithm: affects IC, possibly CPI
- Programming language: affects IC, CPI
- Compiler: affects IC, CPI
- Instruction set architecture (ISA): affects IC, CPI, and clock period
Performance improvement
-
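Power Wall:
- Power efficiency decreases: the trend of consuming double the power with each doubling of operating frequency
- Hot spots
- Power leakage

In CMOS (complementary metal oxide semiconductor) technology,

$$\text{Power} \propto \frac{1}{2} \times \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency}$$

However, we can't reduce voltage further or remove more heat, which causes the Power Wall.
Relative Power

$$\frac{P_{\text{new}}}{P_{\text{old}}} = \frac{C_{\text{new}} \times V_{\text{new}}^2 \times F_{\text{new}}}{C_{\text{old}} \times V_{\text{old}}^2 \times F_{\text{old}}}$$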
-
Memory Wall:
There is a gap between processor speed and memory speed. The L2 cache is getting larger while contributing less to performance.
-
ILP Wall:
It is increasingly difficult to find enough parallelism in the instruction stream of a single process to keep higher-performance processor cores busy.
Little ILP is left to exploit because of power dissipation and the memory gap.
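For the relative power mentioned under the Power Wall, a minimal sketch follows. It assumes the CMOS dynamic power model in which power is proportional to capacitive load x voltage squared x frequency, so constant factors cancel in the ratio. The function name and the 85% figures are hypothetical inputs.

```python
def relative_power(load_ratio: float, voltage_ratio: float,
                   frequency_ratio: float) -> float:
    """P_new / P_old when each factor is given as new / old.

    Power scales with capacitive load * voltage^2 * frequency,
    so the constant factor cancels in the ratio.
    """
    return load_ratio * voltage_ratio ** 2 * frequency_ratio


if __name__ == "__main__":
    # Hypothetical new design: 85% of the capacitive load,
    # with voltage and frequency each reduced by 15%.
    print(f"{relative_power(0.85, 0.85, 0.85):.2f}")  # 0.52
    # Doubling frequency alone doubles power.
    print(relative_power(1.0, 1.0, 2.0))  # 2.0
```

Because voltage enters squared, lowering it is the most effective lever, which is why hitting the lower limit on voltage contributes to the Power Wall.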
Multiprocessors
- Multicore microprocessor: more than one processor per chip
- Requires explicitly parallel programming
  - Compare with instruction-level parallelism
    - Hardware executes multiple instructions at once
    - Hidden from the programmer
  - Hard to do!
    - Programming for performance
    - Load balancing
    - Optimizing communication and synchronization
Pitfalls
Amdahl’s Law
Improving one aspect of a computer does not yield a proportional improvement in overall performance.

$$T_{\text{improved}} = \frac{T_{\text{affected}}}{n} + T_{\text{unaffected}}$$

where $n$ is the improvement factor.
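Amdahl's Law is easy to explore numerically: the improved time is the affected time divided by the improvement factor, plus the unaffected time. In the sketch below, the function names and the 80 s / 20 s split are hypothetical.

```python
def improved_time(t_affected: float, t_unaffected: float, factor: float) -> float:
    """Execution time after speeding up only the affected part by `factor`."""
    return t_affected / factor + t_unaffected


def overall_speedup(t_affected: float, t_unaffected: float, factor: float) -> float:
    """Original time divided by improved time."""
    original = t_affected + t_unaffected
    return original / improved_time(t_affected, t_unaffected, factor)


if __name__ == "__main__":
    # Hypothetical program: 80 s in the part being improved, 20 s elsewhere.
    for factor in (2, 4, 16, 1000):
        print(factor, round(overall_speedup(80.0, 20.0, factor), 3))
```

However large the improvement factor becomes, the overall speedup here stays below 100 / 20 = 5, because the 20 s of unaffected time remains.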
Not Low Power at Idle
MIPS as a Performance Metric
MIPS: Millions of Instructions Per Second. It doesn't account for:
- Differences in ISAs between computers
- Differences in complexity between instructions
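A small sketch shows how MIPS can mislead. It uses the standard definition MIPS = instruction count / (execution time x 10^6), which is equivalent to clock rate / (CPI x 10^6). The two machines, their instruction counts, and their CPIs are made-up values for illustration.

```python
def mips_rating(clock_rate_hz: float, cpi: float) -> float:
    """MIPS = clock rate / (CPI * 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = IC * CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


if __name__ == "__main__":
    clock_rate = 4e9  # both hypothetical machines run at 4 GHz
    # The same program compiles to different instruction counts and CPIs.
    machine_a = {"ic": 10e9, "cpi": 1.0}
    machine_b = {"ic": 8e9, "cpi": 1.1}
    for name, m in (("A", machine_a), ("B", machine_b)):
        print(name,
              round(mips_rating(clock_rate, m["cpi"])),
              round(cpu_time(m["ic"], m["cpi"], clock_rate), 2))
```

Machine A has the higher MIPS rating, yet machine B finishes the program sooner because it executes fewer instructions. Execution time, not MIPS, is the reliable measure.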