CO C3 Arithmetic
Introduction
- Computer words are composed of bits;
- words: represented as binary numbers
- there are 32bits/word or 64bits/word in RISC-V
- Contains four bytes for 32bits word
 
- Simplified to contain only in course:
- memory-reference instructions: lw, sw
- arithmetic-logical instructions: add, sub, and, or, xor, slt…
- control flow instructions: beq, bne, jal
- Generic Implementation:
- use the program counter (PC) to supply instruction address
- get the instruction from memory
- read registers
- use the instruction to decide exactly what to do
- All instructions use the ALU after reading the registers
 Why? memory-reference? arithmetic? control flow?
Numbers
Number systems
Representation
Signed Number formats
- Sign and magnitude
- 1’s complement
- 2’s complement
Two’s Biased notation invert all bits and add 1 with end.
types
sign extension
- Expansion e.g8 bit numbers to 64/32 bit numbers;
- Required for operations with registers(32/64 bits) and
 immediate operands (8 bits)
lb,Load byte: Take the lower 8 bits as they are and copy the highest bit to the remaining 24/56 bits.
e.g.0000 0010 0000 0000 0000 0000 0000 0000 0000 0010
1111 1110 1111 1111 1111 1111 1111 1111 1111 0010
lbu,Load byte unsigned: Take the lower 8 bits as they are and copy the do zero-extension to the remaining 24/56 bits.
e.g.0000 0010 0000 0000 0000 0000 0000 0000 0000 0010
1111 1110 0000 0000 0000 0000 0000 0000 1111 1110
Arithmetic
Addition and Subtraction
Adding bit by bit and take carries to next digit. Simply use addition of 2’s complement for Subtraction.
Overflow/underflowThe sum of two numbers can exceed any representation, may happens when positve + positive, positve - negative, negative - negative or negative - positive.Jugde by the
double sign bits
Operation Operand A Operand B Result overflow double sign bits Reaction
- An exception (interrupt) occurs
- Control jumps to predefined address for exception
- Interrupted address is saved for possible resumption
Different language treats overflow differently.
- Signaling to application (Ada, Fortran)
- Ignore, don’t always want to detect overflow©
Comparison If rs < rt, rd=1, else rd=0
slt/slturd,rs,rtProcess
- Hardware detection in the ALU and Generation of an exception (interrupt)
- Save the instruction address (not PC) in special register(byte addressing!) SEPC
- Jump to specific routine in OS (take instruction in SEPC and take back in PC)
- Correct & return to program
- Return to program with error code
- Abort program
Arithmetic for Multimedia
- 
Graphics and media processing operates on vectors of 8-bit and 16-bit data - Use 64-bit adder, with partitioned carry chain
- Operate on 8×8-bit, 4×16-bit, or 2×32-bit vectors
- SIMD (single-instruction, multiple-data)
 
- 
Saturating operations 
 overflow: result is largest representable valuec.f2s-complement modulo arithmetic
 e.gclipping in audio, saturation in video
Logical operations
| 1 |  | 
Logical shift filled with ‘0’
slli/srliregister1, register2, numbit-wise AND between registers
andregister1, register2, register3bit-wise OR between registers
orregister1, register2, register3
Constructing a simple ALU
A half adder
A full adder
Extended 1 bit ALU
AND,OR, ADD
Substraction: Inverting b and set 1st CarryIn= 1.
Comparison: Subtraction (rs - rt), if the result is negative→ rs < rt Use of sign bit as indicator
Most significant bit  Set for comparison  Overflow detect
 Cell
Cascade Element
Basic 32 bit ALU
- Inputs parallel
- Carry is cascaded: Ripple carry adder (行波进位加法器)
- Slow but simple
- 1st Carry In = 0
 
Multiplication
Binary multiplication
At current bit position
- If multiplier is 1, then add multiplicand Else add 0
- shift multiplicand left by 1 bit
Version 1
As there are at most bits when multiply bit and bit number.

Deawbacks: Very big, Too slow!
- For a 64-bit ALU, the multiplicant should be-128 bit.
- Requires 64 iterations :Addition, Shift and Comparison
- Almost 200 cycles
Version 2
Least significant bits of the product don’t change, hence Shift the product and the multiplier instead of the multiplicand !
For a 64-bit ALU, it is reduced to 64 bits, only left half of product register is changed.
- Addition performed only on left half of product register
- Shift of product register
Version 3
The product register contains only 0 at the initial state. And the lower 64 bits are simply shifted out Therefore use these lower 64 bits for the multiplier.
- Set product register to 0
- Load lower bits of product register with multiplier
- Test least significant bit of product register
Signed multiplication
- 
Basic approach: - Store the signs of the operands  Convert signed numbers to unsigned numbers (most significant bit (MSB) = 0) Perform multiplication  If sign bits of operands are equal sign bit = 0, else sign bit = 1
 
- 
Improved method: Booth’s Algorithm Idea: If you have a sequence of '1’s  subtract at first ‘1’ in multiplier  shift for the sequence of '1’s 
  add where prior step had last '1‘Result: Possibly less additions and more shifts  Faster, if shifts are faster than additions last1 last2 Operations 1 0 subtract multiplicand from left 1 1 no arithmetic operation-shift 0 1 add multiplicand to left half 0 0 no arithmetic operation-shift Advantages: - keeps the leftmost bit constant
- no change of sign bit !
 
Faster Multiplication
Uses multiple adders
RISC-V Multiplication
Division
Binary division
- Check for 0 divisor
- Long division approach ₠ If divisor ≤ dividend bits ☉ 1 bit in quotient, subtract
 ₠ Otherwise
 ☉ 0 bit in quotient, bring down next dividend bit
- Restoring division ₠ Do the subtract, and if remainder goes < 0, add divisor back
- Signed division ₠ Divide using absolute values ₠ Adjust sign of quotient and remainder
 as required
Version 1
Version 2
- Reduction of Divisor and ALU width by half
- Shifting of the remainder
- Saving 1 iteration
- Remainder register keeps quotient - No quotient register required
Faster Division
- Can’t use parallel hardware as in multiplier as Subtraction is conditional on sign of remainder
- Faster dividers (e.g. SRT division) generate multiple quotient bits per step but Still require multiple steps
RISC-V Division
Floating point arithmetic
Adding all parts to get an ALU