# CSE477 VLSI Digital Circuits Fall 2002

# Lecture 25: Peripheral Memory Circuits

Mary Jane Irwin ( www.cse.psu.edu/~mji ) www.cse.psu.edu/~cg477

[Adapted from Rabaey's Digital Integrated Circuits, ©2002, J. Rabaey et al.]

Irwin&Vijay, PSU, 2002

# **Review: Read-Write Memories (RAMs)**

#### Static – SRAM

- data is stored as long as supply is applied
- large cells (6 fets/cell) so fewer bits/chip
- fast so used where speed is important (e.g., caches)
- differential outputs (output BL and !BL)
- use sense amps for performance
- compatible with CMOS technology

#### Dynamic – DRAM

- periodic refresh required
- small cells (1 to 3 fets/cell) so more bits/chip
- slower so used for main memories
- single ended output (output BL only)
- need sense amps for correct operation
- not typically compatible with CMOS technology

# **Review: 4x4 SRAM Memory**



# **Peripheral Memory Circuitry**

- Row and column decoders
- Sense amplifiers
- Read/write circuitry
- Timing and control

#### **Row Decoders**

- Collection of 2<sup>M</sup> complex logic gates organized in a regular, dense fashion
- (N)AND decoder

$$WL(0) = !A_9!A_8!A_7!A_6!A_5!A_4!A_3!A_2!A_1!A_0$$

$$\vdots$$

$$WL(511) = !A_9A_8A_7A_6A_5A_4A_3A_2A_1A_0$$

- NOR decoder

$$WL(0) = !(A_9 + A_8 + A_7 + A_6 + A_5 + A_4 + A_3 + A_2 + A_1 + A_0)$$

$$\vdots$$

$$WL(511) = !(A_9 + !A_8 + !A_7 + !A_6 + !A_5 + !A_4 + !A_3 + !A_2 + !A_1 + !A_0)$$
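The two decoder styles compute the same one-hot word-line pattern, which a behavioral sketch makes easy to check. The function names below are hypothetical, and the model is purely logical: it ignores precharge, timing, and transistor-level structure.

```python
def nand_decoder(address, bits=10):
    """AND-style decoding: WL(i) is the AND of each address bit or its complement."""
    a = [(address >> k) & 1 for k in range(bits)]  # a[k] is address bit A_k
    word_lines = []
    for i in range(2 ** bits):
        wl = 1
        for k in range(bits):
            # Bit k of i set -> use A_k, otherwise use !A_k.
            literal = a[k] if (i >> k) & 1 else 1 - a[k]
            wl &= literal
        word_lines.append(wl)
    return word_lines


def nor_decoder(address, bits=10):
    """NOR-style decoding: WL(i) is high only if every input literal is low."""
    a = [(address >> k) & 1 for k in range(bits)]
    word_lines = []
    for i in range(2 ** bits):
        any_high = 0
        for k in range(bits):
            # Bit k of i set -> the complemented input !A_k feeds the NOR gate.
            literal = 1 - a[k] if (i >> k) & 1 else a[k]
            any_high |= literal
        word_lines.append(1 - any_high)
    return word_lines


wl = nand_decoder(511)
print(wl.index(1), sum(wl))  # prints "511 1": exactly one word line is selected
```

By De Morgan's law the two functions agree for every address, so the choice between them is a circuit-level one (series versus parallel pull-down devices), not a logical one.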

# **Dynamic NOR Row Decoder**



# **Dynamic NAND Row Decoder**



# **Split Row Decoder**



# Pass Transistor Based Column Decoder



- Advantage: speed since there is only one extra transistor in the signal path
- Disadvantage: large transistor count

# **Tree Based Column Decoder**



- Advantage: number of transistors drastically reduced
- Disadvantage: delay increases quadratically with the number of sections (so prohibitive for large decoders)
  - fix with buffers, progressive sizing, combination of tree and pass transistor approaches
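The trade-off between the two column decoders can be put into rough numbers. The sketch below counts only the pass transistors in the multiplexer itself and models the signal path as a chain of identical RC sections (unit resistance and capacitance are assumptions, as are the function names). The pass transistor version additionally needs a K-to-2<sup>K</sup> predecoder, which is not counted here and is where the tree decoder's savings come from.

```python
def pass_transistor_mux(k):
    """2**k-to-1 column mux whose gates are driven by a k-to-2**k predecoder."""
    return {"pass_transistors": 2 ** k, "series_devices": 1}


def tree_mux(k):
    """2**k-to-1 binary tree mux steered directly by the k address bits."""
    # Level j (j = 1..k) of the tree holds 2**j pass transistors.
    return {"pass_transistors": sum(2 ** j for j in range(1, k + 1)),
            "series_devices": k}


def elmore_delay(sections, r=1.0, c=1.0):
    """Elmore delay of a chain of identical RC sections (normalized units)."""
    # The capacitor at node i sees a resistance of i * r back to the source.
    return sum(i * r * c for i in range(1, sections + 1))


for k in (2, 4, 10):
    tree = tree_mux(k)
    print(k, tree["pass_transistors"], elmore_delay(tree["series_devices"]))
```

In this model the delay of a chain of k sections is k(k+1)/2 in units of RC, which is the quadratic growth noted above, while the pass transistor decoder always has a single device in the signal path.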

# **Bit Line Precharging**



**Clocked Precharge** 



Equalization transistor: speeds up equalization of the two bit lines by letting the capacitance and pull-up device of the nondischarged bit line assist in precharging the discharged line

## **Sense Amplifiers**



- Use sense amplifiers (SA) to amplify the small swing on the bit lines to the full rail-to-rail swing needed at the output
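One way to see why a small bit-line swing helps: a capacitance C discharged by a roughly constant current I needs a time of about C·ΔV/I to move by ΔV. The sketch below applies that relation; the capacitance, current, and voltage values are hypothetical and chosen only for illustration.

```python
def discharge_time(c_farads, delta_v, i_amps):
    """Time for a constant current to change the voltage on a capacitor by delta_v."""
    return c_farads * delta_v / i_amps


C_BITLINE = 1e-12   # assumed 1 pF bit-line capacitance
I_CELL = 100e-6     # assumed 100 uA cell read current

full_swing = discharge_time(C_BITLINE, 2.5, I_CELL)    # assumed 2.5 V rail-to-rail
small_swing = discharge_time(C_BITLINE, 0.25, I_CELL)  # assumed 0.25 V sensed swing

print(f"full swing:  {full_swing * 1e9:.1f} ns")
print(f"small swing: {small_swing * 1e9:.1f} ns")
```

Under these assumptions, sensing a swing ten times smaller cuts the bit-line discharge time by the same factor, and the sense amplifier then restores the full logic level.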



# **Latch Based Sense Amplifier**



## **Alpha Differential Amplifier/Latch**



# **Read/Write Circuitry**



- D: data (write) bus
- R: read bus
- W: write signal
- CS: column select (column decoder)

- Local W (write): BL = D, !BL = !D, enabled by W & CS
- Local R (read): R = BL, !R = !BL, enabled by !W & CS
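The enable conditions above can be written out as a small behavioral model. The function name and the use of `None` for an undriven read bus are assumptions made for this sketch; it describes the logic only, not the circuit.

```python
def column_io(w, cs, d, bl, nbl):
    """Return (bl, nbl, r, nr) for one column; None means the read bus is not driven."""
    r = nr = None
    if cs and w:
        # Local write: the data bus drives the bit-line pair.
        bl, nbl = d, 1 - d
    elif cs and not w:
        # Local read: the bit-line pair drives the read bus.
        r, nr = bl, nbl
    # With CS low the column is not selected and nothing changes.
    return bl, nbl, r, nr


print(column_io(w=1, cs=1, d=1, bl=0, nbl=1))  # prints (1, 0, None, None)
print(column_io(w=0, cs=1, d=0, bl=1, nbl=0))  # prints (1, 0, 1, 0)
```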

#### **Approaches to Memory Timing**



### **SRAM Address Transition Detection**




# **DRAM Timing**



# **Review: A Typical Memory Hierarchy**

- By taking advantage of the principle of locality:
  - Present the user with as much memory as is available in the cheapest technology.
  - Provide access at the speed offered by the fastest technology.



# **Caches**

Address issued by the data path has to be mapped (in hardware) into a cache address

The address is divided into three fields: the tag, which cache block in the set, and which word in the cache block.

Cache block (aka line) – unit of read/write information in cache

#### Cache mapping strategies

- Direct mapped
  - A word can be in only one block in the cache, so only have to compare its tag against that block's tag
- Block set associative
  - A word can be in two (or four or eight ...) cache blocks, so have to compare its tag against the tags of those two (or four or eight ...) cache blocks
- Fully associative
  - A word can be in any block in the cache, so have to compare its tag against the tags of all of the blocks in the cache
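The three mapping strategies differ only in how many blocks share a set, so one small model covers all of them. The sketch below is an illustration under stated assumptions: the class and method names are hypothetical, addresses are word addresses, blocks hold four words, and replacement is FIFO (the slides do not specify a policy).

```python
class Cache:
    """Set-associative cache model: ways=1 is direct mapped,
    ways=num_blocks is fully associative."""

    def __init__(self, num_blocks, ways, words_per_block=4):
        assert num_blocks % ways == 0
        self.ways = ways
        self.num_sets = num_blocks // ways
        self.words_per_block = words_per_block
        # Each set stores the tags of its resident blocks, oldest first.
        self.sets = [[] for _ in range(self.num_sets)]

    def split(self, word_address):
        """Split a word address into (tag, set index, word offset)."""
        offset = word_address % self.words_per_block
        block_number = word_address // self.words_per_block
        index = block_number % self.num_sets
        tag = block_number // self.num_sets
        return tag, index, offset

    def access(self, word_address):
        """Return True on a hit; on a miss, load the block (FIFO eviction)."""
        tag, index, _ = self.split(word_address)
        tags = self.sets[index]      # at most `ways` tags to compare against
        if tag in tags:
            return True
        if len(tags) == self.ways:
            tags.pop(0)              # evict the oldest block in the set
        tags.append(tag)
        return False


direct = Cache(num_blocks=8, ways=1)
two_way = Cache(num_blocks=8, ways=2)
trace = [0, 32, 0, 32]               # two blocks that map to the same set
print([direct.access(a) for a in trace])   # prints [False, False, False, False]
print([two_way.access(a) for a in trace])  # prints [False, False, True, True]
```

The direct mapped cache compares one tag per access but thrashes on this trace; the two-way cache compares two tags per access and keeps both blocks resident.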

# **Two-Way Block Set Associative Cache**



# **Translation Lookaside Buffers (TLBs)**

- Small caches used to speed up address translation in processors with virtual memory
- All addresses have to be translated before cache access
- I\$ can be virtually indexed/virtually tagged
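A TLB behaves like a very small cache of page-table entries. The sketch below assumes 4 KiB pages, a four-entry TLB with FIFO replacement, and a page table represented as a plain dictionary; all of these, along with the names used, are assumptions made for illustration.

```python
PAGE_BITS = 12  # assumed 4 KiB pages


class TLB:
    def __init__(self, page_table, entries=4):
        self.page_table = page_table  # hypothetical dict: virtual page -> physical page
        self.entries = entries
        self.cache = {}               # dicts keep insertion order, giving FIFO eviction

    def translate(self, vaddr):
        """Return (physical address, tlb_hit) for a virtual address."""
        vpn = vaddr >> PAGE_BITS
        offset = vaddr & ((1 << PAGE_BITS) - 1)   # page offset is not translated
        hit = vpn in self.cache
        if not hit:
            if len(self.cache) == self.entries:
                self.cache.pop(next(iter(self.cache)))  # evict the oldest entry
            self.cache[vpn] = self.page_table[vpn]      # slow path: page table lookup
        return (self.cache[vpn] << PAGE_BITS) | offset, hit


tlb = TLB({0: 5, 1: 9})
print(tlb.translate(0x1234))  # prints (37428, False), i.e. physical address 0x9234
print(tlb.translate(0x1238))  # prints (37432, True)
```

Only the virtual page number is translated; the page offset passes through unchanged, which is why a cache indexed purely by offset bits can start its lookup before translation finishes.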

# **TLB Structure**

Address issued by CPU (page size = index bits + byte select bits)








# **Reliability and Yield**

Semiconductor memories trade off noise margin for density and performance

Thus, they are highly sensitive to noise (cross talk, supply noise)

High density and large die size cause yield problems

$$\text{Yield} = 100 \times \frac{\text{\# of good chips/wafer}}{\text{\# of chips/wafer}}$$

$$Y = [(1 - e^{-AD})/(AD)]^2$$

Increase yield using error correction and redundancy
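The yield model above is easy to evaluate numerically. The sketch below reads A as die area and D as defect density, the conventional meaning of these symbols (the slide itself does not define them); the function name and the sample values are hypothetical.

```python
import math


def yield_fraction(area_cm2, defects_per_cm2):
    """Evaluate Y = [(1 - exp(-A*D)) / (A*D)]**2 as a fraction between 0 and 1."""
    ad = area_cm2 * defects_per_cm2
    if ad == 0:
        return 1.0  # limit of the expression as A*D goes to zero
    return ((1 - math.exp(-ad)) / ad) ** 2


# Assumed defect density of 1 defect/cm^2, for illustration only.
for area in (0.25, 0.5, 1.0, 2.0):
    print(f"A = {area:4.2f} cm^2 -> Y = {100 * yield_fraction(area, 1.0):.1f}%")
```

In this model the yield falls steadily as the die grows at a fixed defect density, which is why large, dense memories rely on redundancy and error correction.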

### **Alpha Particles**



1 particle ~ 1 million carriers

**Yield** 



Yield curves at different stages of process maturity (from [Veendrick92])


# **Redundancy in the Memory Structure**



# **Redundancy and Error Correction**



# **Next Lecture and Reminders**

#### Next lecture

- System level interconnect
  - Reading assignment Rabaey, et al, xx
- Reminders
  - Project final reports due December 5<sup>th</sup>
  - Final grading negotiations/correction (except for the final exam) must be concluded by December 10<sup>th</sup>
  - Final exam scheduled
    - Monday, December 16<sup>th</sup> from 10:10 to noon in 118 and 121 Thomas