## Statistical Analysis and Optimization of Asynchronous Digital Circuits



Tsung-Te Liu and Jan M. Rabaey

University of California, Berkeley

## Outline

- Motivation
- Variability model of CMOS digital circuit
- Performance model for different timing schemes
- Performance comparison
- Conclusion

## Variability Continues to Increase as Technology and Voltage Scale Down



- Higher variability with finer design rules and larger wafers
- Higher variability with lower supply voltages

## Circuit Performance Characteristics with Different Timing Schemes



- A self-timed circuit is itself a variation-monitoring circuit
- It becomes advantageous when the variation is large (B>A)
- A statistical analysis framework is necessary to quantify this trade-off

#### **Statistical Analysis Framework**

#### **Circuit Variability Model**

- Supply voltage
- Logic depth
- Width and length
- Body bias

#### Performance Model

- Computation overhead
- Communication overhead
- Delay and energy performance



## Outline

- Motivation
- Variability model of CMOS digital circuit
- Performance model for different timing schemes
- Performance comparison
- Conclusion

#### **Delay Model of CMOS Digital Circuit**



- One unified current model across different operating regions
- Model error <2% from 0.3 V to 1 V

#### **Delay Variability Model**




- Model error <8% from 0.3 V to 1 V
- Local mismatch dominates at low supply voltages

## Delay Variability Model with Different Logic Depths



- Use a 4-stage inverter chain as the baseline model
- Model error <13% for n=8 and <15% for n=24

## Outline

- Motivation
- Variability model of CMOS digital circuit
- Performance model for different timing schemes
- Performance comparison
- Conclusion

#### **Delay Overhead Evaluation**



- Assumption: process variation follows a Gaussian distribution
- Dual-rail approach: has only protocol overhead, no delay overhead
- Synchronous approach: has only delay overhead

For 99.7% yield: 
$$D_{sync} = \frac{3\sigma_{logic,total}}{\mu_{logic,total}}$$
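The synchronous margin above can be sketched numerically. This is a minimal illustration, assuming Gaussian delay statistics as stated on the slide; the function names and the sample mean and sigma are hypothetical and not taken from the presentation. The coverage helper shows that the 99.7% figure corresponds to the fraction of a Gaussian population within ±3σ of the mean.

```python
from statistics import NormalDist


def sync_delay_overhead(mu_total, sigma_total, k=3.0):
    """Relative guard band D_sync = k * sigma / mu (k = 3 on the slide)."""
    if mu_total <= 0:
        raise ValueError("mean delay must be positive")
    return k * sigma_total / mu_total


def two_sided_coverage(k):
    """Fraction of a Gaussian population within +/- k sigma of the mean."""
    nd = NormalDist()
    return nd.cdf(k) - nd.cdf(-k)


# Hypothetical numbers for illustration only: 1.0 ns mean, 0.05 ns sigma.
d_sync = sync_delay_overhead(mu_total=1.0, sigma_total=0.05)
print(f"D_sync = {d_sync:.3f}")
print(f"coverage at 3 sigma = {two_sided_coverage(3.0):.4f}")
```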

#### **Bundled-Data Self-Timed Approach**



Assume the main data path and the replica delay line exhibit similar statistics.

For 99.7% yield:
$$D_{bundled-data} = D_{variation}^{2} \cdot \left(0.5 + \sqrt{0.25 + \frac{2}{D_{variation}^{2}}}\right)$$

where
$$D_{bundled-data} = \frac{\mu_{delay-line} - \mu_{logic}}{\mu_{logic}}, \qquad D_{variation} = \frac{3\sigma_{logic,WID}}{\mu_{logic,WID}}$$
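The closed form for the bundled-data overhead is algebraically the positive root of $D^2 - D_{variation}^2 D - 2D_{variation}^2 = 0$. One reading consistent with that quadratic, offered here as an assumption and not stated on the slide, is that the delay-line margin must cover 3σ of the difference between replica and logic delays, with delay variance growing in proportion to mean delay. The sketch below evaluates the closed form and checks the quadratic identity; the function names and sample values are hypothetical.

```python
import math


def bundled_data_overhead(d_variation):
    """Closed-form D_bundled-data for a given D_variation = 3*sigma/mu (WID)."""
    if d_variation < 0:
        raise ValueError("D_variation must be non-negative")
    if d_variation == 0:
        return 0.0  # no variation, no margin needed
    dv2 = d_variation ** 2
    return dv2 * (0.5 + math.sqrt(0.25 + 2.0 / dv2))


def quadratic_residual(d, d_variation):
    """Residual of D^2 - Dv^2 * D - 2 * Dv^2, which is zero at the closed form."""
    dv2 = d_variation ** 2
    return d * d - dv2 * d - 2.0 * dv2


# Illustrative D_variation values only; they are not data from the slides.
for dv in (0.1, 0.3, 0.6):
    d = bundled_data_overhead(dv)
    print(f"D_variation={dv:.1f}  D_bundled-data={d:.4f}  "
          f"residual={quadratic_residual(d, dv):.1e}")
```

For small $D_{variation}$ the result approaches $\sqrt{2}\,D_{variation}$, and it grows faster than that as variation increases.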

#### **Bundled-Data Delay Overhead**



## Performance Model under Variations

| Original delay and energy model | Statistical delay and energy model |
|---------------------------------|------------------------------------|
| $T_{comp} = T_{delay}$ | $T_{comp} = T_{delay}(1+P+D)$ |
| $E_{dynamic} = \alpha C_{switch} V^2$ | $E_{dynamic} = \alpha C_{switch}(1+P)V^2$ |
| $E_{leakage} = V I_{leakage} T_{delay}$ | $E_{leakage} = V I_{leakage}(1+P)T_{delay}(1+P+D)$ |
| $E_{total} = \alpha C_{switch} V^2 + V I_{leakage} T_{delay}$ | $E_{total} = \alpha C_{switch}(1+P)V^2 + V I_{leakage}(1+P)T_{delay}(1+P+D)$ |

| Timing scheme         | Synchronous | Bundled-Data       | Dual-Rail       |
|-----------------------|-------------|--------------------|-----------------|
| Delay Overhead (D)    | $D_{sync}$  | $D_{bundled-data}$ | 0               |
| Protocol Overhead (P) | 0           | $P_{bundled-data}$ | $P_{dual-rail}$ |

- Evaluate computation delay and energy under variations
- Overhead changes with supply voltage and logic depth
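The statistical model in the tables can be evaluated directly. The sketch below is one possible encoding, with P and D treated as relative (dimensionless) overheads exactly as they appear in the $(1+P+D)$ terms. All numeric values, the `Scheme` class, and the function name are hypothetical placeholders chosen for illustration, not results from the presentation.

```python
from dataclasses import dataclass


@dataclass
class Scheme:
    name: str
    delay_overhead: float     # D in the table
    protocol_overhead: float  # P in the table


def statistical_model(scheme, t_delay, alpha, c_switch, v, i_leakage):
    """Return (T_comp, E_dynamic, E_leakage, E_total) under variations."""
    p, d = scheme.protocol_overhead, scheme.delay_overhead
    t_comp = t_delay * (1 + p + d)
    e_dynamic = alpha * c_switch * (1 + p) * v ** 2
    e_leakage = v * i_leakage * (1 + p) * t_comp
    return t_comp, e_dynamic, e_leakage, e_dynamic + e_leakage


# Hypothetical overheads and circuit parameters, for illustration only.
schemes = [
    Scheme("synchronous", delay_overhead=0.30, protocol_overhead=0.0),
    Scheme("bundled-data", delay_overhead=0.25, protocol_overhead=0.05),
    Scheme("dual-rail", delay_overhead=0.0, protocol_overhead=0.10),
]
params = dict(t_delay=1e-9, alpha=0.1, c_switch=1e-12, v=0.5, i_leakage=1e-6)

for s in schemes:
    t_comp, _, _, e_total = statistical_model(s, **params)
    print(f"{s.name:13s} T_comp={t_comp:.3e} s  E_total={e_total:.3e} J")
```

Setting both overheads to zero recovers the original delay and energy model in the left column of the table.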

## Outline

- Motivation
- Variability model of CMOS digital circuit
- Performance model for different timing schemes
- Performance comparison
- Conclusion

#### **Delay Overhead Comparison**



- Global variation affects only synchronous approach
- Local mismatch dominates at low supply voltages
- Local mismatch has less impact on longer critical paths
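The weaker effect of local mismatch on longer paths follows from averaging. The sketch below assumes identical stages whose local delay variations are independent and Gaussian, and a global component that shifts all stages together; these are simplifying assumptions, and the function names and the 20% per-stage sigma are hypothetical. Under them, the local part of σ/μ shrinks as $1/\sqrt{n}$ while the global part does not.

```python
import math
import random
import statistics


def path_relative_sigma(n_stages, local_rel_sigma, global_rel_sigma=0.0):
    """sigma/mu of an n-stage path; only the local part averages out."""
    local_part = local_rel_sigma / math.sqrt(n_stages)
    return math.sqrt(local_part ** 2 + global_rel_sigma ** 2)


def monte_carlo_relative_sigma(n_stages, local_rel_sigma, samples=5000, seed=0):
    """Estimate sigma/mu of the path delay by sampling independent stage delays."""
    rng = random.Random(seed)
    totals = [
        sum(rng.gauss(1.0, local_rel_sigma) for _ in range(n_stages))
        for _ in range(samples)
    ]
    return statistics.stdev(totals) / statistics.mean(totals)


# Logic depths 4, 8, and 24 match the depths modeled earlier in the deck;
# the 20% per-stage local sigma is an illustrative value only.
for n in (4, 8, 24):
    analytic = path_relative_sigma(n, local_rel_sigma=0.2)
    sampled = monte_carlo_relative_sigma(n, local_rel_sigma=0.2)
    print(f"n={n:2d}  analytic={analytic:.4f}  monte_carlo={sampled:.4f}")
```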

#### **Speed Performance Comparison**



- Assumption: $P_{bundled-data} = 1\,T_{FO4}$; $P_{dual-rail} = 2\,T_{FO4}$
- Synchronous scheme is better for short critical paths at high supply voltages
- Dual-rail scheme is better for long critical paths at low supply voltages
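The trend with critical path length can be illustrated with a small comparison. The sketch assumes that a protocol overhead given in FO4 delays is converted to the relative overhead P by dividing by the logic depth in FO4 delays; that conversion is an assumption made here, not something stated on the slide. The function names and the delay-overhead values are likewise hypothetical.

```python
def relative_protocol_overhead(p_fo4, logic_depth_fo4):
    """Assumed conversion: handshake cost in FO4 delays over path length in FO4 delays."""
    if logic_depth_fo4 <= 0:
        raise ValueError("logic depth must be positive")
    return p_fo4 / logic_depth_fo4


def normalized_cycle_time(delay_overhead, protocol_overhead):
    """T_comp / T_delay = 1 + P + D."""
    return 1.0 + protocol_overhead + delay_overhead


def fastest_scheme(logic_depth_fo4, d_sync, d_bundled):
    """Compare the three schemes using 1 FO4 and 2 FO4 protocol costs."""
    times = {
        "synchronous": normalized_cycle_time(d_sync, 0.0),
        "bundled-data": normalized_cycle_time(
            d_bundled, relative_protocol_overhead(1.0, logic_depth_fo4)),
        "dual-rail": normalized_cycle_time(
            0.0, relative_protocol_overhead(2.0, logic_depth_fo4)),
    }
    return min(times, key=times.get), times


# Hypothetical overheads: small variation (high supply) vs. large variation (low supply).
print(fastest_scheme(logic_depth_fo4=8, d_sync=0.10, d_bundled=0.15))
print(fastest_scheme(logic_depth_fo4=40, d_sync=0.30, d_bundled=0.30))
```

With a fixed handshake cost, the relative protocol overhead shrinks as the path gets longer, which is why the self-timed schemes gain ground on long critical paths in this sketch.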

#### **Energy Performance Comparison**



- Synchronous scheme is better for high activity at high supply voltages
- Dual-rail scheme is better for low activity at low supply voltages
- Leakage dominates for low activity at low supply voltages

#### Conclusion

- A statistical analysis framework is proposed to evaluate the performance of CMOS digital circuits in the presence of process variations.
- Designers can efficiently determine the optimal timing strategy, pipeline depth, and supply voltage based on the proposed variability and statistical performance models.
- Asynchronous design exhibits better energy and delay characteristics for circuits with low activity and longer critical path delay under process variations.

#### Acknowledgement

- Berkeley Wireless Research Center
- NSF Infrastructure Grant
- STMicroelectronics
- Multiscale System Center

## Thank you!