Lecture 26
Design For Test (DFT)

Xuan ‘Silvia’ Zhang
Washington University in St. Louis

ESE 566A: Modern System-on-Chip Design
ese.wustl.edu/~xuan.zhang/ese566
ASIC Test

• **Two Stages**
  - Wafer test, one die at a time, using probe card
    • a production tester applies signals generated by a test program (test vectors) and measures the ASIC test response
    • the customer, the ASIC manufacturer, or both develop the test program
  - Final test, after packaging, board level

• **Failure Analysis**
  - Determine the failure mechanism
  - Failures may be due to the soldering process, electrostatic damage during handling, or other causes between shipping and testing
  - If the problem is from ASIC fabrication, the test program may be inadequate
  - Board-level failures and field repairs are expensive
Importance of Test

- Defect level is used to measure product quality
  - 10 defective chips in 100,000 => defect level is 0.01 percent or 100 ppm
  - Average quality level (AQL) = 1 - defect level

<table>
<thead>
<tr>
<th>ASIC defect level</th>
<th>Defective ASICs</th>
<th>Total PCB repair cost</th>
</tr>
</thead>
<tbody>
<tr>
<td>5%</td>
<td>5000</td>
<td>$1 million</td>
</tr>
<tr>
<td>1%</td>
<td>1000</td>
<td>$200,000</td>
</tr>
<tr>
<td>0.1%</td>
<td>100</td>
<td>$20,000</td>
</tr>
<tr>
<td>0.01%</td>
<td>10</td>
<td>$2,000</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>ASIC defect level</th>
<th>Defective ASICs</th>
<th>Defective boards</th>
<th>Total repair cost at system level</th>
</tr>
</thead>
<tbody>
<tr>
<td>5%</td>
<td>5000</td>
<td>500</td>
<td>$5 million</td>
</tr>
<tr>
<td>1%</td>
<td>1000</td>
<td>100</td>
<td>$1 million</td>
</tr>
<tr>
<td>0.1%</td>
<td>100</td>
<td>10</td>
<td>$100,000</td>
</tr>
<tr>
<td>0.01%</td>
<td>10</td>
<td>1</td>
<td>$10,000</td>
</tr>
</tbody>
</table>
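The two tables above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the constants (100,000 parts shipped, $200 per PCB repair, $10,000 per system repair, and one defective board per ten defective ASICs) are not stated on the slides; they are assumptions inferred from the table values.

```python
# Assumed constants, inferred from the table values rather than stated in the slides.
PARTS_SHIPPED = 100_000
BOARD_REPAIR_COST = 200        # dollars per defective ASIC repaired at PCB level
SYSTEM_REPAIR_COST = 10_000    # dollars per defective board repaired at system level
ASICS_PER_BAD_BOARD = 10       # ratio of defective ASICs to defective boards in the table


def repair_costs(defect_level):
    """Return (defective ASICs, PCB repair cost, defective boards, system repair cost)."""
    defective_asics = round(PARTS_SHIPPED * defect_level)
    pcb_cost = defective_asics * BOARD_REPAIR_COST
    defective_boards = defective_asics // ASICS_PER_BAD_BOARD
    system_cost = defective_boards * SYSTEM_REPAIR_COST
    return defective_asics, pcb_cost, defective_boards, system_cost


for level in (0.05, 0.01, 0.001, 0.0001):
    asics, pcb, boards, system = repair_costs(level)
    print(f"{level:.2%}: {asics} ASICs, ${pcb:,} PCB repair, "
          f"{boards} boards, ${system:,} system repair")
```

Under these assumptions, a defect that escapes to the system level costs far more per occurrence than one caught at the board level, which is the point the tables make.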
Boundary Scan Test

- Joint Test Action Group (JTAG) 2.0, or IEEE Standard 1149.1
  - boundary-scan test (BST) standard, using a 4- or 5-wire interface
  - for PCB and packaging testing
- Add a special logic cell to each ASIC I/O pad
  - these cells are joined together to form a chain and create a boundary-scan shift register

1. TDI (Test Data In)
2. TDO (Test Data Out)
3. TCK (Test Clock)
4. TMS (Test Mode Select)
5. TRST (Test Reset) optional.
BST Cells

- Two sequential elements
  - capture flip-flop and update flip-flop
  - reversible and can be used for both input and output
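As a rough illustration of how the capture and update flip-flops cooperate, the following behavioral sketch models a boundary-scan chain in Python. It is a simplification under stated assumptions: the class and method names are hypothetical, and the TAP controller state machine, instruction register, and TMS sequencing defined by the standard are omitted entirely.

```python
class BoundaryScanChain:
    """Simplified behavioral model: one capture and one update flip-flop per pad."""

    def __init__(self, n_cells):
        self.capture = [0] * n_cells   # shift-register stage of each cell
        self.update = [0] * n_cells    # latched value that would drive the pad or core

    def capture_pins(self, pin_values):
        """Load the current pad values in parallel into the capture flip-flops."""
        self.capture = list(pin_values)

    def shift(self, tdi):
        """One TCK cycle in shift mode: TDI enters cell 0, the last cell drives TDO."""
        tdo = self.capture[-1]
        self.capture = [tdi] + self.capture[:-1]
        return tdo

    def update_outputs(self):
        """Copy the shifted-in values into the update flip-flops."""
        self.update = list(self.capture)


chain = BoundaryScanChain(4)
chain.capture_pins([1, 0, 1, 1])
# Read the captured pin values out on TDO while shifting a new pattern in on TDI.
observed = [chain.shift(bit) for bit in [0, 1, 1, 0]]
chain.update_outputs()
print(observed, chain.update)
```

The captured values appear on TDO last cell first, so the observed stream is the pin vector in reverse order. The update flip-flops hold the old output values stable while the new pattern is being shifted through the capture stage.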
ASIC Faults

• Fabrication of ASIC may introduce a defect that in turn may cause a fault.

• Two common types of defects in metallization
  - Underetching the metal: bridge or short circuit
  - Overetching the metal: break or open circuits

• Other defects
  - Dicing, mounting, wafer probing, wire bonding
  - Electrical, thermal, corrosion, stress, adhesion failure, cracking, contaminated chemicals, dirty environment
Reliability

- **Infant Mortality**
  - defects that are nonfatal at test time but cause failures early in the life of a product

- **Bathtub Curve**
  - failure rates decrease rapidly to a low value that remains steady until the end of life when failure rates increase again.

- **Wearout Mechanism**
  - hot-electron wearout, electromigration, etc.
Reliability

- **Burn-in Test**
  - catches parts that are susceptible to early failure
  - operating an ASIC at an elevated temperature accelerates this type of failure; additional stresses, such as elevated current or voltage, can also be applied

- **Metrics**
  - mean time between failures (MTBF) for a repairable product
  - mean time to failure (MTTF) for a fatal failure
  - failures in time (FITs), where 1 FIT equals a single failure in $10^{9}$ hours
  - sum the FITs for all the components in a product to determine an overall measure of the product's reliability

- Microprocessor (standard part): 5 FITs
- 100 TTL parts: 50 parts at 10 FITs each, 50 parts at 15 FITs each
- 100 RAM chips: 6 FITs each

The overall failure rate for this system is $5 + 50 \times 10 + 50 \times 15 + 100 \times 6 = 1855$ FITs.
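The FIT total above can be turned into an approximate MTTF. The sketch below assumes a constant failure rate (the flat part of the bathtub curve), so that MTTF is the reciprocal of the failure rate; that assumption and the 8760-hour year are mine, not the slides'.

```python
HOURS_PER_YEAR = 8760  # 365 days, ignoring leap years

# (quantity, FITs per part), taken from the example above
components = [
    (1, 5),     # microprocessor
    (50, 10),   # TTL parts
    (50, 15),   # TTL parts
    (100, 6),   # RAM chips
]

total_fits = sum(qty * fits for qty, fits in components)
mttf_hours = 1e9 / total_fits          # valid only for a constant failure rate
mttf_years = mttf_hours / HOURS_PER_YEAR

print(total_fits, round(mttf_hours), round(mttf_years, 1))
```

Under these assumptions, 1855 FITs corresponds to an MTTF of roughly 61 to 62 years for the system as a whole.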
Fault Models

• Open-Circuit Fault
  - bad contact

• Short-Circuit Fault
  - accidentally connected
  - also called bridging faults

• Degradation Fault
  - parametric fault: incorrect switching threshold
  - delay fault: a critical path being slower than specification

• Physical Fault vs. Logical Fault
Fault Models

<table>
<thead>
<tr>
<th>Fault level</th>
<th>Physical fault</th>
<th>Degradation fault</th>
<th>Logical fault</th>
<th>Short-circuit fault</th>
</tr>
</thead>
<tbody>
<tr>
<td>Chip</td>
<td>Leakage or short between package leads</td>
<td>.</td>
<td>.</td>
<td>.</td>
</tr>
<tr>
<td></td>
<td>Broken, misaligned, or poor wire bonding</td>
<td>.</td>
<td>.</td>
<td>.</td>
</tr>
<tr>
<td></td>
<td>Surface contamination, moisture</td>
<td>.</td>
<td>.</td>
<td>.</td>
</tr>
<tr>
<td></td>
<td>Metal migration, stress, peeling</td>
<td>.</td>
<td>.</td>
<td>.</td>
</tr>
<tr>
<td></td>
<td>Metallization (open or short)</td>
<td>.</td>
<td>.</td>
<td>.</td>
</tr>
<tr>
<td>Gate</td>
<td>Contact opens</td>
<td>.</td>
<td>.</td>
<td>.</td>
</tr>
<tr>
<td></td>
<td>Gate to S/D junction short</td>
<td>.</td>
<td>.</td>
<td>.</td>
</tr>
<tr>
<td></td>
<td>Field-oxide parasitic device</td>
<td>.</td>
<td>.</td>
<td>.</td>
</tr>
<tr>
<td></td>
<td>Gate-oxide imperfection, spiking</td>
<td>.</td>
<td>.</td>
<td>.</td>
</tr>
<tr>
<td></td>
<td>Mask misalignment</td>
<td>.</td>
<td>.</td>
<td>.</td>
</tr>
</tbody>
</table>
Physical Faults

F1 is a short between m1 lines and connects node n1 to VSS.

F2 is an open on the poly layer and disconnects the gate of transistor t1 from the rest of the circuit.

F3 is an open on the poly layer and disconnects the gate of transistor t3 from the rest of the circuit.

F4 is a short on the poly layer and connects the gate of transistor t4 to the gate of transistor t5.

F5 is an open on m1 and disconnects node n4 from the output Z1.

F6 is a short on m1 and connects nodes p5 and p6.

F7 is a nonfatal defect that causes necking on m1.
Logical Faults

Stuck-at Fault model
1. Stuck at 1 fault (SA1 or s@1)
2. Stuck at 0 fault (SA0 or s@0)

F1 translates to node n1 being stuck at 0, equivalent to A1 being stuck at 1.
F2 results in node n1 remaining high, equivalent to A1 being stuck at 0.
F3 will affect half of the n-channel pull-down stack and may result in a degradation fault. The cell will still work, but the fall time at the output will double. A fault such as this is extremely hard to detect.
F4 is a bridging fault whose effect depends on the relative strength of the transistors driving this node.
F5 completely disables half of the n-channel pulldown stack and will result in a degradation fault.
F6 shorts the output node to VDD and is equivalent to Z1 stuck at 1.
F7 is nonfatal at first, but if this line did break due to electromigration, the cell could no longer pull Z1 up to VDD. This would translate to Z1 stuck at 0. This fault would probably be fatal and stop the ASIC from working.
Stuck-at Fault Models

- **Stuck-at-0**
  - represents a signal that is permanently low regardless of the other signals that normally control the node

- **Stuck-at-1**
  - represents a signal that is permanently high regardless of the other signals that normally control the node

- **Example**
  - assume that you have a two-input AND gate that has a stuck-at-0 fault on the output pin
  - regardless of the logic level of the two inputs, the output is always 0
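The AND-gate example can be checked by simulating the good gate and the faulty gate side by side. This is a minimal sketch; the function name and the `output_stuck_at` parameter are hypothetical names chosen for illustration.

```python
from itertools import product


def and_gate(a, b, output_stuck_at=None):
    """Two-input AND gate; output_stuck_at forces the output to 0 or 1 when set."""
    if output_stuck_at is not None:
        return output_stuck_at
    return a & b


# A vector detects the fault when the good and faulty outputs differ.
detecting = [
    (a, b)
    for a, b in product((0, 1), repeat=2)
    if and_gate(a, b) != and_gate(a, b, output_stuck_at=0)
]
print(detecting)  # [(1, 1)]
```

Only the vector that drives the good output to 1 exposes the stuck-at-0 fault; the other three vectors produce 0 in both the good and the faulty gate.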
Stuck-at Fault Models

• Preconditions for Detecting
  - the node of a stuck-at fault must be controllable and observable for the fault to be detected

• Controllable Node
  - if you can drive it to a specified logic value by setting the primary inputs to specific values
  - a primary input is an input that can be directly controlled in the test environment

• Observable Node
  - if you can predict the response on it and propagate the fault effect to the primary outputs where you can measure the response
  - a primary output is an output that can be directly observed in the test environment
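To see why both preconditions are needed, consider a hypothetical two-gate circuit, z = (a AND b) OR c, with a stuck-at-0 fault on the internal node n = a AND b. The circuit and names below are assumptions made for illustration; they do not come from the slides.

```python
from itertools import product


def circuit(a, b, c, n_stuck_at=None):
    """z = (a AND b) OR c, with an optional stuck-at fault on internal node n."""
    n = a & b
    if n_stuck_at is not None:
        n = n_stuck_at
    return n | c


# Vectors that drive n to 1, the opposite of the stuck value (controllability).
controlling = [v for v in product((0, 1), repeat=3) if v[0] & v[1]]

# Vectors for which the fault effect also reaches the primary output z (observability).
tests = [
    v for v in product((0, 1), repeat=3)
    if circuit(*v) != circuit(*v, n_stuck_at=0)
]
print(controlling, tests)
```

Both (1, 1, 0) and (1, 1, 1) control the node, but with c = 1 the OR gate output is 1 either way and the fault is masked, so only (1, 1, 0) is a valid test.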
Automatic Test-Pattern Generation

- D(etect)-Calculus
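- Enabling Value
  - value on a gate's other inputs that lets a signal propagate through the gate
- Controlling Value
  - opposite of the enabling value
  - fixes the gate output regardless of the other inputs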
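The definitions above can be verified gate by gate. The sketch below tabulates controlling and enabling values for two-input gates and checks them by simulation; the dictionary names are illustrative assumptions.

```python
# Controlling value: a single input at this value fixes the gate output.
# Enabling value: the opposite; with the side input at this value,
# the gate output follows (or inverts) the other input.
CONTROLLING = {"AND": 0, "NAND": 0, "OR": 1, "NOR": 1}
ENABLING = {gate: 1 - value for gate, value in CONTROLLING.items()}

GATES = {
    "AND": lambda a, b: a & b,
    "NAND": lambda a, b: 1 - (a & b),
    "OR": lambda a, b: a | b,
    "NOR": lambda a, b: 1 - (a | b),
}

for name, fn in GATES.items():
    c, e = CONTROLLING[name], ENABLING[name]
    fixed = fn(c, 0) == fn(c, 1)          # output ignores the other input
    sensitive = fn(e, 0) != fn(e, 1)      # output depends on the other input
    print(name, "controlling:", c, "enabling:", e, fixed, sensitive)
```

For every gate listed, both checks come out true: a controlling value blocks the path, while an enabling value sensitizes it.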
Automatic Test-Pattern Generation

• Find an input vector to test a fault origin
• Work backward until reaching a primary input (PI)
• Work forward using a sensitized path to a primary output (PO)
  - a wave of D’s is called the D-frontier

1. Choose a fault

2. Work backward

3. Work forward: set the other inputs of (N)AND gates to 1 and of (N)OR gates to 0 (their enabling values)

4. Work backward
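For small circuits, the result of these steps can be cross-checked by brute force. The sketch below is not the D-algorithm: it is an exhaustive search that simulates the good and faulty circuits for every input vector. The netlist, node names, and function names are hypothetical.

```python
from itertools import product

# Hypothetical netlist in topological order: node -> (gate type, input names).
NETLIST = {
    "n1": ("NAND", ("a", "b")),
    "n2": ("NOR", ("c", "d")),
    "z": ("AND", ("n1", "n2")),
}
PRIMARY_INPUTS = ("a", "b", "c", "d")
PRIMARY_OUTPUT = "z"

GATES = {
    "AND": lambda x, y: x & y,
    "NAND": lambda x, y: 1 - (x & y),
    "OR": lambda x, y: x | y,
    "NOR": lambda x, y: 1 - (x | y),
}


def simulate(vector, fault=None):
    """Evaluate the netlist; fault is (node, stuck_value), or None for the good circuit."""
    values = dict(zip(PRIMARY_INPUTS, vector))
    if fault and fault[0] in values:
        values[fault[0]] = fault[1]           # fault on a primary input
    for node, (gate, inputs) in NETLIST.items():
        values[node] = GATES[gate](*(values[i] for i in inputs))
        if fault and fault[0] == node:
            values[node] = fault[1]           # fault on an internal node or output
    return values[PRIMARY_OUTPUT]


def find_test(fault):
    """Return the first input vector whose good and faulty outputs differ, or None."""
    for vector in product((0, 1), repeat=len(PRIMARY_INPUTS)):
        if simulate(vector) != simulate(vector, fault):
            return vector
    return None


print(find_test(("n1", 1)))  # test for n1 stuck at 1
```

For n1 stuck at 1, the search returns (1, 1, 0, 0): a = b = 1 drives n1 to 0 to activate the fault, and c = d = 0 sets n2 to 1, the enabling value for the AND gate on the path to z. Exhaustive search grows as 2^n in the number of inputs, which is why practical ATPG relies on path sensitization instead.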
Design for Test (DFT)

- Insert scan chain in netlist
  - replacing flip-flops with multiplexed flip-flops

- Increase observability/controllability of non-sequential logic throughout the chip
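A multiplexed flip-flop chain can be modeled in a few lines. This is a behavioral sketch, assuming a three-flop chain and a made-up combinational block that inverts each state bit; the class and function names are hypothetical.

```python
class ScanFlop:
    """D flip-flop with a 2:1 mux on its input, selected by scan_enable."""

    def __init__(self):
        self.q = 0

    def clock(self, d, scan_in, scan_enable):
        self.q = scan_in if scan_enable else d


def shift_chain(flops, bits):
    """Shift bits through the chain with scan_enable=1; return the bits seen at scan-out."""
    out = []
    for bit in bits:
        out.append(flops[-1].q)
        prev = [f.q for f in flops]            # sample all outputs before the clock edge
        for i, flop in enumerate(flops):
            flop.clock(d=0, scan_in=bit if i == 0 else prev[i - 1], scan_enable=1)
    return out


flops = [ScanFlop() for _ in range(3)]
shift_chain(flops, [1, 1, 0])                   # load a test state (controllability)
state = [f.q for f in flops]
# One functional clock: hypothetical combinational logic that inverts each state bit.
for flop, q in zip(flops, state):
    flop.clock(d=1 - q, scan_in=0, scan_enable=0)
response = shift_chain(flops, [0, 0, 0])        # unload the captured response (observability)
print(state, response)
```

The first bit shifted in ends up in the last flop, and the response comes out last flop first. In scan mode every flip-flop becomes directly controllable and observable, so the tester only has to deal with the combinational logic between flops.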
Built-in Self-Test (BIST)

- **LFSR** (Linear feedback shift register)
  - Based on primitive polynomials
- **PRBS** (pseudorandom binary sequence)

The initial state must not be all zeros.
A primitive polynomial yields a maximal-length sequence ($2^n - 1$ states for an $n$-bit register).
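A 3-bit example makes the maximal-length property concrete. The sketch below is a Fibonacci-style LFSR whose taps correspond to a primitive degree-3 polynomial ($x^3 + x + 1$ or its reciprocal $x^3 + x^2 + 1$, depending on the bit-numbering convention); the function name and tap encoding are assumptions for illustration.

```python
def lfsr_states(seed, taps, n_bits):
    """Yield successive states of a Fibonacci LFSR until the seed state recurs."""
    state = seed
    mask = (1 << n_bits) - 1
    while True:
        yield state
        feedback = 0
        for t in taps:                      # XOR the tapped bits together
            feedback ^= (state >> t) & 1
        state = ((state << 1) | feedback) & mask
        if state == seed:
            return


states = list(lfsr_states(seed=0b001, taps=(2, 0), n_bits=3))
print([format(s, "03b") for s in states], len(states))

# The all-zeros state feeds back a 0 forever, so the register locks up.
print(list(lfsr_states(seed=0b000, taps=(2, 0), n_bits=3)))
```

Starting from a nonzero seed, the register visits all $2^3 - 1 = 7$ nonzero states before repeating, which is what makes it usable as a pseudorandom pattern generator. Starting from all zeros, it never leaves that state.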
Built-in Self-Test (BIST)

- Serial input signature register (SISR)
- Data compaction (compression)
- Signature (at the end of input sequence) analysis

If input sequence and SISR are long enough, it is unlikely (though possible) that two different input sequences will produce the same signature.
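Signature analysis and aliasing can both be demonstrated with a deliberately short register. The sketch below assumes a 3-bit SISR built from an LFSR with the serial input XORed into the feedback; the tap positions, the sample bit streams, and the function name are illustrative assumptions.

```python
from itertools import product


def signature(bits, taps=(2, 0), n_bits=3):
    """Compact a bit sequence into an n_bits signature with an LFSR-style register."""
    state = 0
    mask = (1 << n_bits) - 1
    for bit in bits:
        feedback = bit                      # serial input is XORed into the feedback
        for t in taps:
            feedback ^= (state >> t) & 1
        state = ((state << 1) | feedback) & mask
    return state


good = [1, 0, 1, 1, 0, 0, 1, 0]
bad = list(good)
bad[3] ^= 1                                 # single-bit error in the response stream
print(signature(good), signature(bad))

# Aliasing: count all 8-bit sequences that share the good signature.
same = sum(1 for seq in product((0, 1), repeat=8) if signature(seq) == signature(good))
print(same)
```

The single-bit error changes the signature, so the fault is caught. With only 3 bits, however, the 256 possible 8-bit sequences map onto 8 signatures, so 32 sequences (the good one plus 31 aliases) share each signature. A longer register shrinks the aliasing probability to roughly $2^{-n}$.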
Built-in Self-Test (BIST)

Circuit diagram showing two Linear Feedback Shift Registers (LFSR1 and LFSR2) with inputs and outputs for testing. The diagram includes logic gates and state variables for testing the circuit under test (CUT) for stuck-at-faults.
Acknowledgement

Prof. Wei Tang
New Mexico State University
Questions?

Comments?

Discussion?