

# Low-Cost 3D Chip Stacking with ThruChip Wireless Connections

Dave.Ditzel@ThruChip.com Tadahiro.Kuroda@ThruChip.com

ThruChip Communications October 24, 2014

Stanford EE Computer Systems Colloquium

#### Credit to Professor Tadahiro Kuroda of Keio University



Tadahiro Kuroda

Prof. Kuroda leads one of the world's top circuit labs at Keio University. His research interests include ubiquitous electronics, sensor networks, wireless and wireline communications, and ultra-low-power CMOS circuits. He has published more than 200 technical publications, including 50 invited papers and 18 books/chapters, and has filed more than 100 patents.

**E-mail Address:** Tadahiro.Kuroda@ThruChip.com

Most of the ideas in this talk come from more than a decade of work on near-field inductive coupling for 3D stacking by Professor Tadahiro Kuroda of Keio University and his students.

Kuroda founded ThruChip in 2008, and as ThruChip's CTO, is helping companies develop lower cost 3D chip stacking.

ThruChip provides design information and licensing of Professor Kuroda's inventions.

#### Wireless 3D stacking

Current 3D stacking methods have challenges

- Main challenge is the high cost of Thru Silicon Vias
- Wireless is a better approach for stacking
  - Lower cost, lower power, higher bandwidth
  - Less costly if we can avoid having to add vertical wires
- Cost reduction possible, instead of increase, with:
  - Advances in wafer thinning
  - Wireless data communication between stacked die
  - Lower-cost power distribution from front to back of die

# Challenges with current 3D stacking

#### 3D Stacking with Wire Bonds



Staircase stacking constrains wire bond access to one side of each die.



#### Pros:

- Low Cost
- Good yield
- Allows ~50µ thin die
- Existing infrastructure

Cons:

- High wire bond inductance
  - Higher power IO
  - Bandwidth limited to a few GHz
- Staircase stacking constraints
  - Limited number of bond wires
  - Underside clearance limits die thinness

# Wire bonding: Pretty example



#### Akita Elpida wire bond example of 20 stacked die (40µ pitch)

# Wire bonding: Not so pretty



## 3D Stacking with Thru Silicon Vias (TSV)





Pros:

- ~10x lower power IO
- Thousands of IO possible

Cons:

- High Cost (1.4x to 2x) over bare die
- Requires new CMOS process
- Yield reductions from bumps
- Area impact from TSV & KOZ
- Effects on nearby transistors

#### Proposal for lower cost 3D stacking

Separate Data Communication from Power Distribution

Data Communication: Use wireless near-field inductive coupling

- Uses simple CMOS digital circuits: No new semiconductor process expense
- Provides best in class inter-die power and bandwidth
- May <u>reduce chip cost</u> if IO area can be reduced
- Well understood technology validated with dozens of test chips
- Becomes more compelling as die get thinner

Power Distribution: Many options available when wireless used for data

- Wire bond: Low cost, in high-volume production
- TAB: Low cost, in high-volume production
- RDL/FOWLP: Medium cost, production ready
- TSV: High cost, early production
- Recommend <u>Highly Doped Silicon Vias</u>: New <u>lowest cost</u> proposal, discussed later

### NAND goal is to go



#### DRAM goal is to go



# Relevant advances in wafer thinning

### Ultra-Thin 4µ wafer breakthrough

- Wafer thinning has been stuck at ~40µ due to the "gettering problem"
  - The barrier was due in part to loss of the "gettering effect" at smaller dimensions during back grinding, which lets impurities degrade device performance (particularly leakage) and yield.

#### DISCO Corporation solution can now thin to a few microns

- DISCO introduced a "Gettering Dry Polish" wheel which forms gettering sites while grinding, allowing thinning of wafer silicon to a few microns without device damage. [35]
- Example: DRAM silicon thinned to 4 microns
  - See "Ultra Thinning down to 4μm using 300-mm Wafer proven by 40-nm Node 2 Gb DRAM for 3D Multi-stack WOW Applications."[36] They concluded "No degradation in terms of retention characteristics and distribution employing 2 Gb DRAM wafer was found after ultra-thinning."





Ultra-thin wafers can be handled (from DISCO website)


# Wireless 3D data

## Wireless Near-Field Inductive Coupling

- Chip designers often spend a lot of time making sure they do not have too much coupling between adjacent wires.
- Idea: Turn that coupling into an advantage.
- Use Inductive Coupling for 3D wireless data communication
  - Inductive coils made with a few turns in standard metal layers
  - Coil diameter is about 3x the communication distance
  - Coils communicate vertically to adjacent chips by magnetic field
  - Receive and transmit coils can be placed concentrically on each die to form a transceiver
  - Multiple coils used to increase bandwidth
  - Bandwidth improves with Moore's law improvement in devices
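
The 3x rule of thumb in the list above can be turned into a quick sizing estimate. The sketch below is illustrative only: it assumes the communication distance equals one die thickness (adjacent die), takes the ~50µ and 8µ die thicknesses quoted elsewhere in this talk as inputs, and uses hypothetical function names. The 3x ratio is the approximate figure from the slide, not an exact design rule.

```python
def coil_diameter(distance_um: float, ratio: float = 3.0) -> float:
    """Rule-of-thumb coil diameter: about 3x the communication distance."""
    return ratio * distance_um


def relative_channel_density(d_ref_um: float, d_new_um: float) -> float:
    """Channels per unit area scale as 1/D^2, so density grows as (D_ref/D_new)^2."""
    return (d_ref_um / d_new_um) ** 2


# Assumption: distance is one die thickness (communication to the adjacent die).
thick_die = coil_diameter(50.0)  # ~50µ die, as in wire-bond stacks
thin_die = coil_diameter(8.0)    # 8µ die, as in the ultra-thin stack example

print(f"coil diameter for 50µ die: {thick_die:.0f}µ")
print(f"coil diameter for  8µ die: {thin_die:.0f}µ")
print(f"relative channel density:  {relative_channel_density(thick_die, thin_die):.1f}x")
```

Under these assumptions, thinner die allow smaller coils and therefore more channels in the same area, which is why the approach becomes more attractive as die get thinner.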

### Communication is via magnetic field



Magnetic field can pass through silicon, including over active circuitry.

## ThruChip Interface (TCI)

- Simple transmitter and receiver circuits (basic form shown)
- Standard digital CMOS: Scales with Moore's Law
- Bandwidth: >40 Gigabits/second/coil with modern digital CMOS
- Delay: About 7 equivalent logic gates (NAND2 FO4)
- Energy: About 80 equivalent gates



#### TCI coil example



3 chips with staircase stacking

**TCI** Wireless Transceiver

## TCI bandwidth vs communication distance



# TCI scales with digital CMOS



- High BW: Data rate is equivalent to 1.5x that of a 5-stage ring oscillator
- Fast: Delay is equivalent to 7x that of a 2NAND FO4
- Low Power: Energy is equivalent to 80x that of a 2NAND FO4
- Small: Circuit layout area is equivalent to 36x that of a 2NAND

#### Energy per Bit becomes very compelling

#### Pin-to-Pin data transfer

| Node | TCI 2 Coils | TSV       | Wire bond |
|------|-------------|-----------|-----------|
| 32nm | 0.40 pJ/b   | 0.35 pJ/b | 3.45 pJ/b |
| 22nm | 0.20 pJ/b   | 0.30 pJ/b | 3.35 pJ/b |
| 16nm | 0.10 pJ/b   | 0.28 pJ/b | 3.30 pJ/b |
| 11nm | 0.05 pJ/b   | 0.26 pJ/b | 3.27 pJ/b |

TCI energy will be >65x lower than wire bond, >5x lower than TSV by 11nm.

#### Bus data transfer (8 memory chips + 1 SoC)

| Node | TCI 9 coils | TSV       | Wire bond  |
|------|-------------|-----------|------------|
| 32nm | 0.40 pJ/b   | 2.45 pJ/b | 24.15 pJ/b |
| 22nm | 0.20 pJ/b   | 2.10 pJ/b | 23.45 pJ/b |
| 16nm | 0.10 pJ/b   | 1.96 pJ/b | 23.10 pJ/b |
| 11nm | 0.05 pJ/b   | 1.82 pJ/b | 22.89 pJ/b |

TCI energy will be >450x lower than wire bond, >36x lower than TSV by 11nm.
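
The ratios quoted under the two tables can be reproduced directly from the tabulated values. The snippet below is a minimal check, with the table entries copied by hand and hypothetical variable names. It also shows that at 32nm the pin-to-pin TSV figure is still slightly lower than TCI, and that the advantage opens up at smaller nodes.

```python
# Energy per bit in pJ/b, copied from the two tables: (TCI, TSV, wire bond).
PIN_TO_PIN = {
    "32nm": (0.40, 0.35, 3.45),
    "22nm": (0.20, 0.30, 3.35),
    "16nm": (0.10, 0.28, 3.30),
    "11nm": (0.05, 0.26, 3.27),
}
BUS = {
    "32nm": (0.40, 2.45, 24.15),
    "22nm": (0.20, 2.10, 23.45),
    "16nm": (0.10, 1.96, 23.10),
    "11nm": (0.05, 1.82, 22.89),
}


def advantage(row):
    """Return (TSV/TCI, wire bond/TCI) energy ratios for one table row."""
    tci, tsv, wire_bond = row
    return tsv / tci, wire_bond / tci


for name, table in (("pin-to-pin", PIN_TO_PIN), ("bus", BUS)):
    for node, row in table.items():
        vs_tsv, vs_wb = advantage(row)
        print(f"{name:10s} {node}: {vs_tsv:6.1f}x vs TSV, {vs_wb:6.1f}x vs wire bond")
```

As tabulated, the TSV and wire-bond bus entries are 7x their pin-to-pin entries, while the TCI entries are unchanged between the two tables.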

#### Constant Magnetic Field Scaling

Constant Electric Field Scaling for FET

Constant Magnetic Field Scaling for TCI: Diameter: 1/ζ, Turns: ζ<sup>0.8</sup>, Thickness: 1/ζ

| Quantity            | Dimension                    | Scaling factor    |
|---------------------|------------------------------|-------------------|
| Device size         | $[x]$                        | $1/\alpha$        |
| Voltage             | $[V]$                        | $1/\alpha$        |
| Current             | $[I]$                        | $1/\alpha$        |
| Capacitance         | $[C] \sim [xx/x]$            | $1/\alpha$        |
| Delay time          | $[t] \sim [CV/I]$            | $1/\alpha$        |
| Chip thickness      | $[z]$                        | $1/\zeta$         |
| Coil size           | $[D]$                        | $1/\zeta$         |
| Coil turn number    | $[n]$                        | $\zeta^{0.8}$     |
| Inductance          | $[L] \sim [n^2 D^{1.6}]$     | 1                 |
| Magnetic coupling   | $[k] \sim [z/D]$             | 1                 |
| Received signal     | $[v_{\rm R}] \sim [kL(I/t)]$ | 1                 |
| Data rate / channel | $[1/t]$                      | $\alpha$          |
| Channel / area      | $[1/D^2]$                    | $\zeta^2$         |
| Data rate / area    | $[1/tD^{2}]$                 | $\alpha \zeta^2$  |
| Area / data rate    | $[tD^2]$                     | $1/\alpha\zeta^2$ |
| Energy / bit        | $[IVt]$                      | $1/\alpha^3$      |
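
The derived rows of the scaling table follow mechanically from the base rows. As a sanity check, the sketch below tracks each quantity as a pair of exponents (a, z), meaning it scales as α^a · ζ^z, and recomputes the derived rows from the dimension formulas in the table. The representation and helper names are illustrative assumptions, not part of the original analysis.

```python
from fractions import Fraction as F

# A quantity is a pair (a, z): it scales as alpha**a * zeta**z.


def mul(*qs):
    """Multiply quantities by adding their exponents."""
    return (sum(q[0] for q in qs), sum(q[1] for q in qs))


def power(q, p):
    """Raise a quantity to the power p by scaling its exponents."""
    return (q[0] * p, q[1] * p)


# Base rows of the table.
V = I = t = (F(-1), F(0))   # voltage, current, delay: 1/alpha
z = D = (F(0), F(-1))       # chip thickness, coil size: 1/zeta
n = (F(0), F(4, 5))         # coil turns: zeta^0.8

# Derived rows, using the dimension formulas from the table.
L = mul(power(n, 2), power(D, F(8, 5)))      # L ~ n^2 D^1.6
k = mul(z, power(D, -1))                     # k ~ z/D
v_R = mul(k, L, I, power(t, -1))             # v_R ~ k L (I/t)
data_rate = power(t, -1)                     # 1/t
channels_per_area = power(D, -2)             # 1/D^2
data_rate_per_area = mul(data_rate, channels_per_area)
energy_per_bit = mul(I, V, t)                # I V t

for name, q in [("L", L), ("k", k), ("v_R", v_R),
                ("data rate/area", data_rate_per_area),
                ("energy/bit", energy_per_bit)]:
    print(f"{name:15s} alpha^{q[0]} * zeta^{q[1]}")
```

The turn count exponent of 0.8 is what makes the inductance invariant: n² contributes ζ^1.6, which cancels the ζ^-1.6 from D^1.6, so the received signal stays constant while the data rate and channel density improve.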

#### TCI broadcasting more efficient than TSV



TSV power and delay increase in proportion to the number of stacked chips.
The TCI transmitter's power and delay stay constant.

# Crosstalk decays rapidly



Received signal rapidly decays in the near field (at distance X > D/2). Crosstalk is sufficiently suppressed. Ref [07],[10],[11],[27]

#### Channel Pitch vs. Crosstalk



#### TCI Coils can be overlapped with QPDM



(a) Conventional TCI coil spacing



(b) Overlapping TCI coils



1 D coil spacing avoids crosstalk

Can pack coils 4x denser with QPDM

Receiver circuits disable out-of-phase channels to further improve noise immunity [37].

Area efficiency is improved by 4 times with overlapping coils

#### Demonstrated lowest die-to-die energy: 10 fJ/bit



"Dual coil TCI" Lowest Energy/bit 65nm CMOS

Reference: A 0.55V 10 fJ/bit Inductive-Coupling Data Link with Dual-Coil Transmission Scheme, IEEE JSSC, April 2011 [39].

## Compatible with Conventional EDA



## TCI has High Reliability

- Small bit error rate: < 10<sup>-14</sup>, as reliable as wireline
- Small jitter: < 5% UI
- Small degradation
  - by eddy current in substrate [05] SSDM'05
  - by eddy current in power mesh [12] A-SSCC'07
  - by eddy current in bit/word lines [26] ISSCC'10
  - by chip misalignment [15] SSDM'08
- Small inter-channel crosstalk when pitch > 2\*diameter
- No interference: from digital circuits, to SRAM, from the environment (EMS), or to the environment (EMI)



[01] ISSCC'04







## Compatible with Conventional Testing



Although wide coil line/spacing and small transceiver circuits will have zero impact on yield, wafer-level testing is also possible. Ref [24]

## **TCI EM Compatibility**



- EMI to RF: Magnetic field generated by TCI is only 0.0001% of that generated by clock lines.
- EMS from RF: SNR is 200, good enough for a receiver with a hysteresis comparator
- EMS from environment: yields only a small discrepancy in VDD<sub>min</sub>

## Misalignment Tolerant



TCI tolerates alignment error in chip stacking today.

TSV requires much finer alignment control because its size is 1/10 as large.

Ref [15]

## TCI demonstrated with 28 test chips



# ThruChip introduces Highly Doped Silicon Vias (HDSV) for "Wireless" Power Delivery

#### HDSV: A new way to deliver power

- Ultra-thin wafers make inductive coupling for data very compelling
- Ultra-thin wafers are key to a novel mechanism for power delivery
- At <10µ thickness, power vias can be created by highly doping the silicon
- With high levels of doping, silicon regions are conductive like metal
- Front-to-back conductive regions can be patterned with an ion implant mask
- P+ and N+ doping increased by ~10-100x in desired regions
- Can be done with standard fab equipment
- Low cost step, less expensive than wire bonds

#### Let's look at an example of Highly Doped Silicon Vias (HDSV)

#### Highly Doped Silicon Vias for power distribution



- 1) Start with a standard wafer.
- 2) Add implants to create highly doped regions for power vias.
- 3) Then add transistors and metal normally, with metal caps on the HDSV.
- 4) Thin the silicon to ~4 microns.

#### Highly Doped Silicon Vias for power distribution



A deeper than normal, and more highly doped well is used to make a low resistance HDSV pathway directly through the thinned wafer using the silicon itself.

The HDSV on one die and the electrodes on the next die are connected by pressure from a Room-Temperature Wafer Level Bonding machine (solid intermetallic bonding by diffusion) to create larger stacks.





## TCAD modeling: HDSV resistance



- Desire < 3 milliOhms front-to-back resistance for HDSV with 4μ wafer thickness
- Front-to-back resistance can be made sufficiently low for power distribution
- Dose of 1x10<sup>16</sup> can be done on conventional implant equipment (about 10x normal)
- HDSV probably not usable for high speed data due to high capacitance, need TCI
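
To get a feel for the 3 milliOhm target, a first-order estimate treats the HDSV as a uniform slab with resistance R = ρ·t/A. The sketch below is only that: it ignores contact resistance and the real doping profile, which the TCAD modeling accounts for. The resistivity value is an assumed order of magnitude for heavily doped silicon, not a number from this talk, and the function names are hypothetical.

```python
def hdsv_resistance_ohm(resistivity_ohm_cm: float,
                        thickness_um: float,
                        area_um2: float) -> float:
    """Uniform-slab estimate: R = rho * t / A, with unit conversion to cm."""
    thickness_cm = thickness_um * 1e-4
    area_cm2 = area_um2 * 1e-8
    return resistivity_ohm_cm * thickness_cm / area_cm2


def area_for_target_um2(resistivity_ohm_cm: float,
                        thickness_um: float,
                        target_ohm: float) -> float:
    """Via cross-section (in square microns) needed to reach a target resistance."""
    area_cm2 = resistivity_ohm_cm * (thickness_um * 1e-4) / target_ohm
    return area_cm2 * 1e8


RHO_ASSUMED = 1e-3  # ohm*cm; ASSUMPTION, illustrative value for heavily doped silicon

r = hdsv_resistance_ohm(RHO_ASSUMED, thickness_um=4.0, area_um2=100 * 100)
a = area_for_target_um2(RHO_ASSUMED, thickness_um=4.0, target_ohm=3e-3)
print(f"100µ x 100µ via through 4µ silicon: {r * 1e3:.1f} milliOhm")
print(f"area for 3 milliOhm: {a:.0f} square microns (side {a ** 0.5:.0f}µ)")
```

Because resistance is proportional to thickness, the estimate also shows why ultra-thin wafers matter here: the same via through a 40µ wafer would have ten times the resistance.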

#### **HDSV** Wireless Power Distribution

- No metallic TSVs, no wire bonds, no solder bumps
- Just stack chips and connect the stack to power
- Very loose alignment requirements on both data and power
- Data transmitted wirelessly with near field inductive coupling
- Power and ground go directly through the silicon, by using high levels of doping on ultra-thin die.
- Since silicon provides the power conduits instead of "metal wires", the power distribution is "wireless" ;-)
- HDSV should be low cost, extra implants are the only change to chips

# **Comparison Example**

# Stacked HBM DRAM TSV vs TCI/HDSV

# Example HBM DRAM with TSV



TSVs provide 8 channels of independent 128-bit I/O.

Total of 1024 TSV I/O at 1 Gbps for 128 GB/s.

#### Replace TSV signals with TCI coils



TCI coil layout for two of eight DRAM-channels

Each TCI coil is 100µ x 100µ

Each TCI coil can run at 8 Gbps with slow DRAM transistors

26 coils/DRAM-channel provide the same bandwidth as HBM

- 16 coils for data x 8 Gbps/coil = 128 Gbps / DRAM-channel
- 8 coils for 64 address/control signals
- 2 coils for half of QPDM clocks (4 in a pair)
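
The bandwidth equivalence claimed above is simple arithmetic, reproduced here as a check. All figures come from the slides; the variable names are hypothetical, and the coil footprint is a naive sum that ignores coil overlap (QPDM) and any spacing between coils.

```python
CHANNELS = 8

# HBM with TSV: 8 channels of 128-bit I/O at 1 Gbps per I/O.
tsv_io = CHANNELS * 128
tsv_gbytes_per_s = tsv_io * 1 / 8           # Gbps -> GB/s

# TCI replacement: coils per DRAM channel.
DATA_COILS, ADDR_COILS, CLOCK_COILS = 16, 8, 2
COIL_GBPS = 8
coils_per_channel = DATA_COILS + ADDR_COILS + CLOCK_COILS
tci_gbps_per_channel = DATA_COILS * COIL_GBPS
tci_gbytes_per_s = CHANNELS * tci_gbps_per_channel / 8

signals_per_addr_coil = 64 // ADDR_COILS    # 64 address/control signals share 8 coils
total_coils = CHANNELS * coils_per_channel
coil_area_mm2 = total_coils * 0.1 * 0.1     # 100µ x 100µ each, no overlap assumed

print(f"TSV: {tsv_io} I/O, {tsv_gbytes_per_s:.0f} GB/s")
print(f"TCI: {coils_per_channel} coils/channel, {tci_gbytes_per_s:.0f} GB/s")
print(f"address/control signals per coil: {signals_per_addr_coil}")
print(f"total coils: {total_coils}, footprint ~{coil_area_mm2:.2f} mm^2")
```

Each address/control coil carries 8 signals, which matches the 8:1 ratio between the 8 Gbps coil rate and the 1 Gbps per-pin rate of the TSV interface.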

#### **Remove TSV section**



## Add TCI IO and shrink die area



### Define Power/Ground with HDSV






These are the mask patterns for low resistance implants for HDSV conduits from the front to back side of each die.

## Add HDSV for Vdd/Vss

Figure: DRAM die floorplan with the TSV section removed, 6.91 mm x 4.443 mm. The 13% area reduction is a significant cost reduction.

- Original die size with TSV = $35.241 \text{ mm}^2$
- Die size with TCI & HDSV = $30.701 \text{ mm}^2$
- Area savings = $4.540 \text{ mm}^2$, -13%
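
The area figures are consistent with the die dimensions shown on the slide (6.91 mm x 4.443 mm for the reduced die). A short check, with hypothetical variable names:

```python
original_mm2 = 35.241            # die size with TSV
reduced_w_mm, reduced_h_mm = 6.91, 4.443
reduced_mm2 = 30.701             # die size with TCI & HDSV, as quoted

saved_mm2 = original_mm2 - reduced_mm2
saved_pct = 100.0 * saved_mm2 / original_mm2

print(f"width x height = {reduced_w_mm * reduced_h_mm:.3f} mm^2")
print(f"area saved     = {saved_mm2:.3f} mm^2 ({saved_pct:.1f}%)")
```

The saving works out to about 12.9%, which the slide rounds to 13%.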

#### Final stack DRAM example



Assumptions:

- Each die is 8µ thick: 4µ of silicon and a 4µ metal stack.
- Data sent wirelessly with TCI inductive coupling links.
- Power passes through existing silicon with Highly Doped Silicon Vias.
- Base die can translate to standard IO or TCI link to a SoC.
- Smaller die size provides significant cost reduction.
- Cost of implants for HDSV and circuits for TCI relatively negligible.
- Seems likely this will result in a net cost reduction when using this stacking approach.
- No vertical metal wires! Wireless 3D stacking.

# Ultra-Thin Lowest Cost 3D Packaging



#### Panel-level stacking as batch (wafer scale) process

- 1) Known-good memory die (7.2mm x 7.2mm) are placed face down on a support panel (465mm x 320mm) at the pitch of the customer's chip size, and mold is poured into the gaps to form a memory panel. This is done by a memory vendor.
- 2) The memory panels are provided to an SoC vendor.
- 3) Known-good SoC die (8.3mm x 8.0mm) are placed face down on a support panel (465mm x 320mm) at the pitch of the SoC size (2240 chips in total) by the SoC vendor.
- 4) The SoC panel is then thinned from the back.
- 5) The memory panel is placed on top of the SoC panel, face down, and bonded by an RT pressure bonding machine.
- 6) The bonded panel is then thinned from the back.
- 7) Repeat the process to build up an 8-layer memory tower on the SoC panel.
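
The figure of 2240 chips per panel in step 3 follows from tiling the panel at the SoC pitch. The sketch below reproduces it under the simplifying assumptions of a plain rectangular grid with no gaps between sites and no edge exclusion; the function name is hypothetical.

```python
import math


def sites_per_panel(panel_w_mm: float, panel_h_mm: float,
                    pitch_w_mm: float, pitch_h_mm: float) -> int:
    """Number of whole grid sites that fit on a rectangular panel."""
    return math.floor(panel_w_mm / pitch_w_mm) * math.floor(panel_h_mm / pitch_h_mm)


PANEL_W, PANEL_H = 465.0, 320.0

soc = sites_per_panel(PANEL_W, PANEL_H, 8.3, 8.0)          # 8.3 mm side along the long edge
soc_rotated = sites_per_panel(PANEL_W, PANEL_H, 8.0, 8.3)  # die rotated by 90 degrees

print(f"SoC sites:            {soc}")
print(f"SoC sites if rotated: {soc_rotated}")
```

Because the memory die are placed at the SoC pitch rather than their own, the memory panel carries the same number of sites, so each memory die lands on exactly one SoC during panel-to-panel bonding.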

## **Technical Summary**

- The synergy of ultra die thinning, TCI wireless data communication and Highly Doped Silicon Vias for power provides a future path for cost reduction using 3D stacking.
- Wireless TCI near-field inductive coupling has been well proven with 28 silicon test chips.
- Power distribution when using TCI can be done with proven techniques such as wire bond, TAB or even TSV.
- Power distribution for TCI with Highly Doped Silicon Vias is a new and still untested technique, which offers great promise for lowering 3D stacking costs. Help us make it happen.

#### References

[01] D. Mizoguchi, et al., "A 1.2Gb/s/pin Wireless Superconnect Based on Inductive Inter-chip Signaling (IIS)," ISSCC, pp.142-143, Feb. 2004.

[02] N. Miura, et al., "Analysis and Design of Inductive Coupling and Transceiver Circuit for Inductive Inter-Chip Wireless Superconnect," Symp. VLSI Circuits, pp. 246-249, Jun. 2004.

[03] N. Miura, et al., "Cross Talk Countermeasures in Inductive Inter-Chip Wireless Superconnect," CICC, pp.99-102, Oct. 2004.

[04] N. Miura, et al., "A 195Gb/s 1.2W 3D-Stacked Inductive Inter-Chip Wireless Superconnect with Transmit Power Control Scheme," ISSCC, pp.264-265, Feb. 2005.

[05] D. Mizoguchi, et al., "Measurement of Inductive Coupling in Wireless Superconnect," SSDM, pp.670-671, Sep. 2005.

[06] N. Miura, et al., "A 1Tb/s 3W Inductive-Coupling Transceiver for Inter-Chip Clock and Data Link," ISSCC, pp.424-425, Feb. 2006.

[07] T. Kuroda, et al., "Perspective of Low-Power and High-Speed Wireless Inter-Chip Communications for SiP Integration," ESSCIRC, pp.3-6, Sep. 2006.

[08] D. Mizoguchi, et al., "Constant Magnetic Field Scaling in Inductive-Coupling Data Link," SSDM, pp. 606-607, Sep. 2006.

[09] N. Miura, et al., "A 0.14pJ/b Inductive-Coupling Inter-Chip Data Transceiver with Digitally-Controlled Precise Pulse Shaping," ISSCC, pp.264-265, Feb. 2007.

[10] T. Kuroda, "CMOS Proximity Wireless Communications for SiP Integration (Invited)," ISSCC, Feb. 2007.

[11] T. Kuroda, "Low power technology for system LSI," J. IEICE, vol. 90, no. 11, pp. 977-981, Nov. 2007.

[12] K. Niitsu, et al., "Interference from Power/Signal Lines and to SRAM Circuits in 65nm CMOS Inductive-Coupling Link," A-SSCC, pp.131-134, Nov. 2007.

[13] N. Miura, et al., "An 11Gb/s Inductive-Coupling Link with Burst Transmission," ISSCC, pp.298-299, Feb. 2008.

[14] D. Mizoguchi, et al., "Constant Magnetic Field Scaling in Inductive-Coupling Data Link," IEICE Trans. Electronics, Vol. E91-C, No. 2, pp. 200-205, Feb. 2008.

[15] K. Niitsu, et al., "Misalignment Tolerance in Inductive-Coupling Inter-Chip Link for 3D System Integration," SSDM, pp.86-87, Sep. 2008.

[16] Y. Sugimori, et al., "A 2Gb/s 15pJ/b/chip Inductive-Coupling Programmable Bus for NAND Flash Memory Stacking," ISSCC, pp.244-245, Feb. 2009.

[17] K. Niitsu, et al., "An Inductive-Coupling Link for 3D Integration of a 90nm CMOS Processor and a 65nm CMOS SRAM," ISSCC, pp.480-481, Feb. 2009.

[18] K. Osada, et al., "3D System Integration of Processor and Multi-Stacked SRAMs by Using Inductive-Coupling Links," Symp. VLSI Circuits, pp. 256-257, Jun. 2009.

[19] Y. Kohama, et al., "A Scalable 3D Processor by Homogeneous Chip Stacking with Inductive-Coupling Link," Symp. VLSI Circuits, pp. 94-95, Jun. 2009.

[20] S. Kawai, et al., "A 4.7Gb/s Inductive Coupling Interposer with Dual Mode Modem," Symp. VLSI Circuits, pp. 92-93, Jun. 2009.

[21] M. Saito, et al., "47% Power Reduction and 91% Area Reduction in Inductive-Coupling Programmable Bus for NAND Flash Memory Stacking," CICC, pp. 449-452, Sep. 2009.

[22] K. Kasuga, et al., "Electromagnetic Interference and Susceptibility in Inductive-Coupling Link," SSDM, pp.62-63, Nov. 2009.

[23] M. Saito, et al., "An Extended XY Coil for Noise Reduction in Inductive-coupling Link," A-SSCC, pp.305-308, Nov. 2009.

[24] K. Kasuga, et al., "A Wafer Test Method of Inductive-Coupling Link," A-SSCC, pp.301-304, Nov. 2009.

[25] N. Miura, et al., "An 8Tb/s 1pJ/b 0.8mm2/Tb/s QDR Inductive-Coupling Interface Between 65nm CMOS and 0.1um DRAM," ISSCC, pp.436-437, Feb. 2010.

[26] M. Saito, et al., "A 2Gb/s 1.8pJ/b/chip Inductive-Coupling Through-Chip Bus for 128-Die NAND-Flash Memory Stacking," ISSCC, pp.440-441, Feb. 2010.

[27] T. Kuroda, "Inductively Coupled ThruChip Interface," ISSCC, ES3 (Energy-Efficient High-Speed Interfaces), Feb. 2010.

[28] N. Miura, et al., "A 0.7V 20fJ/bit Inductive-Coupling Data Link with Dual-Coil Transmission Scheme," Symp. VLSI Circuits, pp. 201-202, June 2010.

[29] T. Kuroda, et al., "ThruChip Interface (TCI) for 3D Integration of Low-Power System (Invited)," IEDM, p.17.1.1, Dec. 2010.

[30] N. Miura, et al., "A 2.7Gb/s/mm<sup>2</sup> 0.9pJ/b/Chip 1Coil/Channel ThruChip Interface for NAND Flash Memory Stacking," ISSCC, pp.490-491, Feb. 2011.

[31] Y. Shimazaki, et al., "A 5Gbps/ch ThruChip Interface and Autom. P&R Design Methodology for 3-D Integration of 45nm CMOS Processors," COOL Chips XV, pp.1-3, Apr. 2012.

[32] Y. Koizumi, et al., "Dynamic power control with a heterogeneous multi-core system using a 3-D wireless inductive coupling interconnect," ICFPT'12, pp. 293-296, Dec. 2012.

[33] H. Matsutani, et al., "A Case for Wireless 3D NoCs for CMPs," ASP-DAC'13, pp. 23-28, Jan. 2013.

[34] Y. Take, et al., "3D Clock Distribution Using Vertically/Horizontally Coupled Resonators," ISSCC, pp. 258-259, Feb. 2013.

[35] "Introduction of Gettering DP Wheel," DISCO Website, in both English and Japanese, http://www.disco.co.jp/jp/solution/apexp/polisher/gettering.html

[36] Y.S. Kim, et al., "Ultra Thinning down to 4μm using 300-mm Wafer proven by 40-nm Node 2 Gb DRAM for 3D Multi-stack WOW Applications," Symp. VLSI Circuits, pp. 22-23, June 2014.

[37] A.R. Junaidi, Y. Take, T. Kuroda, "A 352 Gb/s Inductive-Coupling DRAM/SoC Interfaces Using Overlapping Coils with Phase Division Multiplexing and Ultra-Thin Fan-Out Wafer Level Package," Symp. VLSI Circuits, June 2014.

[38] Y. Take, N. Miura, T. Kuroda, "A 30 Gb/s/Link 2.2 Tb/s/mm2 Inductively-Coupled Injection-Locking CDR for High-Speed DRAM Interface," JSSC, pp. 2552-2559, November 2011.

[39] N. Miura, et al., "A 0.55V 10fJ/bit Inductive-Coupling Data Link and 0.7V 135fJ/Cycle Clock Link with Dual-Coil Transmission Scheme," IEEE JSSC, pp. 965-973, April 2011.