

#### D25-010

Coordinated Multi-Mode Assist Circuits to Achieve Ultra-High Operating-Speed and Energy-Efficiency for Low-Voltage SRAM Equipping Power-Saving Modes

Team name | Beyond the Dialogue on Excellent Steeds (超越良馬對)

Team leader | 劉建彤 / Graduate Institute of Electrical Engineering, National Chung Cheng University



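## Advisor

#### 王進賢, Department of Electrical Engineering, National Chung Cheng University

Ph.D. in Electronics Engineering from National Chiao Tung University; currently Distinguished Professor in the Department of Electrical Engineering, National Chung Cheng University. Formerly a senior engineer and manager at the Industrial Technology Research Institute and a member of the ISSCC ITPC. He has published more than 130 papers and holds more than 50 patents on VLSI circuits and architectures.

## Research Areas

Low-power and high-speed digital circuits; SoC design.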



# Project Summary

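During the Southern Song dynasty, Yue Fei, the renowned general who resisted the Jin, used the "Dialogue on Excellent Steeds" (良馬對) to explain how to choose a warhorse. The allusion stresses: "There are two kinds of excellent steeds: one excels at sustained galloping and the other at rapid bursts of speed, yet it is hard to have both." Our design philosophy starts from this idea and goes on to break through the existing limits. Our team name, "Beyond the Dialogue on Excellent Steeds," therefore means that our goal is not merely to meet Yue Fei's standard for a fine horse but to surpass it and define an entirely new "Philosophy of the Excellent Steed." In Yue Fei's time, people considered only the trade-off between speed and endurance and never asked, "How can ultimate energy efficiency be reached while running at high speed?" This is exactly where our design philosophy departs from the traditional view. Our design must not only "run fast" and "run far" but also "save fodder."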

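Today's static random-access memory (SRAM) designs routinely face three major challenges: speed, operating power, and standby power.

- 1. Pursuing only high operating speed → increases area and operating power, hurting energy efficiency.
- 2. Pursuing only low operating power → may lower the operating speed, hurting performance.
- 3. Pursuing only low standby power → may sacrifice operating speed, hindering optimization of the overall design.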

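To solve these three problems, this work proposes the "Philosophy of the Excellent Steed": a new design strategy for low-voltage SRAM. Through coordinated multi-mode assist circuits, the design dynamically adjusts the SRAM's assist voltages in read/write, standby, deep-sleep, and shutdown modes, yielding a low-voltage SRAM that combines speed, energy efficiency, and low power consumption. Using a 22-nm 0.5-V 256-Kb SRAM as the technology demonstration vehicle, and compared with the best current 0.5-V SRAM design of the same storage capacity, this work achieves the following:

- 1. 6.25× higher operating frequency
- 2. 91% lower energy consumption
- 3. 83% lower standby power
- 4. 73% smaller area
- 5. Additional deep-sleep and shutdown modes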

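This design is not only a "thousand-li steed" for modern Internet of Things (IoT) scenarios but also a warhorse that unites speed, performance, and energy saving, opening a new landscape for low-voltage SRAM. This is the Philosophy of the Excellent Steed put forward by "Beyond the Dialogue on Excellent Steeds": "Swift as the gale, still as deep sleep, enduring for a thousand li, and unrivaled under heaven!"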

| | | '17 [ ] | [16] | [6] | TVLSI'21 [7] | TCAS-I'18 [15] | TCAS-I'23 [18] | [17] | TCAS-I'17 [8] | This work |
|---|---|---|---|---|---|---|---|---|---|---|
| Technology | | 28-nm bulk CMOS | 28-nm bulk CMOS | 28-nm bulk CMOS | 28-nm FDSOI | 28-nm FDSOI | 28-nm FDSOI | 55-nm bulk CMOS | 28-nm bulk CMOS | 22-nm bulk CMOS |
| Capacity | | 256 Kb | 32 Kb | 32 Kb | 16 Kb | 128 Kb | 128 Kb | 8 Kb | 256 Kb | 256 Kb |
| Access bits | | 64 | 32 | 8 | 8 | 64 | 32 | 16 | 64 | 32 |
| Bit cell | Type | DSC-6T | customized 6T | untypical 6T | 9T | 7T | 7T-NDR | 13T | standard 6T | CA-6T |
| Bit cell | Cell area (normalized to standard 6T) | 1× | n.a. | 1.5× (including overhead for VDD control) | 1.65× | 2.16× | 2.95× | n.a. | 2.26× (including overhead for local sensing) | 1× |
| Assist | Write | CSVG, NBL | WL boosting | 2 VDD supplies with read/write control | Data-aware | + body biasing* | FBB* + negative WWL* | 3T write port | Vtrip-tracking | NL-CVDD |
| Assist | Read | - | n.a. | | 3T read port | RWL boosting + body biasing* | 2T read port | 4T read port | 4-bit local BL | BLUD |
| Assist | Standby | n.a. | n.a. | n.a. | n.a. | Tail buffer + body biasing* | Dual supply voltage + NDR + FBB* | HVT transistors in 6T | Threshold power-gating | Peripheral: super cut-off<br>Cell array: NL-CVDD + NR-CVSS |
| Assist | Deep sleep | n.a. | n.a. | n.a. | n.a. | n.a. | Peripheral: dual supply voltage + power gating<br>Cell array: NDR + ZBB* | n.a. | n.a. | |
| Assist | Shutdown | n.a. | n.a. | n.a. | n.a. | n.a. | n.a. | n.a. | n.a. | |
| f<sub>max</sub> | 0.5 V or @ E<sub>min</sub> | n.a. | n.a. | 15 MHz<sup>+</sup> @ E<sub>min</sub> 0.41 V | 7.5 MHz<sup>+</sup> | 15 MHz @ E<sub>min</sub> 0.24 V | 64 MHz | Read: 4.3 MHz<br>Write: 4.6 MHz | 20 MHz (1) | 125 MHz (6.25) |
| Energy /access-bit | 0.5 V or @ E<sub>min</sub> | | | 167.9 fJ @ 0.41 V | 840 fJ | 36 fJ<sup>a</sup> @ 0.24 V | 24.5 fJ<sup>a</sup> | Read: 18 pJ<br>Write: 31 pJ | 468 fJ (1) | 81.9 fJ (0.18) |
| Energy /total-bit | 0.5 V or @ E<sub>min</sub> | | | 41 aJ @ 0.41 V | 410 aJ | 17.58 aJ<sup>a</sup> @ 0.24 V | 6 aJ<sup>a</sup> | Read: 35.1 fJ<br>Write: 60.6 fJ | 114 aJ (1) | 10 aJ (0.09) |
| f<sub>max</sub> | 0.6 V or @ V<sub>min</sub> | | 50 MHz @ V<sub>min</sub> 0.7 V | | 15 MHz* | | 120 MHz* | Read: 21.9 MHz<br>Write: 24.5 MHz | 150 MHz (1) | 305 MHz (2.03) |
| Energy /access-bit | 0.6 V or @ V<sub>min</sub> | n.a. | n.a. | n.a. | 1.35 pJ<sup>+</sup> | n.a. | 29.5 fJ<sup>a</sup> | Read: 24.5 pJ<br>Write: 43.4 pJ | 94.2 fJ (1) | 90.1 fJ (0.96) |
| Energy /total-bit | 0.6 V or @ V<sub>min</sub> | n.a. | n.a. | n.a. | 659 aJ<sup>+</sup> | | 7 aJ** | Read: 47.9 fJ<br>Write: 84.7 fJ | 23 aJ (1) | 11 aJ (0.48) |
| SB power /bit | 0.5 V | n.a. | | | | 2.5 pW<sup>+6c</sup> | 36 pW<sup>tk</sup> | 6.33 pW @ 0.4 V | 229 pW (1) | 39 pW (0.17) |
| SB power /bit | 0.6 V | | | | | 5.25 pW<sup>+x</sup> | n.a. | | 343 pW (1) | 62 pW (0.18) |
| DS power /bit | 0.5 V | n.a. | n.a. | n.a. | n.a. | n.a. | 0.04 pW<sup>n</sup> | n.a. | n.a. | 12 pW |
| DS power /bit | 0.6 V | n.a. | n.a. | n.a. | n.a. | n.a. | n.a. | n.a. | n.a. | 17 pW |
| SD power /bit | 0.5 V | n.a. | n.a. | n.a. | n.a. | n.a. | n.a. | n.a. | n.a. | 10 pW |
| SD power /bit | 0.6 V | n.a. | n.a. | n.a. | n.a. | n.a. | n.a. | n.a. | n.a. | 14 pW |
| Macro area (mm²)<br>(Normalized*)<br>(percentages of assists) | | n.a. | n.a. | 0.028<br>(0.56) | n.a. | 0.161<sup>&amp;</sup><br>(0.81) | 0.075<sup>&amp;</sup><br>(0.38) | 0.041<br>(0.86) | 0.397<br>(1.00) | 0.065 (0.27)<br>(CVSS/CVDD/BLUD: 7.2%/6.7%/2.9%) |
| PPA<br>(fJ·mm²/nm²/bit) | | n.a. | n.a. | 183<br>@ 0.41 V | n.a. | 56<sup>th</sup><br>@ 0.24 V | 18<sup>th</sup><br>@ 0.5 V | Read: 29781<br>Write: 51290 | 904<br>@ 0.5 V | 42<br>@ 0.5 V |

Table 1. Comparison table.
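The rule behind the "Normalized" macro-area values is not stated in the table. The sketch below is an inference, not the authors' stated method: it assumes each area is scaled to 256 Kb and to a 28-nm node (area scaling with the square of the feature size) and then divided by the 0.397 mm² of reference [8]. Under that assumption it reproduces every normalized value listed in the macro-area row. The function and constant names are illustrative.

```python
# Assumed normalization targets (inferred from the table, not stated in the source).
REF_CAPACITY_KB = 256   # capacity of reference [8] and of this work
REF_NODE_NM = 28        # technology node of reference [8]
REF_AREA_MM2 = 0.397    # macro area of reference [8]

def normalized_area(area_mm2, capacity_kb, node_nm):
    """Scale an area to 256 Kb and 28 nm, then express it relative to [8]."""
    scaled = area_mm2 * (REF_CAPACITY_KB / capacity_kb) * (REF_NODE_NM / node_nm) ** 2
    return round(scaled / REF_AREA_MM2, 2)

# (macro area in mm^2, capacity in Kb, node in nm), copied from the table.
designs = {
    "[6]": (0.028, 32, 28),
    "[15]": (0.161, 128, 28),
    "[18]": (0.075, 128, 28),
    "[17]": (0.041, 8, 55),
    "[8]": (0.397, 256, 28),
    "This work": (0.065, 256, 22),
}

normalized = {name: normalized_area(*row) for name, row in designs.items()}
for name, value in normalized.items():
    print(f"{name}: {value:.2f}")
```

If the assumption holds, the 0.27 for this work corresponds to the 73% area reduction quoted in the summary.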

### Abstract

During the Southern Song Dynasty, the renowned general Yue Fei illustrated the philosophy of selecting warhorses through his famous "Dialogue on Excellent Steeds." This historical allusion emphasizes: "There are two types of excellent steeds: one excels in enduring long distances, while the other is superior in swift bursts of speed, but combining both traits is nearly impossible." Inspired by this philosophy, our design concept seeks to surpass traditional limitations. Consequently, our team's name, "Beyond the Dialogue on Excellent Steeds," represents our ambition to go beyond Yue Fei's original definition and establish a novel "Philosophy of the Excellent Steed." While ancient wisdom emphasized only the trade-off between speed and endurance, it never addressed the critical question: "How can we achieve both high-speed operation and ultimate energy efficiency simultaneously?" This distinction separates our approach from conventional thinking. Our design must be not only "fast" and "enduring" but also "highly energy-efficient."

With the continuous scaling of Integrated Circuit (IC) manufacturing processes, chip integration levels have significantly increased. As a result, reducing power consumption and improving energy efficiency have become critical challenges for IC designers. From a system perspective, completing tasks quickly and transitioning circuits into standby, deep sleep, or even power-off modes are essential power-saving strategies. This demands that all sub-circuits be equipped with appropriate operational modes to satisfy system-level requirements. Among all components within a system-on-chip (SoC), static random access memory (SRAM) consumes the most power. SRAM is a full-custom circuit and already difficult to design. When combined with the requirements for energy efficiency, power reduction, and support for multiple modes, it presents an even greater challenge for designers:

- 1. Solely pursuing high operational speed → increases chip area and power consumption, negatively impacting energy efficiency.
- 2. Solely targeting low operational power → reduces operating speed, degrading performance.
- 3. Solely focusing on low standby power → sacrifices operating speed, complicating overall design optimization.

To address these challenges, this work proposes the "Philosophy of the Excellent Steed": an innovative design strategy for low-voltage SRAM. Through the proposed coordinated multi-mode assist circuits, the assist voltages of SRAM are dynamically adjusted during read/write, standby, deep sleep, and shutdown modes, thereby achieving high speed, high energy efficiency, and low power consumption simultaneously. Implemented and demonstrated through a 22-nm 0.5-V 256-Kb SRAM test chip, our design significantly outperforms the current state-of-the-art 0.5-V SRAMs of equivalent storage capacity, achieving the following metrics:

- 1. 6.25x higher operating frequency
- 2. 91% lower energy consumption
- 3. 83% standby power reduction
- 4. 73% smaller area
- 5. Additional deep sleep and shutdown modes supported
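The first four figures can be checked against the 0.5-V rows of Table 1. This is a minimal sketch under two assumptions: that the baseline is the 256-Kb design in the column for reference [8] (the column whose entries carry the "(1)" mark), and that "energy consumption" refers to energy per total bit. The dictionary keys and helper names are illustrative and do not come from the source.

```python
# 0.5-V values copied from Table 1; the area entry is the table's normalized value.
baseline = {   # reference [8], assumed to be the comparison baseline
    "fmax_mhz": 20.0,
    "energy_per_total_bit_aj": 114.0,
    "standby_pw_per_bit": 229.0,
    "normalized_area": 1.00,
}
proposed = {   # "This work"
    "fmax_mhz": 125.0,
    "energy_per_total_bit_aj": 10.0,
    "standby_pw_per_bit": 39.0,
    "normalized_area": 0.27,
}

def gain(base, new):
    """How many times larger the new value is than the baseline."""
    return new / base

def reduction_percent(base, new):
    """Percentage saved relative to the baseline, rounded to a whole percent."""
    return round((1 - new / base) * 100)

headline = {
    "frequency_gain": gain(baseline["fmax_mhz"], proposed["fmax_mhz"]),
    "energy_reduction": reduction_percent(
        baseline["energy_per_total_bit_aj"], proposed["energy_per_total_bit_aj"]),
    "standby_reduction": reduction_percent(
        baseline["standby_pw_per_bit"], proposed["standby_pw_per_bit"]),
    "area_reduction": reduction_percent(
        baseline["normalized_area"], proposed["normalized_area"]),
}
print(headline)
```

With energy per access bit instead (81.9 fJ against 468 fJ), the reduction would come out near 82%, so the 91% figure appears to rest on the per-total-bit metric.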

This design embodies a modern "excellent steed," effectively blending high speed, high energy efficiency, and low power consumption, thus pioneering a new paradigm for low-voltage SRAMs in IoT scenarios. This precisely defines the "Philosophy of the Excellent Steed" proposed by our "Beyond the Dialogue on Excellent Steeds" team: "Swift as the gale, quiet as deep sleep, enduring as a thousand miles, and unmatched across the world!"



Fig. 1 Circuit of one bank of

Fig. 2 Die photo of test chip.

2025 Macronix Golden Silicon Awards: Semiconductor Design and Application Competition