# Standard Cell Layout Synthesis for Dual-Sided 3D-Stacked Transistors

Kairong Guo<sup>1</sup>, Haoran Lu<sup>1</sup>, Rui Guo<sup>1</sup>, Jiarui Wang<sup>1,2</sup>, Chunyuan Zhao<sup>1</sup>

Heng Wu<sup>1</sup>, Runsheng Wang<sup>1,3,4</sup>, Yibo Lin<sup>1,3,4\*</sup>

<sup>1</sup>School of Integrated Circuits, Peking University, Beijing

<sup>2</sup>School of Computer Science, Peking University, Beijing

<sup>3</sup>Institute of Electronic Design Automation, Peking University, Wuxi

<sup>4</sup>Beijing Advanced Innovation Center for Integrated Circuits

{krguo25,njlhr,rguo24,jiaruiwang,zhaochunyuan}@stu.pku.edu.cn, {hengwu,r.wang,yibolin}@pku.edu.cn

Abstract—As transistor scaling approaches physical limits, the dual-sided 3D-stacked transistor emerges as a promising architecture, featuring back-to-back-stacked N/P transistors and dual-sided interconnects. This unique structure demands novel design solutions, including drain/gate merge for dual-side connectivity and flexible frontside/backside I/O pin assignment.

In this paper, we propose a standard cell synthesis framework for dual-sided 3D-stacked transistors comprising SMT-based merge-aware placement that ensures dual-side connectivity via dynamic field drain merge insertion, and SAT-based dual-side routing supporting automated or specified I/O pin assignment.

Experimental results show that our flow achieves, on average, a 4% reduction in cell area, 4% in via usage, and 7% in M0 metal usage compared to previous 3.5T designs, while efficiently generating all $2^n$ pin assignment variants for each cell. Support for multi-row placement and field drain merge (FDM) insertion allows our flow to identify layouts that surpass manual designs, such as an AOI22xp5 variant with 6.3% better performance and 4.3% lower power. At the chip level, our generated library with all $2^n$ pin assignment variants further reduces wirelength by 10% and eliminates DS-nets. These results demonstrate the effectiveness and flexibility of our framework for advanced dual-sided 3D-stacked transistor cell design.

Index Terms—standard cell, layout synthesis, transistor-level placement and routing

#### I. INTRODUCTION

As transistor scaling approaches fundamental physical limits, 3D-stacked transistors have emerged as a critical path for continued density and performance gains. The Complementary FET (CFET) technology [1] [2] [3], in which N-FETs are stacked on P-FETs or vice versa with both transistors on the frontside, has emerged as a leading solution for ultra-scaled 3D transistor stacking. With backside power delivery, CFET can achieve cell heights down to 3T [4] [5].

Recently, the dual-sided 3D-stacked transistor technology has gained attention for its unique back-to-back-stacked N/P transistors and dual-sided interconnects [6] [7] [8]. Compared to CFET, this architecture achieves further area scaling, supporting ultra-compact cell heights down to 2.5T. [7] and [8] demonstrated that it outperforms CFET at the block level, primarily due to dual-sided interconnect flexibility. By optimizing cell pin assignments (i.e., placing pins on the frontside or backside), the number of dual-sided nets (DS-nets) and total wirelength can be significantly reduced.

However, this benefit introduces new cell-level challenges: Standard cells for dual-sided 3D-stacked transistors now require exponentially more pin assignment variants to fully utilize dual-sided connectivity. For example, even a simple AOI22 cell with 4 input pins needs  $2^4=16$  variants (since output pins typically need to be accessible from both sides [7]), resulting in substantial manual design overhead for standard cells. This makes automated standard cell layout synthesis particularly crucial for dual-sided stacked transistors, as manual design cannot efficiently explore the vast design space of dual-sided configurations while ensuring design rule compliance and optimal performance.
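The growth in variant count can be made concrete with a minimal sketch that enumerates every frontside/backside choice for a cell's input pins. The function name and the `"front"`/`"back"` labels are illustrative assumptions, not part of the proposed flow.

```python
from itertools import product


def pin_assignment_variants(input_pins):
    """Enumerate every frontside/backside assignment of the given input pins."""
    sides = ("front", "back")
    return [
        dict(zip(input_pins, choice))
        for choice in product(sides, repeat=len(input_pins))
    ]


# AOI22 has four input pins, so 2^4 = 16 variants are expected.
variants = pin_assignment_variants(["A1", "A2", "B1", "B2"])
print(len(variants))
```

Each additional input pin doubles the list, which is why manual layout of every variant quickly becomes impractical.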

Handling dual-sided interconnects is the fundamental challenge for standard cell synthesis of dual-sided 3D-stacked transistors. The technology relies on drain merge and gate merge structures to achieve dual-sided signal connectivity. These structures require the source, drain, or gate terminals of frontside and backside transistors to be precisely aligned, which must be accounted for during transistor placement. When such alignment is not possible, a field drain merge (FDM) is inserted [9] to preserve connectivity, at the cost of increased design complexity and area overhead. Additionally, the connectivity of these merge structures must be properly managed during dual-sided routing.

Research on standard cell synthesis began early [10] [11], and recent studies have predominantly concentrated on planar CMOS [12] and FinFET technologies [13] [14] [15] [16]. There has also been growing interest in 3D-stacked standard cell synthesis, particularly for CFET architectures. [17] [18] [19] pioneered a Satisfiability Modulo Theories (SMT) based approach that co-optimizes transistor-level placement and intra-cell routing. [20] proposed a search-tree-based transistor placement framework that simultaneously handles transistor folding and placement, incorporating effective intra-cell routability pruning for CFET. However, research on standard cell synthesis for dual-sided 3D-stacked transistors is limited, with only [8] addressing this area. Their placement approach focuses on minimizing area and maximizing merge structures, but cannot guarantee dual-sided net routability, requiring additional constraints to ensure connectivity.

Fig. 1: Illustration of the cell architecture and design rules for the dual-sided 3D-stacked transistor, taking the 2.5T double-row DHL as an example.

To tackle the above challenges, we propose a standard cell layout synthesis flow for dual-sided 3D-stacked transistors. The standard cell architecture and design rules are based on the designs in [9]. The key contributions are summarized as follows:

- We construct an automated framework for dual-sided 3D-stacked transistor standard cell synthesis, enabling single/multi-row designs and flexible frontside/backside I/O pin assignments.
- We propose an SMT-based merge-aware transistor placement method, supporting multi-row placement and FDM insertion, guaranteeing dual-sided net routability.
- We advance existing SAT-based intra-cell routing formulations to dual-sided architectures, supporting merge structures and automated pin assignment exploration.
- Experimental results show our flow achieves on average a 4% reduction in cell area, 4% in via usage, and 7% in M0 metal usage compared to previous 3.5T designs [8], and enables efficient design space exploration, with the best AOI22xp5 layout improving performance by 6.3% and power by 4.3% over manual designs. At the chip level, our generated library with all 2<sup>n</sup> pin assignment variants further reduces wirelength by 10% and eliminates DS-nets.

The rest of the paper is organized as follows. Section II introduces the cell architecture for the dual-sided 3D-stacked transistor; Section III explains the details of the proposed flow; Section IV validates the flow with experimental results; Section V concludes the paper.

# II. PRELIMINARY

A. Dual-Sided 3D-Stacked Transistor Cell Architecture and Design Rules

The standard cell architecture and key design rules for dual-sided 3D-stacked transistors in this work are adapted from [9]. Detailed illustrations are provided in Figure 1. All transistors feature a single-fin (1-fin) configuration, where cells requiring higher drive strength are implemented through parallel transistor connections. Single Diffusion Break (SDB) is employed. The M0 layer utilizes metal cuts with a minimum cut spacing of 1.5 CPP (i.e., MAR=2, EOL=1 as defined in [21]). Via spacing is set to exceed both 0.5 CPP and the M0 pitch (VR=1, as in [21]), which prohibits M1 usage in single-row 2.5T standard cells. For multi-row designs, vertical inter-row connections can be made through either source/drain metal (MD) or gate inter-row connections. Notably, no relative positioning constraints are imposed between drain merge and gate merge structures. In the rest of this paper, N-FETs are placed on the frontside by default.

Fig. 2: Illustration of field drain merge (FDM). FDM will be inserted to preserve connectivity when drain or gate alignments are not well-planned, leading to increased design complexity and area overhead. SDB denotes single diffusion break.

Fig. 3: An AOI22xp5 placement with minimum area where all gates generate merge structures, but the DS-net Y cannot be routed.

TABLE I: Notations for the dual-sided 3D-stacked transistor cell synthesis flow.

| Term               | Description                                                |
|--------------------|------------------------------------------------------------|
| $N, P$             | Set of N-FETs/P-FETs.                                      |
| $t_{n/p}, s_{n/p}$ | N-FET/P-FET $t$, $s$.                                      |
| $x_t$              | X coordinate of lower-left corner of FET $t$.              |
| $r_t$              | Row number of FET $t$.                                     |
| $n_{D/S/G}(t)$     | Net information of FET $t$'s drain, source and gate.       |
| $n_{L/R}(t)$       | Net information of FET $t$'s left/right pin.               |
| $w_i^{f/b}$        | Width of $i^{th}$ row on frontside/backside.               |
| $G(V,E)$           | 3D routing graph $G$.                                      |
| $v_{x,y,l}, v^s$   | Vertex with the coordinate $(x, y, l)$, and super-vertex.  |
| $a(v)$             | Set of adjacent vertices of $v$ in $G$.                    |
| $e_{v,u}$          | An edge from $v$ to $u$ in $G$.                            |
| $w_{v,u}$          | Weighted cost for metal segment on $e_{v,u}$.              |
| $n, m$             | Multi-pin net $n$, and $m^{th}$ sink for $n$.              |
| $f_m^n(v,u)$       | 0-1 indicator if $e_{v,u}$ is used for commodity $f_m^n$.  |
| $m_{v,u}$          | 0-1 indicator if there is a metal segment on $e_{v,u}$.    |

# B. Merge Structures and DS-Net Routability

This technology relies on drain merge and gate merge structures to achieve dual-sided signal connectivity. When the source, drain, or gate terminals of transistors connected to a DS-net are not precisely aligned, field drain merge (FDM) insertion is required, as shown in Figure 2. FDM is a special connectivity structure where active fins are removed to allow a tall via bridging M0 layers from frontside to backside. However, minimizing area and maximizing merge structures does not guarantee DS-net routability in all cases, as shown in Figure 3, requiring additional constraints to ensure complete connectivity.

# III. STANDARD CELL LAYOUT SYNTHESIS FLOW FOR DUAL-SIDED 3D-STACKED TRANSISTORS

# A. Overview of Proposed Flow

Figure 4 illustrates the proposed standard cell layout synthesis flow for dual-sided 3D-stacked transistors. The synthesis process comprises SMT-based merge-aware dual-sided transistor placement and SAT-based dual-sided intra-cell routing. The merge-aware dual-sided transistor placement ensures the routability of dual-sided nets by enforcing the at-least-one merge constraint and implementing dynamic field drain merge insertion. The SAT formulation for dual-sided intra-cell routing, extended from [21] [22], supports merge structures and automated pin assignment exploration.

The notations used in the following section are provided in Table I.

#### B. Merge-Aware Dual-Sided Transistor Placement

Fig. 4: Proposed standard cell layout synthesis flow for dual-sided 3D-stacked transistors.

The dual-sided transistor placement extends the Relative Positioning Constraint (RPC) in [23] from single-stack placement to multi-row placement:

$$\bigwedge_{t_n, s_n \in N, t_n \neq s_n} [r_{t_n} = r_{s_n} \land RPC(t_n, s_n)] \lor (r_{t_n} \neq r_{s_n}) \quad (1)$$

An analogous constraint is applied to P-FETs. SDB is adopted in RPC. The origin coordinates for both the frontside and backside are identical.

The objective is to lexicographically optimize the overall cell width (CW) and weighted half-perimeter wirelength (HPWL):

$$\min_{x_t, r_t, t \in N \cup P} \begin{cases}
CW = \max_{0 \le i \le R} \{\max(w_i^f, w_i^b)\}, \\
\sum_{n} \text{HPWL}_{front}(n) + \text{HPWL}_{back}(n)
\end{cases} (2)$$

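where $w_i^f = \max_{t \in N, r_t = i} (x_t + w_t)$ is the frontside width of row $i$ ($w_i^b$ is defined analogously over the P-FETs), $w_t$ is the width of FET $t$, and $R$ is the maximum number of rows.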
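As an illustration of objective (2), the following sketch evaluates a given placement and compares two candidates lexicographically through Python tuple ordering. It is not the SMT encoding itself. The `Fet` record, the collapse of all terminals to the FET's lower-left x coordinate, and the unit HPWL weights are simplifying assumptions made only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fet:
    side: str    # "front" or "back"
    x: int       # x coordinate of the lower-left corner, in CPP
    row: int     # row index r_t
    width: int   # w_t, in CPP
    nets: tuple  # nets on this FET's terminals (positions collapsed to x)


def cell_width(fets):
    """CW: the largest per-row width over both sides."""
    row_width = {}
    for f in fets:
        key = (f.row, f.side)
        row_width[key] = max(row_width.get(key, 0), f.x + f.width)
    return max(row_width.values())


def hpwl(fets, side):
    """Sum of bounding-box half-perimeters of all nets on one side."""
    boxes = {}
    for f in fets:
        if f.side != side:
            continue
        for net in f.nets:
            x0, x1, r0, r1 = boxes.get(net, (f.x, f.x, f.row, f.row))
            boxes[net] = (min(x0, f.x), max(x1, f.x), min(r0, f.row), max(r1, f.row))
    return sum((x1 - x0) + (r1 - r0) for x0, x1, r0, r1 in boxes.values())


def objective(fets):
    """Tuple (CW, HPWL_front + HPWL_back); tuples compare lexicographically."""
    return (cell_width(fets), hpwl(fets, "front") + hpwl(fets, "back"))


compact = [
    Fet("front", 0, 0, 1, ("A", "Y")), Fet("front", 1, 0, 1, ("B", "Y")),
    Fet("back", 0, 0, 1, ("A", "Y")), Fet("back", 1, 0, 1, ("B", "Y")),
]
spread = [
    Fet("front", 0, 0, 1, ("A", "Y")), Fet("front", 2, 0, 1, ("B", "Y")),
    Fet("back", 0, 0, 1, ("A", "Y")), Fet("back", 1, 0, 1, ("B", "Y")),
]
print(objective(compact), objective(spread))  # (2, 2) (3, 3)
```

Because cell width is the first tuple element, a narrower placement always wins, and wirelength only breaks ties among placements of equal width.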

To ensure the routability of dual-sided nets, we introduce the **at-least-one merge constraint** and implement **dynamic field drain merge insertion**. Additionally, to prune the search space and accelerate the search process, we also introduce the **adjustable split-gate constraint**.

1) At-least-one merge constraint: For each dual-sided net n in the netlist, define merge(n) as follows:

$$merge(n) = \bigvee_{t \in N, s \in P} \Big[ x_t = x_s \land r_t = r_s \land \bigvee_{i \in \{L, R, G\}} \big( n_i(t) = n \land n_i(s) = n \big) \Big] \quad (3)$$

Here, merge(n)=1 indicates that in the current transistor placement, there exists at least one pair of N-FET and P-FET, t and s, that forms a merge structure for net n, enabling dual-sided connectivity. Thus, we can impose the following constraint:

$$\bigwedge_{\text{DS-net } n} merge(n) = 1 \quad (4)$$

which ensures the routability of all dual-sided nets.
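The predicate in (3) and the constraint in (4) can be checked on a fixed placement with a short sketch. The `Fet` record and the function names are illustrative assumptions; in the actual flow these conditions are encoded as SMT constraints rather than evaluated after the fact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fet:
    x: int      # x coordinate x_t
    row: int    # row index r_t
    left: str   # n_L(t)
    right: str  # n_R(t)
    gate: str   # n_G(t)


def merge(net, nfets, pfets):
    """Eq. (3): some aligned N/P pair carries `net` on the same terminal position."""
    for t in nfets:
        for s in pfets:
            if t.x == s.x and t.row == s.row and any(
                getattr(t, pin) == net and getattr(s, pin) == net
                for pin in ("left", "right", "gate")
            ):
                return True
    return False


def at_least_one_merge(ds_nets, nfets, pfets):
    """Eq. (4): every dual-sided net has at least one merge structure."""
    return all(merge(n, nfets, pfets) for n in ds_nets)


nfets = [Fet(0, 0, "VSS", "Y", "A")]
aligned = [Fet(0, 0, "VDD", "Y", "A")]   # Y on the right of both FETs
mirrored = [Fet(0, 0, "Y", "VDD", "A")]  # Y on opposite sides: no drain merge

print(at_least_one_merge(["A", "Y"], nfets, aligned))   # True
print(at_least_one_merge(["A", "Y"], nfets, mirrored))  # False
```

In the mirrored case the gate net A still merges, but net Y does not, so constraint (4) rejects that placement.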

2) Dynamic field drain merge insertion: The above constraint ensures a lower bound on the routability of dual-sided nets, but it does not account for the insertion of FDM structures. As illustrated in Figure 4, we introduce dynamic field drain merge insertion to expand the solution space to include placements with FDM structures. Specifically, instead of directly constraining merge(n), we incorporate it as a penalty into CW:

$$w_i^f = \max_{t \in N, r_t = i} (x_t + w_t) + \sum_n \mathbf{I}_{i,n} [1 - merge(n)] w_{FDM} \quad (5)$$

where  $w_{FDM}$  stands for the width of FDM. The indicator variable  $\mathbf{I}_{i,n}(\in\{0,1\})$  determines FDM insertion rows through a greedy approach by selecting the row (both frontside and backside) with the highest occurrence count of dual-sided net n. The adoption of SDB ensures this direct summation approach would not make the penalized CW fall below the actual cell width with FDMs. This method enables automatic trade-off between area cost from drain/gate alignments and that from FDM insertion, allowing the placement algorithm to flexibly choose the more area-efficient option depending on the context.
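The penalty in (5) can be illustrated with a small sketch. The data layout (a list of `(row, nets)` tuples covering both sides), the tie-breaking rule toward the lowest row index, and the value `w_fdm = 1` are assumptions for this example only; the source does not specify them.

```python
from collections import Counter


def fdm_row(net, fets):
    """Greedy choice of I_{i,n}: the row where `net` occurs most often.

    `fets` is a list of (row, nets) tuples over frontside and backside FETs.
    Ties go to the lowest row index (an assumption of this sketch).
    """
    counts = Counter()
    for row, nets in fets:
        counts[row] += sum(1 for n in nets if n == net)
    best = max(counts.values())
    return min(r for r, c in counts.items() if c == best)


def penalized_row_width(row, base_width, merged, fets, w_fdm):
    """Eq. (5): base row width plus w_FDM for each unmerged net assigned to this row."""
    penalty = sum(
        w_fdm
        for net, is_merged in merged.items()
        if not is_merged and fdm_row(net, fets) == row
    )
    return base_width + penalty


fets = [(0, ("A", "Y")), (0, ("B", "Y")), (1, ("Y", "C"))]
merged = {"Y": False, "A": True}  # merge(n) for each dual-sided net

print(penalized_row_width(0, 4, merged, fets, w_fdm=1))  # 5
print(penalized_row_width(1, 3, merged, fets, w_fdm=1))  # 3
```

Net Y appears twice in row 0 and once in row 1, so the FDM penalty lands on row 0 only.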

# Algorithm 1 Dynamic Field Drain Merge Insertion

```
Input: FETs N, P
solution S \leftarrow perform initial transistor placement
if \bigwedge_{DS\text{-net } n} merge(n)|_S = 1 then
    return S
else
    for each DS-net n do
        if merge(n)|_S = 0 then
            N \leftarrow N \cup \{t\}, n_D(t) = n, n_{S,G}(t) = \text{null}
            P \leftarrow P \cup \{s\}, n_D(s) = n, n_{S,G}(s) = \text{null}
            add constraint x_t = x_s \wedge r_t = r_s
        end if
    end for
    solution S' \leftarrow perform refined placement
    return S'
end if
```

Notably, the resulting transistor placement does not yet contain FDMs; a placement refinement is still needed to determine FDM locations. The complete workflow is detailed in Algorithm 1. The algorithm starts by performing an initial transistor placement, which determines whether FDM insertion is necessary. If it is, the algorithm selects the dual-sided nets with $merge(n)|_S = 0$ and, for each, inserts an FDM in the form of an aligned N-FET and P-FET pair $(t, s)$. These FDMs are added to the transistor sets. The algorithm then performs a refined placement and outputs the solution with FDMs. Here, null indicates eligibility for diffusion sharing with any other net. Modeling FDMs as transistors allows constraint (1) to be applied to them directly.
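The control flow of Algorithm 1 can be sketched as follows. The `place` and `merge` callbacks are hypothetical stand-ins for the SMT placer and for predicate (3); the dictionary-based FET records and the toy callbacks below exist only to make the example runnable and do not reflect the actual implementation.

```python
def dynamic_fdm_insertion(nfets, pfets, ds_nets, place, merge):
    """Sketch of Algorithm 1.

    place(nfets, pfets, aligned_pairs) -> solution   (hypothetical placer)
    merge(net, solution) -> bool                      (hypothetical Eq. (3) check)
    """
    solution = place(nfets, pfets, [])
    unmerged = [n for n in ds_nets if not merge(n, solution)]
    if not unmerged:
        return solution

    nfets, pfets, aligned = list(nfets), list(pfets), []
    for net in unmerged:
        # An FDM is modeled as an N/P pair whose drain is the net;
        # source and gate are None (the "null" of Algorithm 1).
        t = {"name": f"fdm_n_{net}", "drain": net, "source": None, "gate": None}
        s = {"name": f"fdm_p_{net}", "drain": net, "source": None, "gate": None}
        nfets.append(t)
        pfets.append(s)
        aligned.append((t["name"], s["name"]))  # enforce x_t = x_s and r_t = r_s
    return place(nfets, pfets, aligned)


def toy_place(nfets, pfets, aligned):
    """Toy placer: records which FETs and alignment constraints it was given."""
    return {
        "n": [f["name"] for f in nfets],
        "p": [f["name"] for f in pfets],
        "aligned": aligned,
    }


def toy_merge(net, solution):
    """Toy check: pretend only net A is merged in any placement."""
    return net == "A"


n = [{"name": "n1", "drain": "Y", "source": "VSS", "gate": "A"}]
p = [{"name": "p1", "drain": "Y", "source": "VDD", "gate": "A"}]
result = dynamic_fdm_insertion(n, p, ["A", "Y"], toy_place, toy_merge)
print(result["aligned"])  # [('fdm_n_Y', 'fdm_p_Y')]
```

The refined placement sees the FDM pair as ordinary transistors plus one alignment constraint, which mirrors the idea of reusing the existing placement constraints for FDMs.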

# C. Dual-Sided Intra-cell Routing

1) SAT formulation for Dual-Sided Intra-cell Routing: Building upon [21] [22], we extend SAT-based intra-cell routing to dual-sided architectures, where the SAT formulation is derived from multi-commodity network flow theory and design rules. After transistor placement, the candidate connection vertices for each net's pins are determined by local connections through merge structures and inter-row gate/MDs. All candidate vertices to a given pin are connected to a super-vertex, as shown in Figure 5. We treat these super-vertices as sources and sinks for their corresponding nets, enforcing connectivity constraints on them through commodity flow conservation (CFC) at both per-net and per-commodity granularity.

Fig. 5: Pin allocation for dual-sided routing. Candidate vertices to a given pin are connected to a super-vertex.

Fig. 6: M0 and M1 layer vertices connect to three super-vertices: $v_F^s$ (aggregating all frontside vertices), $v_B^s$ (collecting all backside vertices), and $v_{FB}^s$ (connecting all dual-sided vertices). All three super-vertices are constrained by (6).

For a given super-vertex  $v^s$  corresponding to net n and commodity m, the following holds:

$$\sum_{u \in a(v^s)} f_m^n(v^s, u) = 1 \;\land \bigwedge_{(i,j) \neq (n,m)} \Big[ \sum_{u \in a(v^s)} f_j^i(v^s, u) = 0 \Big] \quad (6)$$

While this formulation is valid for $v_{A2}^s$ and $v_{B1}^s$ in Figure 5, it is incorrect for $v_{A1}^s$ and $v_{B2}^s$, because it fails to account for merge structure connectivity when other commodities $m' \neq m$ of net $n$ require merge-based connections. Therefore, for super-vertices such as $v_{A1}^s$ and $v_{B2}^s$, the constraint is modified as:

$$\begin{aligned}
&\sum_{u \in a(v^s)} f_m^n(v^s, u) = 1 \;\land \bigwedge_{i \neq n,\, j} \Big[ \sum_{u \in a(v^s)} f_j^i(v^s, u) = 0 \Big] \\
&\land \bigwedge_{j \neq m} \Big[ \sum_{u \in a(v^s)} f_j^n(v^s, u) = 0 \;\lor\; \sum_{u \in a(v^s)} f_j^n(v^s, u) = 2 \Big]
\end{aligned} \quad (7)$$
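The difference between (6) and (7) can be seen by checking both on the same set of flow counts at one super-vertex. This is a minimal sketch: the `flows` dictionary (mapping a net/commodity pair to the number of used edges incident to the super-vertex) and the `merge_vertex` flag are illustrative assumptions, and reading a count of 2 as a commodity passing through the merge structure is an interpretation of (7), not a statement from the formulation itself.

```python
def cfc_ok(flows, net, commodity, merge_vertex=False):
    """Check constraint (6), or (7) when merge_vertex is True, at one super-vertex.

    flows: dict mapping (net, commodity) -> number of used incident edges.
    """
    if flows.get((net, commodity), 0) != 1:
        return False  # the super-vertex's own commodity uses exactly one edge
    for (i, j), used in flows.items():
        if (i, j) == (net, commodity):
            continue
        if i != net:
            if used != 0:
                return False  # other nets may never touch this super-vertex
        elif merge_vertex:
            if used not in (0, 2):
                return False  # (7): same-net commodities use 0 or 2 edges
        elif used != 0:
            return False      # (6): same-net commodities are excluded as well
    return True


# Commodity 1 of net Y uses two edges at a super-vertex owned by commodity 0 of Y.
flows = {("Y", 0): 1, ("Y", 1): 2, ("A", 0): 0}

print(cfc_ok(flows, "Y", 0))                     # False under (6)
print(cfc_ok(flows, "Y", 0, merge_vertex=True))  # True under (7)
```

The same flow pattern is rejected by (6) and accepted by (7), which is the relaxation needed at super-vertices attached to merge structures.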

Design rule constraints include the end-of-line spacing rule (EOL), minimum area rule (MAR), and via spacing rule (VR), with specific values provided in Section II-A and Figure 1. The optimization objective is to minimize total metal length:

$$\min \sum_{v,u \in V, v \neq u} w_{v,u} \cdot m_{v,u}. \tag{8}$$

2) I/O pin assignment: Dual-sided 3D-stacked transistor standard cells feature three I/O pin types: frontside, backside, and dual-sided. Figure 6 shows their implementation through three super-vertices: $v_F^s$, $v_B^s$, and $v_{FB}^s$. For different pin assignments, these super-vertices serve as sources/sinks for corresponding nets, with additional commodities introduced. Table II details the correspondence between pin assignments and super-vertex usage.

TABLE II: Correspondence between pin assignments and super-vertex usage.

| Pin assignment | Super-vertex        |
|----------------|---------------------|
| frontside      | $v_F^s$             |
| backside       | $v_B^s$             |
| dual-sided     | $v_F^s$ and $v_B^s$ |
| unspecified    | $v_{FB}^{s}$        |
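Table II amounts to a lookup from pin assignment to the super-vertices used as sources/sinks. A minimal sketch of that lookup is given below; the string labels and the `terminals_for` helper are hypothetical names chosen for this example.

```python
# Table II as a lookup: pin assignment -> super-vertices used as sources/sinks.
SUPER_VERTICES = {
    "frontside": ("v_F",),
    "backside": ("v_B",),
    "dual-sided": ("v_F", "v_B"),
    "unspecified": ("v_FB",),
}


def terminals_for(pins):
    """Map each I/O pin to the super-vertices its net must reach."""
    return {pin: SUPER_VERTICES[assignment] for pin, assignment in pins.items()}


# One AOI22 variant: A1 on the backside, the other inputs on the frontside,
# and the output accessible from both sides.
aoi22 = {
    "A1": "backside", "A2": "frontside",
    "B1": "frontside", "B2": "frontside",
    "Y": "dual-sided",
}
print(terminals_for(aoi22))
```

Leaving a pin as `"unspecified"` routes it to $v_{FB}^s$, which lets the router pick the side, whereas an explicit assignment pins it to one or both sides.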

# IV. EXPERIMENTAL RESULTS

Our framework is implemented in C++ and executed on an AMD EPYC 9654 workstation using a single-threaded Z3 [24] solver (version 4.8.5) for both SAT and SMT solving. The input standard cell netlists are modified from ASAP7 [25] SPICE netlists to generate dual-sided 3D-stacked transistor standard cell layouts. Detailed cell architecture and design rules are provided in Section II-A and Figure 1. We decompose the multi-fin FETs in ASAP7 standard cells (originally implemented with multiple fingers for higher drive strength) into parallel-connected FETs, each with  $\leq$  2 fins. In subsequent tables, the reported FET count (#FET) enumerates each parallel-connected FET individually.

#### A. Cell Quality Comparison

We contacted the authors of [8] and obtained the layouts of single-row 3.5T cells.<sup>1</sup> Table III presents a cell metrics comparison of **single-row 3.5T** standard cell layouts generated by our flow and by [8], under the same design rules (MAR/EOL/VR=1/2/1). Both approaches complete the synthesis without using M1 in the single-row case. For all cells in the table, our method successfully generates all  $2^n$  possible variants, with an average generation time of 27.1 seconds per cell. The reported area, via usage, and M0 metal usage are averaged over all variants, and our approach achieves reductions of 4%, 4%, and 7% respectively compared to [8].
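The reported reductions follow directly from the average values listed in the "Avg." row of Table III. The short calculation below reproduces the normalized figures; the variable names are illustrative.

```python
# Averages from the "Avg." row of Table III ([8] vs. this work).
avg_ref = {"#CPP": 8.31, "#Via": 21.56, "M0": 21.83}
avg_ours = {"#CPP": 8.00, "#Via": 20.68, "M0": 20.28}

# Normalized value = ours / reference, rounded to two decimals.
norm = {k: round(avg_ours[k] / avg_ref[k], 2) for k in avg_ref}
print(norm)  # {'#CPP': 0.96, '#Via': 0.96, 'M0': 0.93}
```

Normalized values of 0.96, 0.96, and 0.93 correspond to the stated reductions of 4%, 4%, and 7%.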

# B. Standard Cell Design Exploration

Thanks to the consideration of multi-row placement and the insertion of FDM structures, our flow offers greater flexibility for design exploration.

In [8], only single-row 2.5T generation was explored, and results for relatively complex cells such as DFF, DHL, and XOR were not presented. As shown in Table IV, we generated these cells under both single-row and double-row 2.5T configurations. The results indicate that some cells indeed cannot be routed in a single row. In addition, some cells (such as AOI22xp5) cannot generate all pin assignment variants in a single row. In the double-row case, however, all of these cells can generate all $2^n$ possible variants.

TABLE III: Comparison of **single-row 3.5T** dual-sided 3D-stacked transistor standard cell layouts between [8] and our work. Both use the same design rules: MAR/EOL/VR=1/2/1. #CPP: Cell width, the distance from the left SDB to the right SDB (in CPP units). #Via: Number of vias. M0/M1: Length of used M0/M1 metal, measured in CPP/row-height units.

| Cell Name | #FET | [8] #CPP | [8] #Via | [8] M0 | [8] M1 | Ours #CPP | Ours #Via | Ours M0 | Ours M1 |
|-----------|------|----------|----------|--------|--------|-----------|-----------|---------|---------|
| AND2x2    | 8    | 5.00  | 14.00 | 11.00 | 0.00 | 5.00  | 14.00 | 10.00 | 0.00 |
| AND3x1    | 8    | 5.00  | 11.00 | 10.50 | 0.00 | 5.00  | 11.00 | 9.00  | 0.00 |
| AND3x2    | 10   | 6.00  | 14.00 | 11.50 | 0.00 | 6.00  | 14.00 | 10.00 | 0.00 |
| AOI21x1   | 12   | 7.00  | 20.00 | 19.33 | 0.00 | 7.00  | 20.38 | 20.31 | 0.00 |
| AOI22x1   | 16   | 10.00 | 28.00 | 28.00 | 0.00 | 10.00 | 28.06 | 30.70 | 0.00 |
| BUFx2     | 6    | 4.00  | 10.00 | 6.83  | 0.00 | 4.00  | 10.00 | 6.00  | 0.00 |
| BUFx3     | 8    | 5.00  | 13.00 | 9.83  | 0.00 | 5.00  | 12.00 | 8.50  | 0.00 |
| BUFx4     | 10   | 6.00  | 16.00 | 10.83 | 0.00 | 6.00  | 15.00 | 9.50  | 0.00 |
| BUFx8     | 20   | 11.00 | 31.00 | 22.83 | 0.00 | 11.00 | 28.00 | 17.50 | 0.00 |
| DFFHQNx1  | 24   | 14.00 | 34.00 | 51.00 | 0.00 | 14.00 | 34.25 | 49.50 | 0.00 |
| DFFHQNx2  | 26   | 15.00 | 37.00 | 51.00 | 0.00 | 15.00 | 37.25 | 48.63 | 0.00 |
| DFFHQNx3  | 28   | 18.00 | 47.00 | 58.00 | 0.00 | 16.00 | 39.25 | 53.38 | 0.00 |
| DHLx1     | 16   | 11.00 | 27.00 | 33.17 | 0.00 | 10.00 | 24.00 | 28.50 | 0.00 |
| DHLx2     | 18   | 12.00 | 30.00 | 30.17 | 0.00 | 11.00 | 27.00 | 27.50 | 0.00 |
| DHLx3     | 20   | 13.00 | 33.00 | 37.17 | 0.00 | 12.00 | 29.00 | 32.00 | 0.00 |
| INVx1     | 2    | 2.00  | 5.00  | 4.00  | 0.00 | 2.00  | 5.00  | 3.00  | 0.00 |
| INVx2     | 4    | 3.00  | 8.00  | 4.00  | 0.00 | 3.00  | 8.00  | 3.50  | 0.00 |
| INVx4     | 8    | 5.00  | 14.00 | 8.00  | 0.00 | 5.00  | 13.00 | 7.00  | 0.00 |
| INVx8     | 16   | 9.00  | 26.00 | 20.00 | 0.00 | 9.00  | 23.00 | 14.50 | 0.00 |
| NAND2x1   | 6    | 6.00  | 14.00 | 11.67 | 0.00 | 5.00  | 14.00 | 10.25 | 0.00 |
| NAND2x2   | 12   | 9.00  | 24.00 | 21.00 | 0.00 | 9.00  | 25.00 | 22.50 | 0.00 |
| NAND3x1   | 12   | 10.00 | 25.00 | 24.33 | 0.00 | 10.00 | 26.50 | 26.00 | 0.00 |
| NOR2x1    | 6    | 5.00  | 13.00 | 9.67  | 0.00 | 5.00  | 14.00 | 10.25 | 0.00 |
| NOR2x2    | 12   | 9.00  | 24.00 | 25.67 | 0.00 | 9.00  | 24.00 | 22.50 | 0.00 |
| NOR3x1    | 12   | 10.00 | 24.00 | 24.33 | 0.00 | 10.00 | 26.63 | 26.06 | 0.00 |
| OAI21x1   | 12   | 8.00  | 22.00 | 20.33 | 0.00 | 8.00  | 22.00 | 22.38 | 0.00 |
| OAI22x1   | 16   | 10.00 | 28.00 | 28.00 | 0.00 | 10.00 | 28.13 | 29.70 | 0.00 |
| OR2x2     | 8    | 5.00  | 14.00 | 11.00 | 0.00 | 5.00  | 14.00 | 10.00 | 0.00 |
| OR3x1     | 8    | 5.00  | 11.00 | 10.50 | 0.00 | 5.00  | 11.00 | 9.00  | 0.00 |
| OR3x2     | 10   | 6.00  | 14.00 | 11.50 | 0.00 | 6.00  | 14.00 | 10.00 | 0.00 |
| XNOR2x1   | 16   | 11.00 | 29.00 | 35.67 | 0.00 | 9.00  | 24.25 | 28.87 | 0.00 |
| XOR2x1    | 16   | 11.00 | 30.00 | 37.67 | 0.00 | 9.00  | 26.00 | 32.50 | 0.00 |
| Avg.      |      | 8.31  | 21.56 | 21.83 | -    | 8.00  | 20.68 | 20.28 | -    |
| Norm.     |      | 1.00  | 1.00  | 1.00  | -    | 0.96  | 0.96  | 0.93  | -    |

TABLE IV: Synthesis results of **2.5T** dual-sided 3D-stacked transistor standard cell layouts in single-row and double-row. 'tot.' denotes the total number of possible variants, and 'suc.' denotes the number of successfully generated variants.

| Cell Name | tot. | single-row suc.<sup>2</sup> | single-row M1 | single-row #CPP | double-row suc. | double-row M1 | double-row #CPP |
|-----------|------|-----------------------------|---------------|-----------------|-----------------|---------------|-----------------|
| AOI22xp5  | 16   | 4          | 0.00 | 6.00 | 16         | 0.00 | 6.00  |
| DFFHQNx1  | 4    | 0          | -    | -    | 4          | 4.96 | 14.00 |
| DHLx1     | 4    | 0          | -    | -    | 4          | 7.41 | 12.00 |
| OAI22xp5  | 16   | 4          | 0.00 | 6.00 | 16         | 0.00 | 6.00  |
| XNOR2xp5  | 4    | 0          | -    | -    | 4          | 2.68 | 8.00  |
| XOR2xp5   | 4    | 0          | -    | -    | 4          | 2.45 | 8.00  |

<sup>2</sup>Some single-row cells cannot be routed mainly because the minimum area rule restricts the use of M0 and the via spacing rule restricts the use of M1, resulting in insufficient routing resources.

Figure 7 shows the characterization results of 22 layouts of the 2.5T AOI22xp5 cell generated by varying the number of rows, pin assignments, and the use of FDM. For the input pins A1, A2, B1, and B2, 0 represents the frontside, and 1 represents the backside. Given the symmetrical nature of AOI22, the delay and transition metrics are averaged over all four input pins. All metrics are normalized to the values of the manually designed layout from [9], which adopts a single-row design with FDM. The results indicate that pin assignment has a significant impact on performance. For both single-row and double-row cases, the optimal solution is (A1,A2,B1,B2)=(1,0,0,0), where the single-row layout includes FDM. The design with the best performance (shown on the right side of Figure 8) achieves a 6.3% improvement in performance and a 4.3% reduction in power consumption compared to the manual baseline. Additionally, for the case where (A1,A2,B1,B2)=(1,1,0,0), the single-row layout with FDM outperforms the double-row layout in terms of performance. The remaining designs are not optimal but provide a range of alternative pin assignment options.

Fig. 7: Characterization results for all generated 2.5T layout variants of AOI22xp5. The vertical axis shows the normalized values, calculated as the difference between the results of each layout variant and the manually designed layout from [9], which serves as the normalization reference (indicated by the red zero line in the figure).

Fig. 8: Layout of the best-performing design of AOI22xp5. The left side shows the single-row layout, while the right side shows the double-row layout.

<sup>1</sup>Upon the submission of this manuscript, the authors have not received the layouts of 2.5T cells from [8], so a fair comparison for 2.5T cells is temporarily not possible.

### C. Chip Design Quality Comparison

We compare chip design results using two standard cell libraries: one generated by our flow following the pin assignment settings of [8], and the other containing all  $2^n$  pin assignment variants per cell. Both use the same netlist and a cell utilization of 0.75, resulting in identical chip area. The difference lies in pin assignment optimization during routing. As shown in Table V, increasing from 4 to all  $2^n$  pin assignment variants reduces total wirelength by 10% and eliminates DS-nets.
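The wirelength figures in Table V can be cross-checked with a short calculation. The per-design ratios and the ratio of total wirelength are both computed below; that the reported 0.90 corresponds to the ratio of totals is an observation from the arithmetic, not something the table states explicitly.

```python
# Wirelength (um) from Table V: 4 variants [8] vs. all 2^n variants.
wl_4 = {"riscv32i": 13365.82, "tinyRocket": 34256.42, "blackparrot": 305520.86}
wl_all = {"riscv32i": 12744.74, "tinyRocket": 31731.23, "blackparrot": 272191.64}

# Per-design ratio and ratio of total wirelength across the three designs.
per_design = {d: wl_all[d] / wl_4[d] for d in wl_4}
total_ratio = sum(wl_all.values()) / sum(wl_4.values())

for design, ratio in per_design.items():
    print(f"{design}: {ratio:.3f}")
print(f"total: {total_ratio:.3f}")
```

Every design improves, with the largest design (blackparrot) showing the largest relative reduction and therefore dominating the total.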

TABLE V: Impact of cell variant count on block-level wirelength and DS-net count. The 4 variants refer to PBal, PBalswap, PFront, and PBack from [8].

| Design      | 4 Variants [8] WL (µm) | 4 Variants [8] #DS-NET | All 2<sup>n</sup> Variants WL (µm) | All 2<sup>n</sup> Variants #DS-NET |
|-------------|------------------------|------------------------|------------------------------------|------------------------------------|
| riscv32i    | 13365.82               | 572                    | 12744.74                           | 0                                  |
| tinyRocket  | 34256.42               | 1522                   | 31731.23                           | 0                                  |
| blackparrot | 305520.86              | 8592                   | 272191.64                          | 0                                  |
| Norm.       | 1.00                   | 1.00                   | 0.90                               | 0.00                               |

## V. Conclusion

In this work, we present a standard cell synthesis framework for dual-sided 3D-stacked transistors. Our SMT-based merge-aware placement and SAT-based dual-sided routing enable efficient synthesis of 3.5T/2.5T cells. Experiments show our method achieves 4% lower cell area, 4% fewer vias, and 7% less M0 usage on average compared to prior 3.5T designs [8]. By supporting multi-row placement and FDM insertion, our approach broadens the design space and identifies layouts surpassing manual designs, with the best AOI22xp5 variant improving performance by 6.3% and power by 4.3% over the manual design [9]. Chip-level results further show that using a cell library with all $2^n$ pin assignment variants achieves a 10% reduction in total wirelength and completely eliminates DS-nets, compared to libraries with limited pin assignment options. These results highlight the potential of our framework for practical dual-sided 3D-stacked transistor design.

# ACKNOWLEDGMENT

This project is supported in part by Grant QYJS-2023-2303-B and the 111 Project (B18001).

<sup>3</sup>The layouts provided by the authors of [8] do not contain enough information to run the commercial physical implementation flow.

#### REFERENCES

- [1] J. Ryckaert, P. Schuddinck, P. Weckx, G. Bouche, B. Vincent, J. Smith, Y. Sherazi, A. Mallik, H. Mertens, S. Demuynck *et al.*, "The complementary fet (cfet) for cmos scaling beyond n3," in 2018 IEEE Symposium on VLSI Technology. IEEE, 2018, pp. 141–142.
- [2] O. Zografos, B. Chehab, P. Schuddinck, G. Mirabelli, N. Kakarla, Y. Xiang, P. Weckx, and J. Ryckaert, "Design enablement of cfet devices for sub-2nm cmos nodes," in 2022 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 2022, pp. 29–33.
- [3] H. Kükner, G. Mirabelli, S. Yang, L. Verschueren, J. Bömmels, J. Lin, D. Abdi, A. Farokhnejad, A. Zografos, N. Horiguchi et al., "Double-row cfet: Design technology co-optimization for area efficient a7 technology node," in 2024 IEEE International Electron Devices Meeting (IEDM). IEEE, 2024, pp. 1–4.
- [4] S. M. Y. Sherazi, J. K. Chae, P. Debacker, L. Matti, D. Verkest, A. Mocuta, R. Kim, A. Spessot, A. Dounde, and J. Ryckaert, "Cfet standard-cell design down to 3track height for node 3nm and below," in *Design-Process-Technology Co-optimization for Manufacturability XIII*, vol. 10962. SPIE, 2019, pp. 16–27.
- [5] E. Park and T. Song, "Complementary fet (cfet) standard cell design for low parasitics and its impact on vlsi prediction at 3-nm process," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 31, no. 2, pp. 177–187, 2022.
- [6] S. Choi, C. Gilardi, P. Gutwin, R. M. Radway, T. Srimani, and S. Mitra, "Omni 3d: Beol-compatible 3-d logic with omnipresent power, signal, and clock," *IEEE Transactions on Electron Devices*, 2025.
- [7] H. Lu, X. Jiang, Y. Chu, Z. Xu, R. Guo, W. Peng, Y. Lin, R. Wang, H. Wu, and R. Huang, "A tale of two sides of wafer: Physical implementation and block-level ppa on flip fet with dual-sided signals," in 2025 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2025.
- [8] J. Ahn and T. Kim, "Design and technology co-optimization utilizing flip-fet (ffet) standard cells," in *Proceedings of the 62nd ACM/IEEE Design Automation Conference (DAC)*. San Francisco, CA, USA: ACM, 2025.
- [9] R. Guo, H. Lu, J. Sun, X. Jiang, L. Zhang, M. Li, Y. Lin, R. Wang, H. Wu, and R. Huang, "Design optimization of flip fet standard cells with dual-sided pins for ultimate scaling," *IEEE Transactions on Electron Devices*, pp. 1–7, 2025.
- [10] R. L. Maziasz and J. P. Hayes, "Layout optimization of CMOS functional cells," in 24th ACM/IEEE conference proceedings on Design automation conference - DAC '87. Miami Beach, Florida, United States: ACM Press, 1987, pp. 544–551.
- [11] M. Lefebvre and C. Chan, "Optimal ordering of gate signals in CMOS complex gates," in 1989 Proceedings of the IEEE Custom Integrated Circuits Conference. San Diego, CA, USA: IEEE, 1989, pp. 17.5/1–17.5/4.
- [12] A. Ziesemer, R. Reis, M. T. Moreira, M. E. Arendt, and N. L. Calazans, "Automatic layout synthesis with astran applied to asynchronous cells," in 2014 IEEE 5th Latin American Symposium on Circuits and Systems. IEEE, 2014, pp. 1–4.
- [13] Y.-L. Li, S.-T. Lin, S. Nishizawa, and H. Onodera, "MCell: multirow cell layout synthesis with resource constrained MAX-SAT based detailed routing," in *Proceedings of the 39th International Conference on Computer-Aided Design*. New York, NY, USA: Association for Computing Machinery, 2020, pp. 1–8.
- [14] P. Van Cleeff, S. Hougardy, J. Silvanus, and T. Werner, "BonnCell: Automatic Cell Layout in the 7-nm Era," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 39, no. 10, pp. 2872–2885, 2020.
- [15] C.-T. Ho, A. Ho, M. Fojtik, M. Kim, S. Wei, Y. Li, B. Khailany, and H. Ren, "NVCell 2: Routability-Driven Standard Cell Layout in Advanced Nodes with Lattice Graph Routability Model," in *Proceedings of the 2023 International Symposium on Physical Design*. Virtual Event USA: ACM, 2023, pp. 44–52.
- [16] C.-K. Cheng, A. B. Kahng, B. Lin, Y. Wang, and D. Yoon, "Gear-ratio-aware standard cell layout framework for dtco exploration," in *Proceedings of the 2023 ACM International Workshop on System-Level Interconnect Pathfinding*, 2023, pp. 1–10.
- [17] C.-K. Cheng, C.-T. Ho, D. Lee, and D. Park, "A routability-driven complimentary-fet (cfet) standard cell synthesis framework using smt," in *Proceedings of the 39th International Conference on Computer-Aided Design*, 2020, pp. 1–8.

- [18] C.-K. Cheng, C.-T. Ho, D. Lee, B. Lin, and D. Park, "Complementary-fet (cfet) standard cell synthesis framework for design and system technology co-optimization using smt," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 29, no. 6, pp. 1178–1191, 2021.
- [19] C.-K. Cheng, C.-T. Ho, D. Lee, and B. Lin, "Multirow complementary-fet (cfet) standard cell synthesis framework using satisfiability modulo theories (smts)," *IEEE Journal on Exploratory Solid-State Computational Devices and Circuits*, vol. 7, no. 1, pp. 43–51, 2021.
- [20] S. Kim and T. Kim, "Optimal transistor folding and placement for synthesizing standard cells of complementary fet technology," in *Proceedings of the 61st ACM/IEEE Design Automation Conference*, 2024, pp. 1–6.
- [21] D. Park, I. Kang, Y. Kim, S. Gao, B. Lin, and C.-K. Cheng, "ROAD: Routability Analysis and Diagnosis Framework Based on SAT Techniques," in *Proceedings of the 2019 International Symposium on Physical Design*. San Francisco CA USA: ACM, 2019, pp. 65–72.
- [22] X. Jia, Y. Cai, Q. Zhou, G. Chen, Z. Li, and Z. Li, "Mcfroute: A detailed router based on multi-commodity flow method," in 2014 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). IEEE, 2014, pp. 397–404.
- [23] D. Lee, D. Park, C.-T. Ho, I. Kang, H. Kim, S. Gao, B. Lin, and C.-K. Cheng, "SP&R: SMT-Based Simultaneous Place-and-Route for Standard Cell Synthesis of Advanced Nodes," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 40, no. 10, pp. 2142–2155, 2021.
- [24] L. De Moura and N. Bjørner, "Z3: An efficient smt solver," in *International conference on Tools and Algorithms for the Construction and Analysis of Systems*, 2008, pp. 337–340.
- [25] L. T. Clark, V. Vashishtha, L. Shifren, A. Gujja, S. Sinha, B. Cline, C. Ramamurthy, and G. Yeric, "Asap7: A 7-nm finfet predictive process design kit," *Microelectronics Journal*, vol. 53, pp. 105–115, 2016.