



# Emerging technologies for tracking at Higgs factories

Marc Weber



#### Instrumentation challenges for the next decade

• The citius-altius-fortius challenge



• The ultimate-resolution, highest-efficiency challenge

• The complexity or system challenge







### Citius, altius, fortius

Ever more pixels/strips and ever larger detector area: about a billion silicon pixels and hundreds of $\mathrm{m}^2$



CMS inner tracker for high-luminosity upgrade

- Pixel number, resolution and frame rate all increase
  - 10 – 1000 million pixels
  - 1 – 14 bit resolution
  - kHz – GHz frame rates
  - implying data streams of up to 100 Terabits/s
- Need to zero-suppress, compress, filter and process these data on and off the detector
  - $\rightarrow$ smart ASICs on detector
  - $\rightarrow$ capable data transmission of >> Tb/s
  - $\rightarrow$ powerful trigger, data processing, visualisation and management systems

Highflex board version 1





#### So far commercial nanoelectronics has been working for us

Laws of Moore









http://www.nature.com/nphoton/journal/v7/n5/full/nphoton.2013.94.html

- Exponential progress with time
- Processor transistor count: 4x every 3 years
- Memory size: 2x every 3 years
- Single fiber bandwidth: 10x every 4 years
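
The quoted rates compound quickly. A minimal sketch of what they imply over a fixed horizon, assuming the trends stay purely exponential; the 12-year horizon and the names `growth_factor` and `TRENDS` are illustrative choices, not from the talk:

```python
def growth_factor(factor_per_period: float, period_years: float, years: float) -> float:
    """Cumulative growth after `years` for a quantity that multiplies by
    `factor_per_period` every `period_years` (pure exponential trend)."""
    return factor_per_period ** (years / period_years)

# Rates quoted on the slide: (factor, period in years)
TRENDS = {
    "processor transistor count": (4.0, 3.0),
    "memory size": (2.0, 3.0),
    "single-fiber bandwidth": (10.0, 4.0),
}

HORIZON_YEARS = 12  # arbitrary horizon chosen for illustration

for name, (factor, period) in TRENDS.items():
    total = growth_factor(factor, period, HORIZON_YEARS)
    print(f"{name}: x{total:,.0f} in {HORIZON_YEARS} years")
```

Over the same 12 years, the fiber-bandwidth trend outpaces the transistor-count trend by roughly a factor of four.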


#### **Selected topics in detector instrumentation**





#### **Silicon photonics**






#### **Monolithic sensors**

#### Are the days of hybrid pixel detectors numbered?





# **High-voltage CMOS sensors**

HV-CMOS: radiation sensor on a microchip in a standard technology

- High voltage offers several advantages, for one: fast and large signals
- A large number of designs have been produced for possible application at CLIC, COMPASS, HL-ATLAS or Mu3e
- Different foundries and processes: GlobalFoundries (CMHV7SF), AMS (aH18) and TSI
- Excellent particle detection efficiency, radiation tolerance up to 10 MGy ($5 \times 10^{15}\ n_{eq}/\mathrm{cm}^2$), time resolution ~6 ns (RMS), low cost
- Very complex electronics have been built in, driven by HL-ATLAS







#### Latest design: ATLASPIX3

- Implemented in TSI 180 nm HVCMOS technology
- Features and data interfaces similar to RD53 pixel chip: L1 triggering tag, Aurora 64b66b output, 1.28 Gb/s, 32-bit hit words, etc.
- Supports triggered readout with trigger latency up to 25 µs
- Supports serial powering
- Large sensor area of 20.2 x 21 mm<sup>2</sup>
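
As a rough feel for what the quoted interface numbers mean, the sketch below estimates an upper bound on the hit rate one output link can carry. It assumes that only the 64b/66b line-code overhead matters and that every payload word is a hit word; real Aurora framing, idle words and trigger headers would lower the figure. The function and constant names are illustrative.

```python
LINK_RATE_BPS = 1.28e9      # serial output rate quoted on the slide
PAYLOAD_FRACTION = 64 / 66  # 64b/66b line code: 64 payload bits per 66 line bits
HIT_WORD_BITS = 32          # hit word size quoted on the slide

def max_hit_rate_hz(link_bps: float, payload_fraction: float, word_bits: int) -> float:
    """Upper bound on hit words per second on one link (no framing overhead)."""
    return link_bps * payload_fraction / word_bits

hit_rate = max_hit_rate_hz(LINK_RATE_BPS, PAYLOAD_FRACTION, HIT_WORD_BITS)
ns_per_hit = 1e9 / hit_rate  # time needed to ship one hit word

print(f"max hit rate per link: {hit_rate / 1e6:.1f} Mhits/s")
print(f"time per hit word:     {ns_per_hit:.1f} ns")
```

Under these assumptions a single link ships about one hit word per 26 ns, which is the scale to compare with the expected hit occupancy of the 20.2 x 21 mm<sup>2</sup> sensor.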

Chip has just been fabricated and, so far, looks good.

Next TSI run in December will include CLIC 25 x 300 µm<sup>2</sup> elongated pixels and designs with reduced capacitance





#### **4-D tracking**

#### The next paradigm change in silicon sensors?





## From 3D to 4D tracking

• At LHC, we can distinguish hundreds of different event vertices, provided they do not overlap



- Just imagine we could precisely measure *time* as well!
- Timing would allow much better separation of overlapping events and offer great physics benefits
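
The benefit of the extra coordinate can be illustrated with a toy example: two vertices that merge in z can still be separated if their times differ enough. This is a minimal sketch, not a reconstruction algorithm; the vertex list and the resolution thresholds (0.1 mm, 90 ps) are made-up illustration values.

```python
from itertools import combinations

def unresolved_pairs(vertices, dz_min_mm, dt_min_ps=None):
    """Return index pairs of vertices that cannot be told apart.

    A pair is unresolved if it is closer than dz_min_mm in z and, when timing
    is available (dt_min_ps is not None), also closer than dt_min_ps in time.
    """
    pairs = []
    for (i, (z1, t1)), (j, (z2, t2)) in combinations(enumerate(vertices), 2):
        close_in_z = abs(z1 - z2) < dz_min_mm
        close_in_t = dt_min_ps is None or abs(t1 - t2) < dt_min_ps
        if close_in_z and close_in_t:
            pairs.append((i, j))
    return pairs

# Hypothetical vertices as (z in mm, t in ps)
vertices = [(0.00, 0.0), (0.05, 150.0), (12.00, -40.0), (12.03, -35.0)]

pairs_3d = unresolved_pairs(vertices, dz_min_mm=0.1)                  # position only
pairs_4d = unresolved_pairs(vertices, dz_min_mm=0.1, dt_min_ps=90.0)  # position + time

print("merged without timing:", pairs_3d)
print("merged with timing:   ", pairs_4d)
```

In this toy input, timing resolves the first overlapping pair, while the second pair stays merged because it is close in both z and t.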







#### 4D tracking would also revolutionize track fitting




## **CMS MIP timing detector (MTD)**

The concept and technology for picosecond timing at HL-LHC came rather late. But it is so powerful that it will be implemented (see CMS-TDR-029)

- A timing layer will be placed between outer tracker and calorimeter
- In the barrel, the timing layer will be scintillating crystals and silicon photomultipliers
- For the end caps, it will be LGADs (~14 m<sup>2</sup> area, ~6 million pixels, pixel size: ~1 mm<sup>2</sup>)
- Radiation levels are rather high in the end caps: ~2 x 10<sup>15</sup> n<sub>eq</sub>/cm<sup>2</sup>. So resolution may deteriorate to ~50 ps with time

Ironically, this vertex detector is placed at 3 m distance from the collision point



Endcap timing layer at the nose of the calorimeters



### A fast timing system



• Need to consider interplay of sensor, pre-amplifier and TDC for picosecond timing





#### LGAD: Low-Gain Avalanche Detector

LGADs have the potential to replace standard silicon sensors in almost every application



- Gain of LGADs is ~10–20
- Jitter of 50 µm thin sensors is only tens of ps
  - Need thin sensor to reduce drift times and maximize the slew rate (dV/dt)
  - Need internal charge amplification for fast and large signals
  - Need high field (in gain region) to maximize drift velocity
  - (small capacitance and leakage current)
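
The role of the slew rate can be made concrete with the usual first-order estimate: the jitter is the voltage noise divided by dV/dt at the threshold. The sketch below assumes a linear signal edge, so dV/dt is amplitude over rise time; the noise, amplitude and rise-time values are illustrative assumptions, not measurements from the talk.

```python
def jitter_ps(noise_mv: float, amplitude_mv: float, rise_time_ps: float) -> float:
    """First-order timing jitter: noise / slew rate.

    Assumes a linear leading edge, so the slew rate is amplitude / rise time.
    Equivalent to rise_time / (signal-to-noise ratio).
    """
    slew_rate_mv_per_ps = amplitude_mv / rise_time_ps
    return noise_mv / slew_rate_mv_per_ps

# Hypothetical numbers: signal-to-noise ratio of 20, rise time of 500 ps
baseline = jitter_ps(noise_mv=1.0, amplitude_mv=20.0, rise_time_ps=500.0)
# Same noise and amplitude with a faster edge, e.g. from a thinner sensor
faster_edge = jitter_ps(noise_mv=1.0, amplitude_mv=20.0, rise_time_ps=250.0)

print(f"jitter, 500 ps rise time: {baseline:.1f} ps")
print(f"jitter, 250 ps rise time: {faster_edge:.1f} ps")
```

Both a larger signal (internal gain) and a shorter rise time (thin sensor, high field) reduce the jitter in this estimate, which is the logic behind the requirements listed above.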



#### **Trench isolation of pixels**



- For LGADs, isolating pixels at full gain is challenging
- Smaller pixel sizes at large fill factor are attractive
- A promising approach is trench isolation







#### Fast monolithic pixel sensors in 130 nm SiGe



Excellent time resolution can also be achieved without internal gain by minimizing pixel capacitance

see G. Iacobucci et al., arXiv:1908.09709v1 [physics.ins-det], 26 Aug 2019





#### **Conclusion: monolithic and 4D sensors**

- There is a plethora of sophisticated novel silicon sensor variants
- Monolithic charged-particle sensors have been around for a while now, and monolithic sensors with superb performance are being used in heavy-ion experiments
- Depleted monolithic sensors (e.g. HV-CMOS) missed application in HL-ATLAS by a smidgen

- (Depleted) monolithic sensors are likely to become dominant for future colliders. They are also interesting for outer tracker layers and calorimetry
- But, "Never underestimate an old technology"
- Hybrid silicon sensors will remain with us for a while, in particular in harsh environments
- The potential of 4D sensors is huge. Much scope for more R&D on sensors and systems



#### **Silicon photonics**

#### The future of data transmission?






### **Optical data transmission today**

Counting room

- Optical data transmission as implemented in LHC experiments is very powerful
- 15000 optical fibers for CMS tracker, ≤ 5 Gb/s per fiber
- Only a fraction of detector raw data is read out

Detector

Radiation hardness of on-detector lasers is critical







LHC solutions differ from standard telecommunication and use on-off keying and laser diodes (VCSELs) on the detector




#### Read out ALL data?

- Lasers located off detector
- Efficient and radiation-hard electro-optical modulators
- Fewer fibers and much higher bandwidth due to wavelength division multiplexing (WDM)
- Silicon photonics for affordable photonic chips and CMOS compatibility
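
To see why per-fiber bandwidth matters for reading out all data, the sketch below counts the fibers needed to carry a ~100 Tb/s stream at different per-fiber rates. The rates are the ones quoted elsewhere in this talk (5 Gb/s today, the 160 Gb/s and 1 Tb/s milestones); protocol overhead and redundancy are ignored, and the function name is illustrative.

```python
def fibers_needed(total_gbps: int, per_fiber_gbps: int) -> int:
    """Smallest number of fibers whose combined rate covers total_gbps."""
    return -(-total_gbps // per_fiber_gbps)  # ceiling division on integers

TOTAL_GBPS = 100_000  # ~100 Tb/s raw data stream

SCENARIOS = [
    ("on-off keying, 5 Gb/s per fiber", 5),
    ("WDM milestone, 160 Gb/s per fiber", 160),
    ("WDM goal, 1 Tb/s per fiber", 1_000),
]

fiber_counts = {label: fibers_needed(TOTAL_GBPS, rate) for label, rate in SCENARIOS}

for label, count in fiber_counts.items():
    print(f"{label}: {count} fibers")
```

Going from 5 Gb/s to 1 Tb/s per fiber cuts the fiber count for the same stream by a factor of 200.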





## A silicon photonics data transmission architecture



#### Silicon photonic modulators: Mach-Zehnder interferometer



#### Wavelength multiplexers and de-multiplexers



### Silicon photonic chips



- All-silicon modulators
- Many functional designs
- Innovative features
- Monolithically integrated, compact WDM systems

Chip sizes shown: $8.4 \times 7.3\ \text{mm}^2$, $2.4 \times 4.9\ \text{mm}^2$, $12 \times 13\ \text{mm}^2$, $9.3 \times 9.3\ \text{mm}^2$



# 11.3 Gb/s per channel data transmission



#### 7-channel (De-)MUX

two-stigmatic-points method




#### **R&D** status and milestones

- 40 Gb/s demonstrator near completion
- 1 year: 160 Gb/s per fiber
- 5 years: >>1 Tb/s per fiber
  - with 64 channels at 20–40 Gbaud each and 2 bits per symbol
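
The 5-year target follows from simple multiplication of the quoted parameters. A minimal sketch, assuming the raw line rate is channels × symbol rate × bits per symbol with no coding or error-correction overhead (the function name is illustrative):

```python
def fiber_rate_gbps(channels: int, gbaud_per_channel: int, bits_per_symbol: int) -> int:
    """Raw aggregate rate of one WDM fiber in Gb/s (no coding overhead)."""
    return channels * gbaud_per_channel * bits_per_symbol

low = fiber_rate_gbps(channels=64, gbaud_per_channel=20, bits_per_symbol=2)
high = fiber_rate_gbps(channels=64, gbaud_per_channel=40, bits_per_symbol=2)

print(f"64 channels at 20 Gbaud: {low / 1000:.2f} Tb/s per fiber")
print(f"64 channels at 40 Gbaud: {high / 1000:.2f} Tb/s per fiber")
```

Both ends of the range land well above 1 Tb/s per fiber, consistent with the ">>1 Tb/s" milestone.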



fiber-chip-coupling with angle-polished fibers



miniaturized drivers



 Fully integrated systems will be compact



#### **Conclusion: Data transmission**

- Silicon is not a great optical material and optical connectors are a pain
- Electric connections still dominate short distances inside a chip, along a silicon stave or within an electronic crate
- Silicon photonics in detector instrumentation is in its infancy

 However: silicon photonics could enable mind-boggling possibilities like powerful interconnections between detector layers, the trigger-less detector and, of course, reduce power consumption and material

• We will need another 5 years to be sure, but less than a decade to instrument a linear collider



#### **DAQ and triggering in high-energy physics**



#### CMS in the HL-LHC era



Event at HL-LHC

- Almost 15 000 modules and 300 million channels in outer tracker
- ~12 000 hits every 25 ns
- 1 000 hits (stubs) after first filtering in detector modules
- Data rate: ~100 Tbit/s
- 12.5 µs time limit for first level trigger decision (4 µs for track finding)
- Readout is highly complex
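
A few derived numbers help to read these figures. The sketch assumes that "every 25 ns" means one bunch crossing and that front-end data must be held for the full trigger latency; constant names are illustrative.

```python
BUNCH_SPACING_NS = 25      # one bunch crossing every 25 ns
L1_LATENCY_NS = 12_500     # 12.5 µs limit for the first-level trigger decision
TRACK_FINDING_NS = 4_000   # 4 µs of that budget for track finding
HITS_PER_BX = 12_000       # hits per bunch crossing in the outer tracker
STUBS_PER_BX = 1_000       # stubs left after on-module filtering

# Bunch crossings whose data must be buffered while waiting for the decision
crossings_in_flight = L1_LATENCY_NS // BUNCH_SPACING_NS
# Bunch crossings that elapse during track finding alone
crossings_during_track_finding = TRACK_FINDING_NS // BUNCH_SPACING_NS
# Data reduction achieved by the on-module stub filter
reduction_factor = HITS_PER_BX / STUBS_PER_BX
# Stubs per second that reach the track finder
stub_rate_hz = STUBS_PER_BX * (1e9 / BUNCH_SPACING_NS)

print(f"crossings buffered during L1 latency: {crossings_in_flight}")
print(f"crossings during track finding:       {crossings_during_track_finding}")
print(f"on-module reduction factor:           {reduction_factor:.0f}x")
print(f"stub rate:                            {stub_rate_hz:.1e} stubs/s")
```

So the track finder has to keep up with 160 new events arriving while it is still working on the first one, which motivates the parallelization strategies that follow.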





#### **Divide and conquer**

#### **Spatial partitioning**

- Separate detector into sectors
- Duplicate tracks at sector boundaries
- 9 sectors in φ

#### **Time-multiplexing**

- Work on each event in parallel
- Each node processes "full" detector (region)
- 18–24 time-multiplexing periods
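
Both partitioning schemes reduce to a simple index calculation. The sketch below is an illustration only: it assumes equal-width φ sectors starting at φ = 0 and a plain round-robin assignment of events to nodes with a period of 18; the function names are hypothetical and boundary duplication is not modelled.

```python
import math

N_PHI_SECTORS = 9      # spatial partitioning: sectors in φ
TMUX_PERIOD = 18       # time multiplexing: number of periods (nodes per region)
BUNCH_SPACING_NS = 25  # one event every 25 ns

def phi_sector(phi_rad: float, n_sectors: int = N_PHI_SECTORS) -> int:
    """Index of the equal-width φ sector containing phi_rad (any real angle)."""
    width = 2 * math.pi / n_sectors
    wrapped = phi_rad % (2 * math.pi)        # map the angle into [0, 2π]
    return int(wrapped // width) % n_sectors  # final modulo guards rounding at 2π

def tmux_node(event_number: int, period: int = TMUX_PERIOD) -> int:
    """Round-robin node that processes the given event."""
    return event_number % period

# Each node receives a new event only once per multiplexing period
ns_between_events_per_node = TMUX_PERIOD * BUNCH_SPACING_NS

print("sector of φ = π:", phi_sector(math.pi))
print("node for event 19:", tmux_node(19))
print("time between events on one node:", ns_between_events_per_node, "ns")
```

Time multiplexing trades 18 parallel nodes for an 18 times longer processing window per node, 450 ns instead of 25 ns in this configuration.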







#### System architecture

#### 2 FPGA layers

- Data distribution layer (DTC): 216 boards
  - 9 regions (nonants)
  - 24 FPGA boards per region
- Track finding and fitting (TFP): 162 boards
  - 18 time periods per region
  - 18 FPGA boards per region

#### Huge optical data transmission layer

- Each DTC connected with up to 72 modules
  - 23 400 optical links @ 10 Gb/s → 234 Tb/s
- Each TFP connected with up to 48 DTCs
  - 10 368 optical links @ 25 Gb/s → 260 Tb/s
- DAQ links
  - 864 DAQ links @ 25 Gb/s → 21.6 Tb/s




#### DTC prototype board: EureKA-Maru ATCA blade

- High-end FPGA: Xilinx Virtex UltraScale+ VU9P
  - 2.6 million logic cells
  - 6840 DSP slices
  - 345.9 Mb on-chip memory
  - 120 x 32.72 Gb/s transceivers
    → ~ 4 Tb/s input/output capability
- Integrated IPMC & slow control solution based on Xilinx Zynq Ultrascale+
- 116 high-speed links through Firefly (25 Gb/s max.)








#### System architecture



#### 2 FPGA layers

- Data distribution layer (DTC): 9 regions (nonants) with 24 FPGA boards each
- Track finding and fitting (TFP): 18 time periods and 18 FPGA boards per region

#### Huge optical data transmission layer

- Each DTC connected with up to 72 modules: 15 552 optical links @ 5/10 Gb/s → 156 Tb/s
- Each TFP connected with up to 48 DTCs: 7 776 optical links @ 25 Gb/s → 194 Tb/s
- DAQ links: 864 DAQ links @ 25 Gb/s → 21.6 Tb/s
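
The board and link counts on this slide can be cross-checked from the per-region numbers. A minimal sketch, assuming every board uses its full quoted fan-in ("up to 72 modules", "up to 48 DTCs") and that module links run at the upper 10 Gb/s rate; the variable names are illustrative.

```python
REGIONS = 9                  # detector nonants
DTC_BOARDS_PER_REGION = 24
TFP_BOARDS_PER_REGION = 18
MODULES_PER_DTC = 72         # "up to" on the slide; taken as the full count here
DTCS_PER_TFP = 48            # "up to" on the slide; taken as the full count here
DAQ_LINKS = 864

def bandwidth_tbps(n_links: int, gbps_per_link: float) -> float:
    """Aggregate bandwidth of n_links identical links, in Tb/s."""
    return n_links * gbps_per_link / 1000.0

dtc_boards = REGIONS * DTC_BOARDS_PER_REGION
tfp_boards = REGIONS * TFP_BOARDS_PER_REGION
module_links = dtc_boards * MODULES_PER_DTC
dtc_tfp_links = tfp_boards * DTCS_PER_TFP

module_bw = bandwidth_tbps(module_links, 10)   # upper rate of the 5/10 Gb/s links
dtc_tfp_bw = bandwidth_tbps(dtc_tfp_links, 25)
daq_bw = bandwidth_tbps(DAQ_LINKS, 25)

print(f"DTC boards: {dtc_boards}, TFP boards: {tfp_boards}")
print(f"module -> DTC: {module_links} links, {module_bw:.0f} Tb/s")
print(f"DTC -> TFP:    {dtc_tfp_links} links, {dtc_tfp_bw:.0f} Tb/s")
print(f"DAQ:           {DAQ_LINKS} links, {daq_bw:.1f} Tb/s")
```

Under these assumptions the products reproduce the link counts and the rounded bandwidth totals quoted on this slide.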



#### A new class of devices



### **Versal - architecture overview**

#### Latest Xilinx architecture

- More heterogeneous
- More complex

Adaptive compute acceleration platform (ACAP)

#### **Key Features**

- FPGAs + Processors + artificial intelligence engines
- Network on Chip backbone
  - high bandwidth & low latency
  - guaranteed QoS
  - memory mapped
  - built in arbitration
- Complex memory hierarchy
  (LUTRAM, BRAM, UltraRAM, Accelerator RAM, HBM, DDR)



https://www.xilinx.com/support/documentation/white\_papers/wp505-versal-acap.pdf




#### Heterogeneous DAQ systems



### **High-performance heterogeneous FPGA-GPU**

#### Comparison between AMD Radeon R9 Fury X and Xilinx Virtex-7 XC7VX1140T

(28nm manufacturing process for both, GPU and FPGA)



#### CMS low-level trigger system






L1 trigger will require reconstruction of charged particles with transverse momentum > ~2 GeV/c

#### **FPGA Features**

Huge I/O bandwidth

Deterministic timing/runtimes

High bit-level processing performance

#### **GPU Features**

Rapid development cycles and high flexibility

Large bandwidth to external memory

High Floating-Point performance

Reconstruction and semiautomatic segmentation by GPUs



CMS low-level trigger system based on FPGA-GPUs: track reconstruction and fitting

Total data latency of 6.9 µs = 2 µs (data transfer) + 4.9 µs (GPU processing)

H. Mohr et al., JINST 12 C04019 (2017)


### **Conclusion: DAQ in high-energy physics**

- Handling of big data in real-time is possible: CMS track trigger with ~100 Tb/s in 4 µs latency
- Highly sophisticated parallel implementation on hundreds of custom FPGA boards required

- Progress in commercial microelectronics (FPGAs, transceivers, etc.) will still support us for some time
- However, architecture choices and algorithms are becoming dominant. This includes real-time deep learning

- The combination of FPGA and GPU systems is extremely powerful
- Data processing will turn into online data analysis





#### Summary








### Appendix





#### **Pixel structure**

- Pixels are based on a floating-electronics structure: the pixel electronics is placed in a deep n-well.
- The deep n-well fulfills two tasks:
  - 1. Local substrate for electronics (isolated from p-substrate)
  - 2. Charge-collecting electrode.
- The p-substrate region below the deep n-well is depleted by setting the substrate to negative HV. Typical depletion depth: 30–50 µm for 80 to 200 Ωcm resistivity. MIP signals are typically >5000 e for a 200 Ωcm substrate
- The substrate contacts are at the chip surface (undepleted parts of it)
- Largest capacitance from p-well/n-well junction
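
The quoted depletion depths can be compared with the textbook abrupt-junction estimate $d = \sqrt{2\,\varepsilon\,\rho\,\mu_h\,V}$ for a p-type substrate. The sketch below is an order-of-magnitude check, not a device simulation: the 120 V bias is an assumed value (the slide gives no voltage), the built-in voltage is neglected, and the material constants are approximate.

```python
import math

EPS_0 = 8.854e-12  # vacuum permittivity, F/m
EPS_R_SI = 11.9    # relative permittivity of silicon (approximate)
MU_HOLE = 0.045    # hole mobility in silicon, m^2/(V s), ~450 cm^2/(V s) (approximate)

def depletion_depth_um(resistivity_ohm_cm: float, bias_v: float) -> float:
    """Depletion depth in µm of a p-type substrate under reverse bias.

    Uses d = sqrt(2 * eps * rho * mu_h * V) for an abrupt one-sided junction,
    neglecting the built-in voltage.
    """
    rho_ohm_m = resistivity_ohm_cm * 1e-2  # Ω·cm -> Ω·m
    depth_m = math.sqrt(2 * EPS_0 * EPS_R_SI * rho_ohm_m * MU_HOLE * bias_v)
    return depth_m * 1e6

ASSUMED_BIAS_V = 120.0  # assumption for illustration; not stated on the slide

depth_80 = depletion_depth_um(80.0, ASSUMED_BIAS_V)
depth_200 = depletion_depth_um(200.0, ASSUMED_BIAS_V)

print(f" 80 Ωcm at {ASSUMED_BIAS_V:.0f} V: {depth_80:.0f} µm")
print(f"200 Ωcm at {ASSUMED_BIAS_V:.0f} V: {depth_200:.0f} µm")
```

With the assumed bias, the estimate falls in the same few-tens-of-µm range as the typical depths listed above, and it shows the square-root scaling with both resistivity and voltage.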

