

#### **TB Return for the SDHCAL**

Vincent Boudry École polytechnique



**CALICE DAQ Day** 09/11/2011







#### SDHCAL TB 2011

- SDHCAL Test Beam in June/July 2011
  - ► DAQ2 Failed! → back to USB version of the readout
- DHCAL Test Beam October 2011
  - ▶ 2 weeks of commissioning at PS mid-September
    - Grounding
    - Configuration with OracleDB
    - ◆ FW dev (for power pulsing) : not successful
  - Readout of 36 chambers for 3 days, but with precarious stability.
    - ◆ Readout with 2 LDAs & DIF FW40 (no power pulsing).
    - → OK for 10 days at SPS in October
- 10 days at SPS
  - Still configuration problems with 46 chambers, μMegas, instabilities
  - → 3 days of intensive data taking with 36 chambers (~30 GB, ~50k triggers, ~500k events) [Analysis ongoing]



# First large scale test @ PS

- 2 weeks @ PS
  - ▶ SDHCAL with 31 chambers (~2/3 of full det).
    - ◆ 90 DIFs, 2 LDAs, 13 DCC, 1 CCC, 4 PCs
  - ► ~4400 ASIC / 285k channels individually configured
- Solved: grounding problems, reset procedure, malfunctioning elements, FW glitches, data corruption
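
The figures above imply a few per-unit averages that are handy as a sanity check. The short Python sketch below only divides the numbers quoted on this slide. The variable names are illustrative, and the results are rough because the inputs are themselves approximate (~4400, ~285k).

```python
# Rough averages derived from the numbers quoted on the slide (all approximate).
n_chambers = 31
n_difs = 90
n_asics = 4400        # "~4400 ASIC"
n_channels = 285_000  # "~285k channels"

channels_per_asic = n_channels / n_asics
asics_per_dif = n_asics / n_difs
difs_per_chamber = n_difs / n_chambers

print(f"channels per ASIC ~ {channels_per_asic:.0f}")
print(f"ASICs per DIF     ~ {asics_per_dif:.0f}")
print(f"DIFs per chamber  ~ {difs_per_chamber:.1f}")
```

With these inputs the averages come out at roughly 65 channels per ASIC, 49 ASICs per DIF and about 3 DIFs per chamber.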

Readout of ~100k triggers in test beam mode (10 GB of data)

- ▶ ≥1 events per trigger
- trigger on scintillators







| lda | dec | cumulative data size | nb of shmwr | Failed | Corrupted |
|-----|-----|----------------------|-------------|--------|-----------|
| 1   | 1   | 330200               | 357         | 0      | 0         |
| 1   | 2   | 256638               | 357         | 0      | 0         |
| 1   | 3   | 1891131              | 355         | 0      | 1         |
| 1   | 4   | 720476               | 357         | 0      | 0         |
| 1   | 5   | 662954               | 357         | 0      | 0         |
| 1   | 6   | 944784               | 357         | 0      | 0         |
| 1   | 7   | 691332               | 357         | 0      | 0         |
| 1   | 8   | 719920               | 355         | 0      | 1         |
| 1   | 9   | 1289548              | 355         | 0      | 1         |
| 2   | 1   | 0                    | 0           | 0      | 0         |
| 2   | 2   | 0                    | 0           | 0      | 0         |
| 2   | 3   | 0                    | 0           | 0      | 0         |
| 2   | 4   | 1165448              | 357         | 0      | 0         |
| 2   | 5   | 802156               | 355         | 0      | 1         |
| 2   | 6   | 838746               | 357         | 0      | 0         |
| 2   | 7   | 1927652              | 357         | 0      | 0         |
| 2   | 8   | 2155632              | 357         | 0      | 0         |
| 2   | 9   | 1690838              | 355         | 0      | 1         |
| 3   | 1   | 1287528              | 355         | 0      | 1         |
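
A table like this is easier to read once summarised per link. The Python sketch below copies the rows of the excerpt by hand and counts the active and corrupted links. It assumes that every non-zero entry in the Corrupted column stands for 1 and that a link with zero writes was simply not read out; neither assumption is stated on the slide.

```python
# (lda, 2nd-column index, cumulative data size, nb of shmwr, corrupted)
rows = [
    (1, 1, 330200, 357, 0), (1, 2, 256638, 357, 0), (1, 3, 1891131, 355, 1),
    (1, 4, 720476, 357, 0), (1, 5, 662954, 357, 0), (1, 6, 944784, 357, 0),
    (1, 7, 691332, 357, 0), (1, 8, 719920, 355, 1), (1, 9, 1289548, 355, 1),
    (2, 1, 0, 0, 0), (2, 2, 0, 0, 0), (2, 3, 0, 0, 0),
    (2, 4, 1165448, 357, 0), (2, 5, 802156, 355, 1), (2, 6, 838746, 357, 0),
    (2, 7, 1927652, 357, 0), (2, 8, 2155632, 357, 0), (2, 9, 1690838, 355, 1),
    (3, 1, 1287528, 355, 1),
]

# Links that actually delivered data
active = [r for r in rows if r[3] > 0]
# Links flagged as corrupted
corrupted = [r for r in active if r[4] > 0]

# Number of writes seen on clean links and on corrupted links
clean_writes = {r[3] for r in active if r[4] == 0}
bad_writes = {r[3] for r in corrupted}

print(f"{len(corrupted)} of {len(active)} active links flagged as corrupted")
print("writes on clean links:", clean_writes, "| on corrupted links:", bad_writes)
```

In this excerpt every link flagged as corrupted shows 355 writes where the clean links show 357, which hints that the two missing writes and the corruption flag are related. The slide itself does not say so.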

au LLR | CALICE IN2P3 | Orsay, 17/10/2011

# SDHCAL: first data with DAQ2



#### **Lessons learned**

- Some technical choices made in emergency (BUSY signal passing, sequencing of readout & reset, handling of noise, grounding, SW infrastructure...)
  - Reminder: the DAQ2 is also a technological prototype
    - test for deported concentrator cards, min. cabling
    - lack of flexibility in implementation for TB (ex: RamFull)
    - Sensitive to noise
- → Review choices & possible improvement, manpower
  - ► ASAP: november 2011
    - New forces (Mainz) → CCC & rethinking
    - ◆ Re-use of ODR card (RHUL part of AIDA)
    - Rethink the DAQ for the next year
      - version 2.5 (LDA free)
      - vs version 3 (better protocols, links, new CCC)
  - ▶ HCAL & DAQ day at DESY in December
  - AIDA meeting in February

## Summary of internal review

- Internal "Developer's day" last Friday @ IPNL
  - ▶ Debriefing of TB
  - ▶ Brainstorming on solutions (to be continued today)
- Main conclusions
  - DIF FW has to be consolidated
  - Diagnostic tools have to be improved
  - ► LDA is evil but still usable for small table-top set-ups
    - ◆ Wrong choice made 1 year ago...
  - ► ODR was forgotten along the way. → See Barry's talk
  - ► GigaDCC (replacement of LDA) will not be ready for next TB (April)
    - ◆ Might be more than GDCC → See Remi's talk
  - ▶ Back-up solution has to be implemented months before TB → USB readout → see Guillaume's talk
  - ► HDMI cables are not optimal... → something else for v3 of DAQ?
  - ► Manpower is scarce and will require better coordination.

# **Configuration of HW**

- The configuration of HW proved sometimes difficult
  - ► re-try, On/Off in a given order, ...
    - Probably due to some loophole in initialisation procedure (DIF FW, link establishment)
    - ◆ Some of the errors could be due to the re-reading of the configuration
  - ► Some cases not well understood (e.g. µMegas at SPS)
    - ◆ Worked at LAPP, not at CERN.

# Diagnosis & recovery tools

- Most of the efforts → running set-up
  - Most prominent causes of immediate failures removed
  - ► Instabilities during running remain
- More powerful diagnosis tools required:
  - Systematic Autopsy report & DAQ elog
  - ► Hierarchical diagnosis & repair.
    - ◆ Card FW (counters, registers)
  - Better overview of systems (DAQ histograms)
  - ► Improved Online DQ tools (histograms)
- Recovery:
  - ► Improved tagging of packets (counters) → coherency of events
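
The idea of tagging packets with counters to check event coherency can be sketched in a few lines of Python. This is a minimal illustration, not the DAQ2 implementation: the packet layout `(dif_id, trigger_counter)` and the function name `check_coherency` are assumptions made for the example.

```python
from collections import defaultdict

def check_coherency(packets, expected_difs):
    """Group packets by trigger counter and report counters
    for which a DIF is missing or appears more than once."""
    seen = defaultdict(list)
    for dif_id, trig_counter in packets:
        seen[trig_counter].append(dif_id)

    problems = {}
    for counter, difs in sorted(seen.items()):
        missing = sorted(set(expected_difs) - set(difs))
        duplicated = sorted({d for d in difs if difs.count(d) > 1})
        if missing or duplicated:
            problems[counter] = {"missing": missing, "duplicated": duplicated}
    return problems

# Hypothetical stream: DIF 2 is absent for trigger 101 and doubled for 102
packets = [(1, 100), (2, 100), (3, 100),
           (1, 101), (3, 101),
           (1, 102), (2, 102), (2, 102), (3, 102)]
report = check_coherency(packets, expected_difs=[1, 2, 3])
print(report)
```

A report of this kind separates "a DIF dropped out" from "a packet was delivered twice", two cases that look the same if only the event size is monitored.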

Long list of todo... always pushed back.

#### **DIF FW**

- DIF FW is in a ~ working state (v40 of SDHCAL for TB)
- Requires some improvement
  - ► Framework (by Rémi) → DEV3
    - ◆ Packet numbering → better diagnostics
    - ◆ Better reset procedure
  - ► ROC interface (by Guillaume)
    - ◆ Variable reset length
    - Power pulsing management (worked with USB vers.)
    - ◆ Revalidation of ROC start-up procedure
  - ► Calibration for debugging & tests (µMegas)
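
To illustrate how packet numbering helps diagnostics, the sketch below scans the sequence numbers received from one source and reports lost or repeated packets. The 16-bit wrap-around and the function name `find_gaps` are assumptions for the example; the actual counter width in the DIF FW is not given here.

```python
def find_gaps(seq, modulo=1 << 16):
    """Scan consecutive sequence numbers of a wrapping counter.

    Returns (lost, repeated):
      lost     -- list of (previous, current, n_missing) for each forward jump
      repeated -- list of sequence numbers received twice in a row
    """
    lost, repeated = [], []
    for prev, cur in zip(seq, seq[1:]):
        step = (cur - prev) % modulo  # the modulo handles the wrap-around
        if step == 0:
            repeated.append(cur)
        elif step > 1:
            lost.append((prev, cur, step - 1))
    return lost, repeated

# Hypothetical stream: the counter wraps at 65535, packets 2 and 3 are lost, 4 is repeated
seq = [65533, 65534, 65535, 0, 1, 4, 4, 5]
lost, repeated = find_gaps(seq)
print("lost:", lost, "repeated:", repeated)
```

An out-of-order packet shows up in this simple scheme as a very large apparent loss, so a real tool would need an extra rule to tell the two cases apart.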

# **Configuration Management**

#### ConfigDB worked fine

- HR2 & μROC implemented
- ► Easy to use python scripts ⊃ masking of noisy cells
- ► A fast remote / local ConfigDB procedure has been implemented (in emergency) & tested.
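
As an illustration of what masking noisy cells involves, the sketch below flags channels whose hit rate per trigger exceeds a threshold and builds a per-ASIC bit mask. It does not use the ConfigDB scripts or their API. The function names, the threshold and the convention "1 = enabled" are all assumptions made for the example.

```python
def noisy_channels(hit_counts, n_triggers, max_rate=0.1):
    """Return the channels whose hit rate per trigger exceeds max_rate."""
    return sorted(ch for ch, n in hit_counts.items() if n / n_triggers > max_rate)

def build_mask(n_channels, noisy):
    """Bit mask with 1 = channel enabled, 0 = channel masked (assumed convention)."""
    mask = (1 << n_channels) - 1
    for ch in noisy:
        mask &= ~(1 << ch)
    return mask

# Hypothetical hit counts for one 64-channel ASIC over 1000 triggers
hit_counts = {0: 12, 5: 900, 17: 3, 63: 450}
noisy = noisy_channels(hit_counts, n_triggers=1000, max_rate=0.1)
mask = build_mask(64, noisy)
print(noisy, f"{mask:#018x}")
```

Keeping the detection step separate from the mask construction lets the list of noisy channels be reviewed by hand before it is written to the configuration.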

#### Main area of improvement:

- ► Speed (~15 min for 90 DIFs)
  - ◆ A mask (ASIC/DIF/DCC) procedure is (being) implemented.
    - The topology of the readout should be modified separately
  - ◆ The complete ASIC configuration will be produced once
  - ◆ The slow configuration upload was traced to the high ping latency between CERN & CC. It could be bypassed with a proxy at IPNL.
- GUI mod of set-up
  - ◆ Work ongoing too (M. Cerutti interface for ECAL).
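
A back-of-the-envelope calculation shows why network latency matters for the configuration time. The sketch below uses the ~15 min for 90 DIFs quoted above and the ~4400 ASICs of the PS set-up. The two round-trip times are purely hypothetical values chosen for illustration, not measurements.

```python
# Numbers quoted on the slides
total_s = 15 * 60   # "~15 min"
n_difs = 90
n_asics = 4400      # "~4400 ASIC" in the PS set-up

per_dif_s = total_s / n_difs
per_asic_s = total_s / n_asics

# HYPOTHETICAL round-trip times, for illustration only (not measured values)
rtt_remote_ms = 20.0   # e.g. a distant database server
rtt_proxy_ms = 1.0     # e.g. a nearby proxy

# If the upload were entirely latency-bound, this many sequential round trips
# would account for the observed duration...
n_round_trips = total_s * 1000.0 / rtt_remote_ms
# ...and the same number of round trips to a nearby proxy would take:
proxy_total_s = n_round_trips * rtt_proxy_ms / 1000.0

print(f"{per_dif_s:.0f} s per DIF, {per_asic_s:.2f} s per ASIC")
print(f"{n_round_trips:.0f} round trips -> {proxy_total_s:.0f} s via the proxy")
```

Under these assumptions the total scales linearly with the round-trip time, which is why a proxy or a configuration produced once and cached would help more than faster CPUs.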

#### Performances

- No CPU limitation was found during the TB (with 1 machine / LDA)
  - load sharing easy with XDAQ
  - Many bugs found : last ones in SW (~normal)
  - Re-Writing of low-level SW needed (see next slide)
- The present use of RAW ethernet (baseline of CDAQ2) over switches was pointed out as risky, as it provides no protection against collisions or packet loss.
  - ► It can also put a strain on data input/output, as both use the same part of the kernel.
  - ► The mainstream is to use the ODR on point-to-point connection.
    - ◆ ODR use should be re-evaluated/tested.
    - ◆ Expertise lost with the departure of D. Decotigny → See B. Green's talk
- The use of IP protocol for the future version (GigaDCC) should be evaluated.
  - ► This seems to be the path envisaged for LHC upgrades.

#### Low level SW

#### Main idea

- ► Clean up the code → make a driver
  - ◆ Ad-hoc ether card or ODR
- ► Allow for separate control/command & data eth. channels
  - ◆ Cleaner code, HW accel.
- ▶ Development to be done in parallel with GDCC
- Overall architecture being defined (S. Chollet, N. Roche)

#### LDA record


- Mechanically unsound
  - ► Unplugging of mezzanine card happened.
  - ► Unsoldering (all repaired... in principle)
- Instabilities:
  - ► Some channels tested @ LLR intensively didn't work @ CERN
  - ► Failure rate higher when using more than 4 channels
    - ◆ But seems OK for 1 or 2...
- Grounding corrected by hand
  - ► ~AC coupling everywhere
- Principle OK
  - Many improvements to the implementation needed → GDCC (see Rémi's talk)



# Organisation

- Bug & development tracking
  - ► IN2P3 forge → internal development (⊃ CALICE experts)
    - tickets https://forge.in2p3.fr/projects/calice/issues
  - Savannah (CERN) → general public (TB users, FCAL people)



- Cleaning!!
  - Merging of versions
  - TAGs
- Migration of all SW
  - Merge of the libLDAs
- Branch for development (as there should have been)

Forge = SVN + Wiki + tickets tracking

# Organisation (cont'd)

- More frequent meetings (communication problems)
  - ► Status → 1/month
  - ► Informal meetings 1/week
    - ◆ Feedback
- Task Force (~3-4 people) for v3 & AIDA?
- Cooperation within AIDA
  - ► LAPP: CCC & BIF
  - ► LLR: integration SW EUDAQ ↔ CALICE DAQ
- Others ? FCAL, LPC

# Back up

# **CALICE DAQ2 elements**

# **Three TB Running modes:**

#### Physics

- as fast as possible IN SPILL,
- ▶ Poissonian statistics → as low as possible PILE-UP (or not!)
- Data with "low occupancy" (particle type & E dependent)
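
The link between Poissonian statistics and pile-up can be made concrete. If the number of particles per readout window follows a Poisson law of mean μ, the fraction of non-empty windows that contain more than one particle is P(k ≥ 2) / P(k ≥ 1). The Python sketch below evaluates it; the notion of a "readout window" and the values of μ are assumptions chosen for illustration.

```python
import math

def pileup_fraction(mu):
    """P(k >= 2 | k >= 1) for a Poisson-distributed particle count of mean mu."""
    p0 = math.exp(-mu)   # probability of an empty window
    p1 = mu * p0         # probability of exactly one particle
    return (1.0 - p0 - p1) / (1.0 - p0)

for mu in (0.05, 0.2, 1.0):
    print(f"mu = {mu:4.2f} -> pile-up fraction = {pileup_fraction(mu):.3f}")
```

For small μ the fraction is close to μ/2, so halving the beam intensity roughly halves the pile-up, at the cost of fewer useful triggers per spill.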

#### Demonstrator

- as close as possible to final ILC conditions
  - power pulsing, auto-trig
  - beam conditions close to ILC ? (Duty cycle, occupancy)

#### Calibration / noise

- ► a priori: off spill, fixed rate
- all cells ("maximum occupancy")





ODR = Off Detector Receiver

LDA = Link Data Aggregator

DCC = Data Concentrator Card

CCC = Clock & Control Card

DIF = Detector InterFace








# Clock & Trigger jitter

- Trigger & busy handling (G. Vouters)
  - ► Trig (NIM) → CCC → LDA → DCC → DIF
  - ► BUSY ← CCC ← LDA ← DCC ← DIF
- Trigger Jitter between DIFs (FG)
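
The trigger and BUSY paths above can be modelled as a tree in which BUSY travels upwards. The Python sketch below assumes that each card forwards the logical OR of its own BUSY and that of its children, and that the CCC vetoes triggers while BUSY is raised. This OR rule and the toy topology (1 CCC, 1 LDA, 2 DCCs, 4 DIFs) are assumptions for illustration, not a description of the firmware.

```python
def make_node(name, children=()):
    """One card in the readout tree (CCC, LDA, DCC or DIF)."""
    return {"name": name, "busy": False, "children": list(children)}

def busy_or(node):
    """BUSY seen at a node: its own flag OR the BUSY of any child."""
    return node["busy"] or any(busy_or(child) for child in node["children"])

def accept_trigger(ccc_node):
    """The CCC accepts a trigger only if no card below it is busy."""
    return not busy_or(ccc_node)

# Toy topology: CCC -> LDA -> 2 DCCs -> 2 DIFs each
difs = [make_node(f"DIF{i}") for i in range(4)]
dccs = [make_node("DCC0", difs[:2]), make_node("DCC1", difs[2:])]
lda = make_node("LDA0", dccs)
ccc = make_node("CCC", [lda])

before = accept_trigger(ccc)   # nothing busy yet
difs[3]["busy"] = True         # one DIF raises BUSY (e.g. its memory is full)
after = accept_trigger(ccc)
print(before, after)
```

In such a scheme a single stuck DIF blocks the whole detector, which is one reason why per-card BUSY counters are useful for diagnosis.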

#### Jitter measurement



#### **SW** status

- XDAQ + C library to DAQ2
- All critical elements are ready
  - ▶ Configuration DB ✓
- DAQ2 interface
- Semi-automatic noisy channels spotting & correcting (monitoring)
- Clean Slow control
- ► GUI ≈
- interface to CondDB: to be clarified
- event display
  - FROG online
  - DRUID on LCIO
- Missing ancillaries
  - interface to the GRID
  - interface to the machine (> in AIDA WP8.6.2)



# **HW** availability

All basic HW avail.

| Card            | #Avail | #Tested | #OK | Remark                                                                               |
|-----------------|--------|---------|-----|--------------------------------------------------------------------------------------|
| PC              | 6      | 6       | 6   | OS needs upgrade                                                                     |
| ODR             | 10     | 4       | 4   | (commercial board: no defects expected)                                              |
| LDA             | 25     | 22      | 17  |                                                                                      |
| HDMI Mezzanines | 30     | 24      | 13  | 4 have faulty connectors and are being repaired. Not all cards have 10 conn. working |
| GEth mezzanines | 25+5   | 25      | 20  | 2 can easily be recovered                                                            |
| CCC Adapter     | 25     | 17      | 16  | Limits # of installations                                                            |
| CCC             | 10     | 10      | 10  | termination adaptation may be needed                                                 |
| DCC             | 2+20   | 22      | 21  | 1 faulty channel on 1 card;<br>1 burned to be repaired                               |
| ECAL DIF        | 29     | 29      | 29  | equipment for 11 additional ones avail.                                              |
| SDHCAL DIF      | 190    | 190     | 183 | 7 being refurbished; mods needed for HR2 (ok for HR2b)                               |
| AHCAL DIF       | 4*     |         |     | *Being produced                                                                      |

Complete list of HW pieces & location available on https://twiki.cern.ch/twiki/bin/view/CALICE/HardwareList

Vincent.Boudry@in2p3.fr | 2G DAQ for the CALICE beam tests | LCWS'11 | Granada, 29/09/2011