DigiSim software package

DigiSim
A digitization simulation package
for the International Linear Collider

v01.00

Dhiman Chakraborty, Guilherme Lima, Vishnu Zutshi

Northern Illinois Center for
Accelerator and Detector Development

Physics Department
Northern Illinois University

Introduction

The purpose of the DigiSim package is to do detector digitization, for the CALICE test beam as a first goal, and ultimately for the full ILC detector. It is currently implemented as a standalone module, readily available for download and use.

The package reads the LCIO files produced by Geant4 applications (such as LCDG4, Mokka or SLIC) and appends the raw hits it produces to the output events. Most of the DigiSim (re)configuration can be done at run time by editing an ASCII steering file; no recompilation is necessary. The existing modifiers (digitization classes) are easy to set up and configure, and new functionality can be added easily. DigiSim is thus powerful, extensible and simple to use, and it is well suited to the simulation of CALICE test beam digitization.

DigiSim was originally developed in C++, and then ported to Java. Since then, further development has taken place in the Java version, including modifiers for crosstalk and random noise modeling. The Java version has been reasonably stable since July 2005. We expect these new developments to be ported back into the original C++ version in the coming weeks, and we then intend to keep both versions up to date with each other.

Design requirements and development choices

The following requirements were considered when designing the software package:

Initially based on the C++ programming language, like most of the software of the CALICE collaboration. A Java version has also been developed, as suggested by the American LC community
Object-oriented, for easier development and maintenance of the source code
Based on the LCIO event model, which is becoming the de facto standard for CALICE and ILC simulations
Usable as a test bed for the development of digitization simulation software for the full ILC detector

We chose Marlin as the C++ framework on which to develop DigiSim. The Java version was developed afterwards, and it currently has more features than the original C++ version. As the Java version has been reasonably stable since July 2005, we expect to synchronize the C++ version with it, and to keep the two versions reasonably in sync from then on.

Package dependencies

The C++ version of DigiSim has the following dependencies:

LCIO v01-04 or later - see http://lcio.desy.de/
Marlin v00-06 or later - see http://ilcsoft.desy.de/marlin

The Java version of DigiSim is part of the org.lcsim framework (currently on version 0.8). The org.lcsim framework itself has the following dependencies:

Sun's java development kit, version 1.5 or later - see http://java.sun.com
Maven version 1.0.2 - see http://maven.apache.org
GeomConverter 0.5 or later - see http://www.lcsim.org

Downloading, building and running DigiSim

These instructions are significantly different for the C++ and Java versions, although the configuration of the DigiSim package itself is basically identical. The instructions presented below have been tested within Fedora Linux environments, using g++ version 3.3 or Sun's Java version 1.5. If you try to use DigiSim in other environments, please tell us about your experience, good or bad.

Instructions for the C++ version

The source code for DigiSim can be checked out from the official CALICE CVS repository (see access instructions). Instructions for building under Linux are given below. These instructions have been tested within Fedora Linux environments, using g++ version 3.3, but the package will probably build without problems with other Linux distributions and g++ versions as well.

# download the source code (see access instructions link above)
export CVS_RSH=ccvssh
export CVSROOT=:ext:yourUserName@cvssrv.ifh.de:/calice
ccvssh login
cvs co -d digisim calice_sim/digitization/digisim

# building
cd digisim
gmake

# Running
ln -sf /path/to/some/data.slcio inputfile
./bin/digisim digi.steer

As an alternative to creating the symbolic link (the next-to-last step), one can edit the steering file digi.steer and insert the explicit name(s) of the input data file(s) to be digitized.

Several parameters of the digitizer can be configured, as explained later, by editing the file digi.steer.

Instructions for the Java version

The Java version of DigiSim is part of the org.lcsim framework, so the download and build instructions are basically the ones provided on the LCSim website. Only a quick summary is presented here:

# download and install Sun's Java Development Kit 1.5 or later, see the Java website for details
# download and build Maven 1.0.2, see the Maven website for details. More recent versions differ significantly from 1.0.2.

# download GeomConverter
export CVSROOT=:pserver:anonymous@cvs.freehep.org:/cvs/lcd
cvs login (use your e-mail as the password)
cvs checkout GeomConverter

# build GeomConverter
cd GeomConverter
maven jar:install
cd ..

# download and build LCSim
cvs checkout lcsim
cd lcsim
maven jar:install
cd ..

# building the API documentation using Javadoc
cd GeomConverter
maven site
cd ../lcsim
maven site
cd ..

Please note that all maven commands are issued from the top directory of each package, where the project configuration files project.xml and project.properties are located. The API documents can then be consulted by pointing your browser to the local file target/docs/apidocs/index.html.

Running DigiSim / Java

There are two distinct ways of running the Java version of DigiSim:

(1) Running in standalone mode, saving the output file with raw hits and digitized hits for further processing; and
(2) Running DigiSim as a driver, from inside JAS3, as a preprocessor to your favourite analysis or reconstruction drivers.

Each of these running modes has its own pros and cons. For instance, the JAS3 GUI is intuitive and friendly, its event browser and event display features are very helpful, and drivers can be used as plugins to build complicated reconstruction chains, but making sure one is using the right jars and source code is not always obvious to the uninitiated. Running in standalone mode is more convenient for running remotely over slow connections, or when the user wants more control over a special environment, for example by tuning the CLASSPATH of a single session without changing the overall setup. Moreover, the standalone steps can be saved in a script for a faster startup. I personally prefer running long jobs outside of the graphical environment, and use JAS only to look at the plots and produce nice figures.

Running DigiSim/Java in Standalone mode

After building the lcsim jar file following the instructions above, one can run DigiSim standalone by typing:

source addjars.sh ~/.maven/repository # once per session, defines the CLASSPATH to enable the use of the LCSim framework
ln -sf /path/to/some/data.slcio inputfile
java org.lcsim.digisim.DigiSimMain

As an alternative to creating the symbolic link, one can edit the DigiSimMain.java source code and replace "inputfile" with a specific file name, but I find the use of symbolic links very convenient here. The output file, digisim.slcio, contains all the raw hits and digitized hits collections, according to the configuration file used. Note that by default DigiSim uses a configuration file based on the detector name, so that data files based on e.g. the SDJan03 geometry will use the configuration file SDJan03.steer.

Note: "inputfile" is currently hardwired in the source code of the DigiSimMain class, despite the line "LCIOInputFiles inputfile" present in the configuration file. That line affects only the C++ version of DigiSim, not the Java version.

Running DigiSim/Java from inside JAS3

DigiSim can be run from inside JAS3, using this driver: DigiSimExample.java. Open this file in JAS3, compile it and load it. You may want to load your favourite analysis or reconstruction drivers here as well. Then open an input LCIO data file to be digitized, and run some events one by one.

You may want to open the LCSim event browser and look at some raw data (RawCalorimeterHit class) or some digitized data (CalorimeterHit class). Then rewind the data source and run over all events.

Note: Be sure to select the org.lcsim plugin when you open the input data file. If no dialog box opens at this point, make sure you have the lcsim.jar file loaded, by checking that the LCSim event browser is available from the View menu.

How DigiSim works

The DigiSim package works by using a chain of "modifiers", which will apply successive transformations to the input simulated hits. The resulting raw hits are then simply appended to the LCEvent, and get automatically written to the output LCIO file. Fig.1 shows the DigiSim class diagram, which is helpful to understand how DigiSim works. Note that the ellipses represent the Marlin/C++ and the LCSim/Java frameworks. Right below the frameworks are the framework-specific liaison classes DigiSimProcessor and DigiSimDriver, whose interfaces are imposed by the frameworks. All other classes have basically the same interfaces and have the same functionality in the Java and C++ versions.

Figure 1: Class diagram for the digitization simulation package DigiSim. The solid arrows represent inheritance relationships, while the solid (dashed) lines with open arrows represent containment (usage) relationships. AbstractCalHitModifier, RandomNoise and FunctionModifier are abstract classes, defining the interfaces to be followed by their subclasses.

The frameworks, namely Marlin in C++ or LCSim in Java, take care of all the I/O, and call specific DigiSim hooks for initialization, event processing and finalization. The hooks are actually defined by each framework, hence DigiSimProcessor (DigiSimDriver) is the only DigiSim class which knows about the Marlin (LCSim) framework, and so abides by the interface imposed for all Marlin Processors (LCSim Drivers). These classes instantiate one digitizer per subdetector to be digitized.

The Digitizer class is responsible for managing the whole digitization processing for its subdetector. During its initialization, all the requested modifiers are instantiated and configured, according to the DigiSim configuration file, or steering file.

The processing which takes place during the event loop is best understood by examining Fig. 2. The modifiers act on transient copies of the calorimeter hits (class TempCalHit), which are used as both input and output of the modifiers' event processing method. The abstract class CalHitModifier defines the interface to be inherited by the modifiers.

Figure 2: Illustration of the processing inside the DigiSim event loop.

In the event loop, events are passed to the Digitizer, which extracts the simulated hits (SimCalorimeterHits) from an LCCollection (SimHitsLCCollection). The simulated hits are converted into transient hits (class TempCalHit) and passed through a chain of modifiers. Each modifier changes the input TempCalHits by applying its own transformation. After all the modifiers have run, the final TempCalHit objects are converted into RawCalorimeterHits (including some double to integer conversions), which are then stored into a new LCCollection (RawHitsLCCollection) and appended to the event. The framework provides default processors/drivers which take care of writing the modified events into the output file.
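The flow described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: SketchTempHit, SketchModifier and DigitizerChainSketch are hypothetical stand-ins rather than the DigiSim or LCIO classes, and the final rounding is an assumed form of the double to integer conversion.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for TempCalHit: a mutable, double-valued copy of a hit.
class SketchTempHit {
    double energy;
    double time;

    SketchTempHit(double energy, double time) {
        this.energy = energy;
        this.time = time;
    }
}

// Hypothetical stand-in for the modifier interface: edits the hit map in place.
interface SketchModifier {
    void processHits(Map<Long, SketchTempHit> hits);
}

class DigitizerChainSketch {
    // Copies simulated energies into transient hits, runs the chain in order,
    // then rounds each surviving amplitude to an integer "raw" value.
    static Map<Long, Integer> digitize(Map<Long, Double> simEnergies,
                                       List<SketchModifier> chain) {
        Map<Long, SketchTempHit> temp = new LinkedHashMap<Long, SketchTempHit>();
        for (Map.Entry<Long, Double> e : simEnergies.entrySet()) {
            temp.put(e.getKey(), new SketchTempHit(e.getValue(), 0.0));
        }
        for (SketchModifier modifier : chain) {
            modifier.processHits(temp);
        }
        Map<Long, Integer> raw = new LinkedHashMap<Long, Integer>();
        for (Map.Entry<Long, SketchTempHit> e : temp.entrySet()) {
            raw.put(e.getKey(), (int) Math.round(e.getValue().energy));
        }
        return raw;
    }

    public static void main(String[] args) {
        Map<Long, Double> sim = new LinkedHashMap<Long, Double>();
        sim.put(1L, 0.004);
        sim.put(2L, 0.0001);
        List<SketchModifier> chain = new ArrayList<SketchModifier>();
        // First modifier: a fixed gain of 1000.
        chain.add(hits -> { for (SketchTempHit h : hits.values()) h.energy *= 1000.0; });
        // Second modifier: drop hits with amplitude below 1.
        chain.add(hits -> hits.values().removeIf(h -> h.energy < 1.0));
        System.out.println(digitize(sim, chain));
    }
}
```

In this toy run the second cell falls below the cut after the gain is applied, so only the first cell reaches the raw collection.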

Configuring DigiSim and its modifiers

An arbitrary number of modifiers can be defined and used within any DigiSim run. It is possible to configure and use any number of modifiers of any single existing modifier type. DigiSimProcessor is a Marlin processor, therefore it can receive any number of parameters from the Marlin steering file. The modifiers can then be configured on-the-fly, using parameters from the steering file (see this simple example).

In the Java version, the DigiSim configuration is very similar. By design, the very same Marlin steering file can also be used in the Java version, which simplifies the maintenance of the configuration files in the long term.
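As an illustration of how a modifier configuration line of the form "name type parameters" (like the examples listed in the next section) could be split into its parts, here is a minimal sketch. The class ModifierConfigSketch is hypothetical, and the code does not attempt to reproduce the actual Marlin or DigiSim steering file parser.

```java
import java.util.Arrays;

// Hypothetical holder for one modifier configuration line.
class ModifierConfigSketch {
    final String name;      // instance name, e.g. "HBdiscrim"
    final String type;      // modifier type, e.g. "AbsValueDiscrimination"
    final double[] params;  // numeric parameters, in the order given

    ModifierConfigSketch(String name, String type, double[] params) {
        this.name = name;
        this.type = type;
        this.params = params;
    }

    // Parses "name type p1 p2 ..."; returns null for blank lines and '#' comments.
    static ModifierConfigSketch parse(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty() || trimmed.startsWith("#")) {
            return null;
        }
        String[] tokens = trimmed.split("\\s+");
        if (tokens.length < 2) {
            throw new IllegalArgumentException("Expected 'name type [params...]': " + line);
        }
        double[] params = new double[tokens.length - 2];
        for (int i = 2; i < tokens.length; i++) {
            params[i - 2] = Double.parseDouble(tokens[i]);
        }
        return new ModifierConfigSketch(tokens[0], tokens[1], params);
    }

    public static void main(String[] args) {
        ModifierConfigSketch c = parse("HBdiscrim AbsValueDiscrimination 8 1");
        System.out.println(c.name + " " + c.type + " " + Arrays.toString(c.params));
    }
}
```

Keeping the instance name separate from the modifier type is what allows several modifiers of the same type to coexist in one run, each with its own parameters.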

Existing modifiers

There are currently a few general modifiers implemented and ready for use. Many of the existing modifiers implement a smeared linear transformation. See Fig. 3 for a graphical representation of what we mean by a smeared linear transformation. Please note, however, that the SmearedLinear modifier has been deprecated and replaced by the simpler modifiers SmearedGain and SmearedTiming.

Figure 3 - Illustration of the hit smearing procedure implemented by a typical modifier, and an explanation of some of the existing modifiers.

This is an alphabetical listing of the existing modifiers, with a brief description:

AbsValueDiscrimination

Configuration line example:

# Two parameters: (1) threshold, and (2) width of smearing on threshold
HBdiscrim AbsValueDiscrimination 8 1

A modifier for basic discrimination on the absolute value of energy. This means that hits with energies in the range [-threshold;+threshold] are discarded.

Negative contributions are due to random noise, and large negative values of random noise may be kept in an attempt to cancel large positive noise, thus avoiding a positive biasing of the total average energy deposition due to random noise.
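The discrimination rule can be sketched as follows. The class name is hypothetical, hits are reduced to a map from cell ID to energy, and drawing a new Gaussian-smeared threshold for every hit is an assumption about how the smearing width is applied.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

class AbsValueDiscriminationSketch {
    // Removes every hit whose energy lies in [-thr, +thr], where thr is the
    // nominal threshold smeared by a Gaussian of the given width (assumed per hit).
    static void apply(Map<Long, Double> energies, double threshold, double width, Random rng) {
        Iterator<Map.Entry<Long, Double>> it = energies.entrySet().iterator();
        while (it.hasNext()) {
            double energy = it.next().getValue();
            double thr = threshold + width * rng.nextGaussian();
            if (Math.abs(energy) <= thr) {
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        Map<Long, Double> hits = new TreeMap<Long, Double>();
        hits.put(1L, 5.0);    // inside the window, removed
        hits.put(2L, -12.0);  // large negative noise, kept
        hits.put(3L, 9.0);    // above threshold, kept
        apply(hits, 8.0, 0.0, new Random(1L));
        System.out.println(hits.keySet());
    }
}
```

With a zero width the cut is sharp, so in the example only cells 2 and 3 survive, including the hit with a large negative amplitude.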
Crosstalk

Configuration line example:

# Two parameters: (1) mean value of crosstalk to first neighbors; and (2) width of smearing on the first parameter
HBcrosstalk Crosstalk 0.020 0.005

This modifier models the light crosstalk between scintillator cells, so it uses the Segmentation.getNeighbourIDs() method to find the neighbors of each cell. Only first neighbors are assigned crosstalk contributions.
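A minimal sketch of first-neighbor crosstalk is given below. Several things here are assumptions made for the illustration: the neighbor lookup is passed in as a plain function instead of using Segmentation.getNeighbourIDs(), the crosstalk fraction is smeared independently for each neighbor, and the energy of the source cell is left unchanged.

```java
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;
import java.util.function.LongFunction;

class CrosstalkSketch {
    // Adds a smeared fraction of each hit's energy to each of its first neighbours.
    static Map<Long, Double> apply(Map<Long, Double> energies,
                                   LongFunction<long[]> neighbours,
                                   double mean, double width, Random rng) {
        // Work on a copy so that the leaked energy is always computed from the
        // original amplitudes, never from already modified ones.
        Map<Long, Double> out = new TreeMap<Long, Double>(energies);
        for (Map.Entry<Long, Double> hit : energies.entrySet()) {
            for (long id : neighbours.apply(hit.getKey())) {
                double fraction = mean + width * rng.nextGaussian();
                out.merge(id, fraction * hit.getValue(), Double::sum);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Long, Double> hits = new TreeMap<Long, Double>();
        hits.put(10L, 100.0);
        // Toy one-dimensional geometry: the neighbours of cell i are i-1 and i+1.
        LongFunction<long[]> chain = id -> new long[] { id - 1, id + 1 };
        System.out.println(apply(hits, chain, 0.020, 0.005, new Random(3L)));
    }
}
```

Neighbors that had no hit before the modifier runs appear as new entries in the output map, which is how crosstalk can create hits in otherwise empty cells.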
DeadCell

# Five parameters: cellID components of a specific cell:
# (1) system, (2) barrel/endcap flag, (3) layer index, (4) theta index, (5) phi index
HBDeadCell      DeadCell          3    0    12    34    56

This modifier always removes any hit for the specified cellID. Please note that there is no consistency check on the validity of the cellID provided. If a bad ID is provided, no hit will ever get removed. One modifier has to be provided for each dead cell.
ExponentialNoise

# First five parameters from RandomNoise: (1) system, (2) barrel/endcap flag, (3) noise-only threshold,
# (4) nominal time and (5) sigma of time smearing
# One additional parameter: (6) µ = mean of the exponential distribution
HBExpoNoise    ExponentialNoise   3     0      7       100   100    0.6

This modifier inherits from RandomNoise, and defines an exponential noise distribution, with zero probability for negative amplitudes. The implementation uses the exponential distribution from Apache's commons-math library, see http://jakarta.apache.org/commons/math/userguide/distribution.html.

Please read RandomNoise documentation for more details.
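For an exponential distribution with mean µ, the probability of exceeding a non-negative threshold t is exp(-t/µ), and because the distribution is memoryless, a value forced above t can be drawn as t plus an ordinary exponential draw. The sketch below uses these two facts. It is not the DigiSim implementation: the class name is hypothetical, and java.util.Random is used here instead of the commons-math library.

```java
import java.util.Random;

class ExponentialNoiseSketch {
    private final double mean;       // µ, mean of the exponential distribution
    private final double threshold;  // noise-only threshold, assumed >= 0
    private final Random rng;

    ExponentialNoiseSketch(double mean, double threshold, Random rng) {
        this.mean = mean;
        this.threshold = threshold;
        this.rng = rng;
    }

    // Full distribution: inverse-transform sampling of an exponential.
    double drawRandomNoise() {
        return -mean * Math.log(1.0 - rng.nextDouble());
    }

    // P(noise > threshold) = exp(-threshold / mean).
    double getProbabilityAboveThreshold() {
        return Math.exp(-threshold / mean);
    }

    // Memorylessness: the excess over the threshold is again exponential.
    double drawRandomNoiseAboveThreshold() {
        return threshold + drawRandomNoise();
    }

    public static void main(String[] args) {
        ExponentialNoiseSketch noise = new ExponentialNoiseSketch(0.6, 7.0, new Random(7L));
        System.out.println("P(above threshold) = " + noise.getProbabilityAboveThreshold());
        System.out.println("noise above threshold: " + noise.drawRandomNoiseAboveThreshold());
    }
}
```

With the values of the configuration line above (µ = 0.6, noise-only threshold of 7), the probability for a cell to be above threshold is exp(-7/0.6), or about 8.6e-6.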
FunctionModifier (abstract)

An abstract function-based modifier. Its subclasses must implement the following abstract function:

virtual double transformEnergy(const TempCalHit& hit) const = 0;

The values returned from this function will overwrite the ADC counts of the transient hits.
GainDiscrimination

# Four parameters: (1) nominal gain, (2) gain width, (3) nominal threshold, and (4) threshold width
HBlightCollEff GainDiscrimination 0.0111 0.0029 1 0

A simple modifier inheriting directly from CalHitModifier. In the example above, a smeared factor of (0.0111 +/- 0.0029) is applied to each hit independently, and then hits with the "energy" field below 1 are removed. Please note that although the field is called "energy", its interpretation may be different, such as "number of photons collected" in the example above. In that case a threshold of 1 expresses the fact that it does not make sense to collect a fraction of a photon. A fixed gain or threshold is applied if the corresponding width (parameter 2 or 4) is set to zero.
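The two operations can be sketched as a single pass over the hits. The class below is hypothetical; it assumes Gaussian smearing, with a new gain and a new threshold drawn for every hit.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

class GainDiscriminationSketch {
    // Scales each hit by a smeared gain, then removes it if the scaled value
    // falls below a smeared threshold.
    static void apply(Map<Long, Double> hits, double gain, double gainWidth,
                      double threshold, double thresholdWidth, Random rng) {
        Iterator<Map.Entry<Long, Double>> it = hits.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, Double> hit = it.next();
            double scaled = (gain + gainWidth * rng.nextGaussian()) * hit.getValue();
            double cut = threshold + thresholdWidth * rng.nextGaussian();
            if (scaled < cut) {
                it.remove();
            } else {
                hit.setValue(scaled);
            }
        }
    }

    public static void main(String[] args) {
        Map<Long, Double> photons = new TreeMap<Long, Double>();
        photons.put(1L, 50.0);
        photons.put(2L, 200.0);
        // Parameters taken from the HBlightCollEff configuration line.
        apply(photons, 0.0111, 0.0029, 1.0, 0.0, new Random(5L));
        System.out.println(photons);
    }
}
```

Setting either width to zero reduces the corresponding draw to its nominal value, which reproduces the fixed gain or fixed threshold behavior described above.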
GaussianNoise

# First five parameters from RandomNoise: (1) system, (2) barrel/endcap flag, (3) noise-only threshold,
# (4) nominal time and (5) sigma of time smearing
# Two additional parameters: (6) mean, and (7) width of the Gaussian distribution.
# width<0 means that the noise-only threshold acts on absolute values, so both negative and positive tails are used
HBGaussNoise    GaussianNoise   3     0      7       100   100    0.0   -1.6

This modifier inherits from RandomNoise, and defines a Gaussian noise distribution. The implementation uses the Gaussian distribution from Apache's commons-math library, see http://jakarta.apache.org/commons/math/userguide/distribution.html.

Please read RandomNoise documentation for more details.
HotCell

# Modifier parameters:
# (1) amplitude mean and (2) sigma of the smearing on the amplitude around the mean
# (3) timing mean and (4) sigma of the smearing on the timing around the mean
# (5-9) are the cellID components of a specific cell: (5) system, (6) barrel/endcap flag,
# (7) layer index, (8) theta index, (9) phi index
HBHotCell       HotCell      252525   0    101010   0        3   0 12 123 345

A simplistic modifier for hot cells. While RandomNoise selects random cells for noise assignment, HotCell picks a specific cell, and randomly draws the amplitude and timing of the noise to be assigned to that cell. In the configuration line above, the cell (12,123,345) will get fixed (non-random, sigma=0) amplitudes and timestamps, according to the values provided, and these values will be present in the final hits. Please note that, as for the DeadCell modifier described above, no check is made on the validity of the last five parameters. In particular, system=3, barrelEndcapFlag=0 needs to be compatible with an HB (HcalBarrel) cell, and the other cell indices must also be valid.
RandomNoise

# Five common parameters: (1) system, (2) barrel/endcap flag, (3) noise-only threshold
# (4) nominal time and (5) sigma of time smearing

An abstract modifier which defines a common behavior for noise modeling modifiers. Its subclasses define the specific noise distribution to be used. The common behavior implemented in the RandomNoise modifier assumes that noise should be added to all cells, but that the noise assigned to most cells is actually very small and will not survive discrimination. Considering the huge number of cells in typical ILC hadron calorimeters, and for performance reasons, the random noise is added only to randomly chosen cells, in a two-stage process, as explained below.

In the first step, the full noise distribution is used to add noise to all existing hits, as the additional noise may contribute to make some hits survive discrimination. All hits, therefore, will have some noise contribution after this first step.

The purpose of the second step is to add noise-only cells, where the noise is large enough that it might survive discrimination. An operational noise-only threshold is defined in the configuration line, and it is used to calculate the probability of any cell receiving noise with an amplitude larger than the noise-only threshold. The mean number (Nmean) of cells above the noise-only threshold is then given by [the total number of cells in a given component (say Hcal barrel)] times [the probability of any cell being above the noise-only threshold]. Nmean is calculated during the initialization of the modifier. Then, for each event, the number of cells above threshold (Nabove) is drawn from a Poisson distribution with mean Nmean, and a total of Nabove cells are randomly drawn to receive random noise. The noise amplitude is forced to be above the noise-only threshold, while still following the distribution provided. CellSelector is the class responsible for randomly drawing valid cells from a given subdetector.

In order to be usable in the context of the two-step processing described above, the RandomNoise subclasses have to define the noise distribution, and implement the following methods based on that distribution:
```
    public abstract double drawRandomNoise();
    public abstract double getProbabilityAboveThreshold();
    public abstract double drawRandomNoiseAboveThreshold();
```
After reading about the two-step processing in the RandomNoise modifier, the purpose of these methods should be clear. Method drawRandomNoise() must return noise amplitudes according to the full distribution to be used in the first step, while drawRandomNoiseAboveThreshold() returns noise amplitudes forced to be above the noise-only threshold, to be used in the second step. Method getProbabilityAboveThreshold() is used during the initialization of the modifier, to determine the Nmean parameter.

This modifier currently needs to know the system and barrel/endcap flags, which are used to encode the cellID/rawID of the newly added noise-only cells. For some examples, please take a look at the ExponentialNoise and GaussianNoise modifiers.
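The second step of the processing can be sketched as below. The class and method names are hypothetical, the uniform draw of a cell index merely stands in for CellSelector, and skipping cells which already hold a hit is an assumption. The Poisson draw uses Knuth's multiplication method, which is only adequate for small values of Nmean.

```java
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;
import java.util.function.DoubleSupplier;

class NoiseOnlyCellsSketch {
    // Nmean = (number of cells) x (probability of a cell being above threshold).
    static double meanCellsAbove(long nCells, double probabilityAbove) {
        return nCells * probabilityAbove;
    }

    // Knuth's method: count uniform draws until their product drops below exp(-mean).
    static int drawPoisson(double mean, Random rng) {
        double limit = Math.exp(-mean);
        double product = rng.nextDouble();
        int n = 0;
        while (product > limit) {
            n++;
            product *= rng.nextDouble();
        }
        return n;
    }

    // Draws Nabove from Poisson(nMean) and assigns above-threshold noise to that
    // many randomly chosen cells, leaving cells with an existing hit untouched.
    static void addNoiseOnlyCells(Map<Long, Double> hits, long nCells, double nMean,
                                  DoubleSupplier noiseAboveThreshold, Random rng) {
        int nAbove = drawPoisson(nMean, rng);
        for (int i = 0; i < nAbove; i++) {
            long cell = (long) (rng.nextDouble() * nCells);  // stand-in for CellSelector
            if (!hits.containsKey(cell)) {
                hits.put(cell, noiseAboveThreshold.getAsDouble());
            }
        }
    }

    public static void main(String[] args) {
        Random rng = new Random(11L);
        // Illustrative numbers only: one million cells, probability 1e-5 per cell.
        double nMean = meanCellsAbove(1000000L, 1.0e-5);
        Map<Long, Double> hits = new TreeMap<Long, Double>();
        addNoiseOnlyCells(hits, 1000000L, nMean, () -> 7.5, rng);
        System.out.println("Nmean = " + nMean + ", noise-only cells added: " + hits.size());
    }
}
```

The benefit of this approach is that the cost per event scales with Nmean, typically a handful of cells, instead of with the total number of cells in the subdetector.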
SiPMSaturation

# Two parameters: (1) nominal gain for the linear regime, and (2) lower limit of the saturation region
HBSiPMSaturat SiPMSaturation 1 2200

A "quick-and-dirty" implementation of photosensor saturation effects. This modifier models a linear regime for amplitudes below the limit, and a constant output for amplitudes above the limit. Considering the numbers above, a maximum output of 2200 would be provided, for a total number of 2200 PEs or more. This modifier inherits from FunctionModifier.
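The transfer function described above can be written in one line. In this sketch the class name is hypothetical, and taking the constant output above the limit to be gain times limit (so that the function is continuous) is an assumption; it agrees with the example as long as the gain is 1.

```java
class SiPMSaturationSketch {
    // Linear response below the limit, constant output at and above it.
    static double transformEnergy(double input, double gain, double limit) {
        return input < limit ? gain * input : gain * limit;
    }

    public static void main(String[] args) {
        // Parameters taken from the HBSiPMSaturat configuration line.
        System.out.println(transformEnergy(1000.0, 1.0, 2200.0));
        System.out.println(transformEnergy(5000.0, 1.0, 2200.0));
    }
}
```

With these parameters an input of 1000 PEs is returned unchanged, while an input of 5000 PEs is clipped to 2200.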
SmearedGain

# Two parameters: (1) nominal gain, and (2) width of the gain smearing
HBlightYield SmearedGain 10000000 0

The simplest non-trivial function-based modifier. It models a smeared linear transformation on energy. If the nominal gain is set to 1 without smearing (zero width), SmearedGain does not alter the input values, thus acting like an identity transformation.
SmearedTiming

# Two parameters: (1) nominal factor on timing, and (2) width of the smearing on the first parameter
HBtimeSmear SmearedTiming 1000000 10000

Very similar to SmearedGain modifier, but it applies the smeared linear transformation on the timing instead. If the nominal factor on timing is set to 1 without smearing (zero width), SmearedTiming does not alter the input values, thus acting like an identity transformation on the hit timing.

Real-life digitization: creating new modifiers

In order to be properly controlled by the Digitizer, all modifiers should inherit the interface from the abstract class CalHitModifier, implementing the init(), processEvent(), print() and newInstance() methods (see Fig. 4).

Figure 4 - Creating new modifiers: interface inheritance and functions to be implemented. The member functions in red are the ones which need to be implemented by the new modifiers. Please note that this figure is somewhat obsolete: transformTime(hit) has been deprecated, and a current subclass of FunctionModifier is SmearedGain.

In spite of their simplicity, several of the typical effects of the digitization process can be represented quite well by an appropriate configuration of one of these very simple modifiers. Examples are uniform inefficiencies of, say, (97.8 +/- 0.5)%, or zero suppression below, say, (100 +/- 2) ADC counts. Both cases can be represented well by an instance of the existing GainDiscrimination modifier. The first one can also be modeled by the simpler SmearedGain modifier.

At the next step of increasing complexity, one anticipates that some other effects, like charge saturation or signal integration, can be realized with function calls, like:

double smearedEnergy = transformEnergy(aHit);

Such modifiers can be created very easily, e.g. by inheriting from FunctionModifier and implementing the transformEnergy() function shown in red in Fig.4, or by copying the code from SmearedTiming modifier, changing the class name appropriately and modifying the transformTime() method.
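As a rough Java illustration of this pattern, the sketch below defines a function-based base class and one subclass. All names are hypothetical stand-ins (SketchHit for the transient hit, FunctionModifierSketch for the base class), and the smooth saturation curve is just an example function chosen for the illustration, not one of the existing DigiSim modifiers.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the transient hit class.
class SketchHit {
    double energy;

    SketchHit(double energy) {
        this.energy = energy;
    }
}

// Hypothetical analogue of a function-based modifier: subclasses supply
// transformEnergy(), and the base class applies it to every hit.
abstract class FunctionModifierSketch {
    abstract double transformEnergy(SketchHit hit);

    void processHits(Map<Long, SketchHit> hits) {
        for (SketchHit hit : hits.values()) {
            hit.energy = transformEnergy(hit);
        }
    }
}

// Example subclass: a response that is nearly linear for small inputs and
// approaches 'max' for large ones.
class SmoothSaturationSketch extends FunctionModifierSketch {
    private final double max;

    SmoothSaturationSketch(double max) {
        this.max = max;
    }

    @Override
    double transformEnergy(SketchHit hit) {
        return max * (1.0 - Math.exp(-hit.energy / max));
    }

    public static void main(String[] args) {
        Map<Long, SketchHit> hits = new TreeMap<Long, SketchHit>();
        hits.put(1L, new SketchHit(10.0));
        hits.put(2L, new SketchHit(1.0e6));
        new SmoothSaturationSketch(2200.0).processHits(hits);
        System.out.println(hits.get(1L).energy + " " + hits.get(2L).energy);
    }
}
```

Only transformEnergy() has to be written for a new effect of this kind; the loop over hits stays in the base class.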

Another set of effects falls in the next level of complexity, like cell ganging, random noise or crosstalk. These effects typically require external information, like cell neighborhood, which is not readily available in the hit itself. Geometry-dependent modifiers have been developed to model crosstalk, exponential and Gaussian noise, while keeping the geometry-dependent processing isolated in reusable geometry-aware classes like CellSelector. In general, such modifiers should inherit directly from CalHitModifier, and implement its virtual methods init(), processHits(), print() and newInstance(). See the Crosstalk modifier for a good example of a modifier which depends on external classes.

Analysis (java)

Some simple java analysis code is available in the subdirectory java of the C++ version. It was used to demonstrate that the modifiers are doing what they are supposed to do. Please find usage instructions in the README file.

References

LCIO web page: http://lcio.desy.de
Marlin web page: http://ilcsoft.desy.de/marlin
LCSim web page: http://www.lcsim.org
Java web page: http://java.sun.com
Maven web page: http://maven.apache.org
Apache commons-math web page: http://jakarta.apache.org/commons/math
