The Technical Board will review the software we use in CALICE towards the end of November, so as to be sure it will enable us to do analysis studies as effectively and accurately as we would like. To be sure the software is designed correctly, we first have to understand what it needs to do, and this means gathering the requirements from the analysis work. To help us find these software requirements, we have compiled a list of ten questions, which we would ask you to complete and return by the end of this month. You should answer these for the analysis you are working on.

The first five questions are mainly about your (current) method of working; the second five are to give us some idea of what you will need. Some questions refer to "long-term", which here means up to completion of the "final" version of your analysis (even if you won't personally still be doing it). Please be quantitative if you can: stating "I need the alignment to an accuracy of 0.1 mm" is more useful than "I need accurate alignment".

Qu 1
Briefly, what is the analysis you are doing and how many different runs do you use?

Qu 2
Do you use raw or reconstructed data files for your analysis? If reconstructed, do you consider this will be sufficient long-term, or do you assume you will eventually move to using raw data files and doing the reconstruction yourself so as to have complete control of the reconstruction details? If you currently use raw files, do you plan to move to using reconstructed files later?

Qu 3
Do you generate your own Mokka files or use centrally produced ones? If you generate events yourself, why? Do you analyse only truth hits, or do you require digitisation and reconstruction? If the latter are needed, would you use centrally produced files if they were available?

Qu 4
Do you use the Grid? If so, do you use it for data access and/or for job submission, whether reconstruction or analysis?

Qu 5
Do you access data from the database for your analysis?
Qu 6
What are the main difficulties and/or limitations in your analysis work at present? The sorts of things which might cause problems include:
- Access to the data files
- Access to book-keeping information to know which runs to use
- Missing or unreliable data in the files you use
- Finding data in the database
- Insufficient MC statistics
- C++ problems
- Documentation on what exists or who to contact for information
- Lack of realism in the MC (if so, what is wrong; see Qu 9)
- Need for run-dependence in cuts (e.g. due to alignment changes) or in data handling (e.g. due to missing layers)
- Job performance and turnaround time
- Using the Grid (what aspect: data access or job submission?)
- Using the tools (e.g. cvs)
- Something else
If more than one is relevant, please indicate this.

Qu 7
Where do you think we should strike the balance between having access to the latest version of the reconstructed data and having a stable analysis? The issues concern getting the latest values of the calibration constants, bad channel lists, bad run periods, etc. Two extremes would be:
- Have very infrequent central reconstruction runs (e.g. one per year). In between, make heavy use of the database to get updates on calibrations, bad channel lists, etc. This may imply rerunning (parts of) the reconstruction privately to get the latest values, which may require access to the raw data files. It also implies that the database may be changing without you being aware of it; hence, your analysis may give different results each time you run unless you tag a specific version of the database to freeze the constants (although this undermines the idea of always having the latest values). You would then choose a convenient time to change the tag.
- Have much more frequent central reconstruction runs (e.g. one per month). Here, the latest constants are used in the reconstruction each time, and no database access is required later (although it would still be possible, of course).
No private reconstruction would then be necessary, and your analysis would be stable as the files themselves would be frozen. There may be a significant overhead in having to recopy all the relevant data and MC files after each reprocessing (or whenever you want to update to a new version); however, if you are running the analysis jobs on the Grid, this would not be an issue.
Please give some indication of which of these working methods you would see as more efficient for you.

Qu 8
What would you guess will be the limits or important systematics for the analysis long-term which you will need to study or evaluate? Some suggestions:
- Effect of calibration errors
- Effect of alignment errors
- Effect of threshold suppression
- Effect of noise or pedestal instabilities
- Bias from reconstruction or selection algorithms, particularly if not applicable to simulation
- Understanding of bad channels
- Particle identification
- Trigger bias
- Beam line effects
- Something else
Do you see how you will technically be able to do these studies given the software structure we have? If not, where are the potential problems?

Qu 9
What are the important factors to get right in the MC? Some suggestions:
- Generation of the beam spread (position, angle, energy and the correlations between them)
- Beam impact point on the detector (e.g. wafer or tile centre vs edge)
- Material in front of the calorimeters
- Material in the calorimeters (e.g. missing ECAL layers)
- Trigger simulation
- Double beam particle events or radiative beam events (e+/- with a photon)
- Digitisation effects
- Realistic non-nominal positioning of the detectors, so as to match the real positions as far as possible

Qu 10
For MC, is your analysis sensitive to run-dependent, or even event-dependent, effects, such that you will need MC events generated to match the runs (or events within a run) that you will use?
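As an aside on the tagging idea mentioned in Qu 7: the point of tagging a database version is that reading constants through a tag always returns the same values, whereas reading the "latest" values can silently change between two runs of the same analysis job. The following is only a toy sketch with an invented, hypothetical interface (it is not the actual CALICE conditions-database API), meant purely to illustrate that behaviour:

```python
# Toy illustration (hypothetical API, NOT the real CALICE conditions database)
# of how a tag freezes analysis constants while untagged reads track updates.

class ConditionsDB:
    """Minimal versioned store: commits append versions, tags pin one version."""

    def __init__(self):
        self._versions = []   # list of constant sets, newest last
        self._tags = {}       # tag name -> index into self._versions

    def commit(self, constants):
        # Store a new version of the constants (e.g. after recalibration).
        self._versions.append(dict(constants))

    def tag(self, name):
        # Pin the current newest version under a human-readable tag.
        self._tags[name] = len(self._versions) - 1

    def read(self, tag=None):
        # With a tag: always the pinned version. Without: whatever is newest.
        idx = self._tags[tag] if tag is not None else len(self._versions) - 1
        return self._versions[idx]

db = ConditionsDB()
db.commit({"ecal_mip_calib": 42.1})
db.tag("analysis_v1")                # freeze the constants for this analysis

db.commit({"ecal_mip_calib": 42.7})  # someone later updates the calibration

print(db.read(tag="analysis_v1"))    # stable: still the tagged 42.1
print(db.read())                     # latest: now 42.7, changed underneath you
```

Moving to new constants is then an explicit, deliberate step (make a new tag and switch your job to it at a convenient time), rather than something that happens invisibly between runs.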