Minutes of Meeting, May 26, 2006 1:00 CDT
Present: Rob, Suzanne, Jim, Wally
By Phone: Claude.
Minutes written by Rob. I have added some information
that I learned after the meeting.
Corrections and comments are welcome.
[ My editorial comments in square brackets. ]
Our next meeting is Friday June 2, 2006 at 1:00 PM CDT,
in the Woodshed, WH7X.
Thanks to everyone who prepared summaries of the architecture
choices for the various candidate log/notebooks. This information
is now summarized in ILC-doc-292. The information provided
by Wally was added to that document after the meeting.
The main topic of the meeting was to discuss the information summarized
in this document.
1) About the AD Elog:
a) All 4 ILC test areas at Fermilab are already using it at some
level. Two new instances were recently created for A0.
b) Wally does not believe that the architecture is robust enough to
serve all our needs for 10 years: security, searchability ...
c) We also had a discussion that we would like to avoid Perl if possible
( see 8), below ).
2) How do we react to 1a): if we decide to recommend a different
product then we need to explicitly address the fact that there is
already an investment in the AD Elog. In particular we need to
acknowledge the retraining costs and include those costs in the
tradeoffs.
3) Some of the logbook products require Oracle. Yet they say that
they are no cost. Jim tells us that the lab has a site wide Oracle
license and that the cost to add new Oracle deployments is small.
Moreover the ILC, at present, has a project wide Oracle license
in its spec.
4) We agreed to take one feature off of the use case list: we are not
looking for a product that people can use to deploy a new logbook on
their own laptop. We are only considering a "centrally" deployed
server which people can access via a browser. This simplifies
the backup issue and any Oracle license issues.
5) Two logbooks are off our list:
a) The DESY-IHEP elog is too immature to be considered
for something we will start to use soon.
b) The PSI Elog appears to be much too hard to maintain.
6) Even though something is off our list, we should still understand its
features so that we know what we are missing.
7) What about the SNS E-log?
- Claude will look into it.
8) There seems to be a consensus that Perl has had its day:
a) The features of Perl that were once unique, such as regexps, are
now available in many languages, e.g. Java.
b) We have lots of horror stories that Perl code can be hard to
maintain, although we acknowledge that well-written Perl could
be easy to maintain.
c) New products are less likely to have a Perl API than a Java API.
This is a strike against the AD Elog. In the JLAB/SLAC elog, the
use of Perl is sufficiently restricted that it is not a serious
problem.
9) About PHP. Is it an acceptable language? We did not reach a
conclusion. So far we know:
a) It is popular at ANL and is expected to have a future.
b) In the past there were security concerns. No one knew whether
these are still a concern.
10) So far as we know, none of the products is currently being used as
a notebook. All are used only as log books.
11) What would it take to add notebook functionality?
- CRL currently has annotation but not editing.
One could add editing of entries, with an edit on/off switch
on a per topic basis.
- AD Elog currently has a "repair" feature that can only be invoked
by administrators. Both the original and the modified
text are saved. If desired, this could be extended to a
general editing feature.
- Need to learn about others.
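The two ideas above (a per topic edit on/off switch, and a "repair"
that keeps both the original and the modified text) can be combined in
a small sketch. This is only an illustration under stated assumptions:
the class, method and topic names are hypothetical and are not taken
from CRL or the AD Elog, and the sample text is made up.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; not the data model of any product discussed here.
final class EditableEntrySketch {
    // Topics for which general editing has been switched on.
    private final Set<String> editableTopics = new HashSet<>();

    static final class Entry {
        final String topic;
        // Every version of the text, oldest first; index 0 is the original.
        final List<String> versions = new ArrayList<>();

        Entry(String topic, String text) {
            this.topic = topic;
            versions.add(text);
        }

        String original() { return versions.get(0); }
        String current() { return versions.get(versions.size() - 1); }
    }

    // The per topic edit on/off switch.
    void setEditable(String topic, boolean on) {
        if (on) {
            editableTopics.add(topic);
        } else {
            editableTopics.remove(topic);
        }
    }

    // Returns true if the edit was accepted. Administrators may always
    // edit (the "repair" case); others only where the topic allows it.
    boolean edit(Entry entry, String newText, boolean isAdmin) {
        if (!isAdmin && !editableTopics.contains(entry.topic)) {
            return false;
        }
        entry.versions.add(newText); // append, never overwrite
        return true;
    }

    public static void main(String[] args) {
        EditableEntrySketch book = new EditableEntrySketch();
        Entry e = new Entry("topic-a", "first draft");
        System.out.println(book.edit(e, "second draft", false)); // false: topic is locked
        book.setEditable("topic-a", true);
        System.out.println(book.edit(e, "second draft", false)); // true: topic now editable
        System.out.println(e.original() + " -> " + e.current());
    }
}
```

Because edits are appended rather than overwritten, the original text
is always recoverable, which is the property the AD Elog "repair"
feature is described as having.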
12) The JLAB/SLAC product creates entries by making a temporary XML
file that is periodically swept into the db. The other products
talk directly to the db. Do we care which we choose? The answer
is that both are OK.
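For reference, the "temporary XML file, swept periodically" pattern can
be sketched as below. This is a minimal illustration, not the JLAB/SLAC
implementation: the class and method names are hypothetical, an
in-memory list stands in for the db, and the sweep is called by hand
rather than from a timer.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a spool directory that is swept into a db.
final class SpoolSweepSketch {
    private final Path spoolDir;
    // Stand-in for the database table of entries.
    private final List<String> database = new ArrayList<>();

    SpoolSweepSketch(Path spoolDir) {
        this.spoolDir = spoolDir;
    }

    // Convenience factory that spools into a fresh temporary directory.
    static SpoolSweepSketch inTempDirectory() {
        try {
            return new SpoolSweepSketch(Files.createTempDirectory("spool"));
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // Step 1: creating an entry only writes a temporary XML file.
    Path submit(String xml) {
        try {
            Path file = Files.createTempFile(spoolDir, "entry-", ".xml");
            Files.write(file, xml.getBytes(StandardCharsets.UTF_8));
            return file;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // Step 2: a periodic job moves every spooled file into the db
    // and returns how many entries it moved.
    int sweep() {
        int swept = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(spoolDir, "*.xml")) {
            for (Path file : files) {
                database.add(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
                Files.delete(file);
                swept++;
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return swept;
    }

    List<String> entries() {
        return database;
    }

    public static void main(String[] args) {
        SpoolSweepSketch log = SpoolSweepSketch.inTempDirectory();
        log.submit("<entry>first</entry>");
        log.submit("<entry>second</entry>");
        System.out.println("in db before sweep: " + log.entries().size());
        System.out.println("swept: " + log.sweep());
        System.out.println("in db after sweep: " + log.entries().size());
    }
}
```

The practical difference from talking directly to the db is that a new
entry is not visible to searches until the next sweep. A real sweeper
would also have to avoid picking up a file that is still being written,
for example by writing under a different name and renaming when done.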
13) About security. None of the products has state of the art security.
We should anticipate that, in the future, the requirement of state of
the art security may be imposed on us. For example, the computing
division has been told to migrate the various docdb instances to
certificate based access. Here is a summary of what we know about
security:
a) The AD elog distinguishes readers by network location: people at
addresses inside the lab firewall do not need a password to read,
but people outside the firewall do.
b) There is a Java library to support PKI certificates.
( PKI = Public Key Infrastructure; the certificates follow the
X.509 standard )
So it should be relatively straightforward to add this feature
to any of the Java based products.
c) The LHC community failed with a certificate-only policy.
d) We need to understand if the effort required to get a certificate
has been reduced to the point that we have moved past the LHC
experience. In any case we need to account for this effort in
our judgements.
e) Suzanne thinks it would be relatively easy to add PKI authentication
to CRL.
14) A use case to remember:
- People come and go, so they need to be added and subtracted
from the security system.
15) About searching. Most of the products do searches of their
database data, but not of their attachments. In some cases it would
be straightforward to extend the searches to include attachments.
Searching is usually not indexed so searching can take a long time
if there are many entries. We need to understand the scope of our
problem to know if this is an issue for us.
The one exception is the DESY TTF logbook which uses Apache Lucene
http://lucene.apache.org/java/docs/
This is a tool that builds an index of the things searched; it can
do incremental builds of the index as well as batch builds.
According to the FAQ there are tools to pull the text out of
.doc, .pdf, .xls and many other formats. These are not formally
part of Lucene but can be used to index documents in these formats.
Wally says that searching is a weak spot in the AD elog.
HepBook ( aka KBook for Knowledge Book ) also uses Lucene.
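To make the indexed vs. unindexed distinction concrete, here is a toy
inverted index showing an incremental build ( add one entry ) and a
batch build ( rebuild from all entries ). This is not Lucene and does
not use the Lucene API; the class and method names are hypothetical
and the sample entry text is made up.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy index; a stand-in for what a tool like Lucene provides.
final class InvertedIndexSketch {
    // word -> ids of the entries that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Incremental build: index one new entry without touching the rest.
    void add(int entryId, String text) {
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!word.isEmpty()) {
                postings.computeIfAbsent(word, w -> new TreeSet<>()).add(entryId);
            }
        }
    }

    // Batch build: throw the index away and rebuild it from all entries.
    void rebuild(Map<Integer, String> allEntries) {
        postings.clear();
        for (Map.Entry<Integer, String> e : allEntries.entrySet()) {
            add(e.getKey(), e.getValue());
        }
    }

    // A search is one map lookup, however many entries there are.
    Set<Integer> search(String word) {
        return postings.getOrDefault(word.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Beam off for access");
        index.add(2, "Beam restored");
        System.out.println(index.search("beam"));   // ids of entries containing "beam"
        System.out.println(index.search("access")); // ids of entries containing "access"
    }
}
```

An unindexed search has to read every entry for every query, which is
why it slows down as the number of entries grows. Attachments could be
fed to add() in the same way once their text has been extracted.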
16) After the meeting Suzanne wondered whether we could use Google searching.
There is a syntax to tell Google to restrict its searches to particular
sites. This only works if Google has a button we can push to tell it
to update its indices ( otherwise it will not find newly created
entries ).
For example, the syntax to find references on the Fermilab site
to tonight's lecture on the Gathering Storm report is:
"Gathering Storm" site:fnal.gov
17) An aside on history. The AD elog started life as a notebook
at ORNL. It was taken to FNAL and converted to a logbook on
a separate development path. The SNS logbook is informed by
the experience at ORNL with their earlier product; but it is
a fresh start, not an evolution of the old product.
18) Claude reported that he had heard rumors that the CRL made
"unorthodox" use of its database. Suzanne replied that the CRL
puts a minimum amount of info in the db for each entry. The
rest of the information is stored in the HTML or XML file that
contains the body of the entry. One useful side effect of this
strategy is that the db can be 100% rebuilt by scanning the entries.
This has been done in the past when some users were careless with
backups.
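A rebuild of this kind can be sketched as follows. This is a minimal
illustration only: the file layout, the <title> element, and all class
and method names are assumptions made for the example and are not
CRL's actual format. A map stands in for the db, and a regular
expression stands in for a real XML parser.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: the entry files are the master copy and the
// db holds only a small record per entry that can be regenerated.
final class RebuildFromEntriesSketch {
    // Assumed entry format: the title sits in a <title> element.
    private static final Pattern TITLE = Pattern.compile("<title>(.*?)</title>");

    // Scans every entry file and rebuilds the per-entry record
    // ( here: file name -> title ) that the db would hold.
    static Map<String, String> rebuild(Path entryDir) {
        Map<String, String> db = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(entryDir, "*.xml")) {
            for (Path file : files) {
                String body = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                Matcher m = TITLE.matcher(body);
                db.put(file.getFileName().toString(), m.find() ? m.group(1) : "");
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return db;
    }

    // Helpers used only to set up the demonstration.
    static Path newTempDir() {
        try {
            return Files.createTempDirectory("entries");
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    static void writeEntry(Path dir, String name, String xml) {
        try {
            Files.write(dir.resolve(name), xml.getBytes(StandardCharsets.UTF_8));
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        Path dir = newTempDir();
        writeEntry(dir, "0001.xml", "<entry><title>Shift start</title><body>text</body></entry>");
        writeEntry(dir, "0002.xml", "<entry><title>Shift end</title><body>text</body></entry>");
        System.out.println(rebuild(dir));
    }
}
```

The rebuild only works because nothing lives in the db alone; anything
stored there and not in the entry files would be lost.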
Homework:
1) Questions for Claude to ask Jerzy:
- does the TD weblog support multiple logbooks?
2) All:
a) Find out if the product can be used as a notebook ( ie with
editable entries ). Has it been used that way now or in the
past? If not, what would it take to make some class of entries
editable?
b) How hard would it be to add certificate based security if not
already present? Does the architecture allow a plug replacement
of the security mechanism or does it have tentacles throughout
the system?
c) Please clarify the present security system:
- individual or group accounts ( or both )?
- can someone log in as an individual but have permissions
as a member of one or more groups?
- does the product maintain its own username/password
database or does it use some service provided, for example,
by the lab?
d) Is the architecture such that adding something like Lucene
is straightforward or is it an ugly hack?
3) Claude will learn something about the SNS elog.
4) Rob will learn about HepBook.