
This document is an initial
exploration of the concepts around CENSML and gives some impressions of what
it may look like. We are very interested in your feedback, either on the censml-discuss
list or directly to simonc@amphora-research.com.
Please remember that this is just a start; it is very likely that as a result
of the comments we receive, substantial changes will occur.... :-)
We hope to show some initial CENSML work at the next CENSA
meeting.
Basic requirements
The main aim of CENSML
is to support the transfer of ELN data between ELNs. Generally, this will be
from one ELN directly into another, but the design choices required to ensure
wide adoption by today's ELN vendors also make CENSML an ideal archive format
for ELN data.
We would like CENSML to be used as widely as possible. Therefore, the standard:
- assumes as little structure
as possible, because individual ELNs will embody different concepts of what
an "Experiment" is
- is flexible, to accommodate
the needs of a diversity of users
- is pragmatic, avoiding
features which might look good on paper, yet would be an impediment to widespread
adoption
Standards such as CENSML
are a balance between what would be desirable from an architectural perspective,
and what is needed for the project to be a success in the real world. Hopefully,
we've found the right "trade-off" between these considerations.
Quick
overview of CENSML
Within CENSML, "Experiments"
consist of:
- some metadata
- one or more fields containing
the experiment write up
- zero or more related
files
An example CENSML packet
(for the impatient):
<?xml version="1.0"
encoding="ISO-8859-1" standalone="yes"?>
<censml:experiment xmlns:censml="http://www.censml.org/censml-v1"
xmlns:dc="http://purl.org/dc/">
<!-- first, the metadata -->
<censml:metadata>
<dc:title> My experiment </dc:title>
<dc:creator> Simon Coles </dc:creator>
<dc:description> A little experiment
to show censml </dc:description>
<dc:date>2001-08-29</dc:date>
</censml:metadata>
<!-- now, the write up -->
<censml:narrative title="aim">
<censml:representation type="text/xhtml"
preference="10" fidelity="10">
..either encoded binary file
or xlink to the file..
</censml:representation>
</censml:narrative>
<!-- finally, some related files -->
<censml:data>
<dc:title> An Excel file</dc:title>
<dc:creator> Terry Bavins </dc:creator>
<censml:representation type="application/excel"
preference="10" fidelity="10">
..either encoded binary file
or xlink to the file..
</censml:representation>
<censml:representation type="image/png"
preference="5" fidelity="3">
..either encoded binary file
or xlink to the file..
</censml:representation>
</censml:data>
</censml:experiment>
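As a rough illustration of how a receiving ELN might read such a packet, here is a minimal Python sketch using the standard library's ElementTree. The namespace URIs are copied from the example above. The helper name `summarize`, the trimmed `PACKET` string (XML declaration omitted, placeholder content replaced by "...") and the shape of the returned dictionary are assumptions made for this sketch, not part of CENSML.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as they appear in the example packet.
NS = {
    "censml": "http://www.censml.org/censml-v1",
    "dc": "http://purl.org/dc/",
}

# Trimmed, well-formed version of the example packet (illustrative only).
PACKET = """<censml:experiment xmlns:censml="http://www.censml.org/censml-v1"
                   xmlns:dc="http://purl.org/dc/">
  <censml:metadata>
    <dc:title> My experiment </dc:title>
    <dc:creator> Simon Coles </dc:creator>
    <dc:date>2001-08-29</dc:date>
  </censml:metadata>
  <censml:narrative title="aim">
    <censml:representation type="text/xhtml" preference="10" fidelity="10">...</censml:representation>
  </censml:narrative>
  <censml:data>
    <dc:title> An Excel file</dc:title>
    <censml:representation type="application/excel" preference="10" fidelity="10">...</censml:representation>
    <censml:representation type="image/png" preference="5" fidelity="3">...</censml:representation>
  </censml:data>
</censml:experiment>"""


def summarize(packet_xml):
    """Return the title, narrative field names and number of related files."""
    root = ET.fromstring(packet_xml)
    metadata = root.find("censml:metadata", NS)
    title = metadata.findtext("dc:title", default="", namespaces=NS).strip()
    narratives = [n.get("title") for n in root.findall("censml:narrative", NS)]
    data_files = len(root.findall("censml:data", NS))
    return {"title": title, "narratives": narratives, "data_files": data_files}


summary = summarize(PACKET)
```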
Basic
architecture
- Contents of a CENSML
packet
- General principles for
data storage & representation
- Attribution
- Types of data stored
- Including data
- Metadata
- The experiment write
up
- Other, related files
Contents
of a CENSML packet
A CENSML packet describes
an experiment. Experiments consist of:
- some metadata (for example,
the principal investigator, date created, etc.)
- the experiment write
up (possibly split into parts, like aim, method, results, conclusion).
- other, related files,
which may be linked to from a specific part of the narrative, or may just
be attached to the experiment. For example, Excel spreadsheets, instrument
data files, graphs and other images, etc.
General
principles for data storage & representation
Attribution
An experiment is the responsibility of one person, the principal investigator.
However, the data within the experiment may have been generated by someone else.
Individual data elements can therefore have an attribute "author" which will
contain the common name of the author of the element.
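A small Python sketch of how an importer might collect the people credited in an experiment. The "author" attribute comes from the text above; the un-namespaced element names, the sample `EXPERIMENT` string, the `contributors` helper, and the rule that an element without an "author" attribute falls back to the principal investigator are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment: two attributed elements and one without an author.
EXPERIMENT = """<experiment>
  <narrative title="Aim" author="Simon Coles"/>
  <data author="Terry Bavins"/>
  <data/>
</experiment>"""


def contributors(experiment_xml, principal_investigator):
    """List the distinct authors of the top-level elements, sorted by name.

    Assumption: an element with no "author" attribute is credited to the
    principal investigator.
    """
    root = ET.fromstring(experiment_xml)
    return sorted({el.get("author", principal_investigator) for el in root})


names = contributors(EXPERIMENT, "Simon Coles")
```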
Data format
As a rule, the exporting ELN will store the information in a preferred format.
For example, if an Excel spreadsheet is used during an experiment, then the
preferred file format for that data is a .xls file. For the purposes of long
term accessibility and interoperability between ELNs, it is preferable for data
to be available in more than one format.
Therefore, for all data stored in a CENSML packet, CENSML has the concept of
"Format". For example, an Excel spreadsheet could be represented as a
.xls file, a .txt file, a graphic, a .pdf, etc. Each of these formats has
its own trade-offs. A format is therefore written as:
<data>
<format type="application/excel" preference="10" fidelity="10">
....data here, or XLink to data file....
</format>
</data>
Or, for some text that is part of the experimental write up:
<narrative title="Aim">
<format type="text/html" preference="1" fidelity="10">
<p>This is the aim of the experiment. We want to blow things
up. </p>
</format>
</narrative>
Notice that we use a MIME
type in the format's "type" attribute; here's
how you get more MIME types.
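To show how an importing ELN might choose among several formats of the same data, here is a hedged Python sketch. This draft does not define how "preference" is ordered, so the sketch assumes that a higher number means the exporter prefers that format more. The helper name `pick_format` and the sample `DATA` fragment are likewise illustrative.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment with two formats of the same spreadsheet.
DATA = """<data>
  <format type="application/excel" preference="10" fidelity="10">...</format>
  <format type="image/png" preference="5" fidelity="3">...</format>
</data>"""


def pick_format(data_xml, supported_types):
    """Return the MIME type of the most preferred format the importer can read.

    Assumption: a higher "preference" value means "more preferred".
    Returns None when no offered format is supported.
    """
    data = ET.fromstring(data_xml)
    candidates = [f for f in data.findall("format") if f.get("type") in supported_types]
    if not candidates:
        return None
    best = max(candidates, key=lambda f: int(f.get("preference", "0")))
    return best.get("type")


full_support = pick_format(DATA, {"application/excel", "image/png"})
png_only = pick_format(DATA, {"image/png"})
```

An importer that cannot open Excel files would fall back to the PNG rendering here, which is the kind of interoperability the multiple-format approach is meant to allow.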
Within an experiment, data will be of three broad types:
- Textual information,
either as plain text, or more likely in some format which represents style
information and other rich attributes
- Image information, for
example bitmaps or SVGs
- Other information which
falls outside the above categories. For example, data from an instrument, a
proprietary binary data file format from another application, or an XML file
(e.g. GAML).
To allow the data to be
moved into CENSML, and re-imported without loss of fidelity, most ELNs will
choose to represent the data in CENSML in as rich a format as possible. However,
to provide for interoperability and long term access, implementations *must*
provide a representation of the data as one of:
- a bitmap image, as a
PNG binary [a].
- text, as XHTML [b]
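An exporter could check this requirement before writing a packet. The Python sketch below assumes the two fallbacks are identified by the MIME types "image/png" and "text/xhtml" (the latter taken from the example packet); the constant `PORTABLE_TYPES` and the helper `has_portable_format` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Assumed MIME types for the two required fallback formats.
PORTABLE_TYPES = {"image/png", "text/xhtml"}


def has_portable_format(element_xml):
    """True if at least one <format> child uses a required fallback type."""
    element = ET.fromstring(element_xml)
    return any(f.get("type") in PORTABLE_TYPES for f in element.findall("format"))


ok = has_portable_format(
    '<data><format type="application/excel"/><format type="image/png"/></data>'
)
missing = has_portable_format('<data><format type="application/excel"/></data>')
```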
Where a binary file is
stored in CENSML, base64 will be used to encode it.
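A minimal Python sketch of embedding and recovering a binary payload with base64. The element name `format` follows the fragments earlier in this section; the helper names `embed_binary` and `extract_binary` are assumptions, and the eight-byte PNG signature stands in for a real image file.

```python
import base64
import xml.etree.ElementTree as ET


def embed_binary(mime_type, payload):
    """Wrap raw bytes in a <format> element as base64 text."""
    fmt = ET.Element("format", {"type": mime_type})
    fmt.text = base64.b64encode(payload).decode("ascii")
    return fmt


def extract_binary(fmt):
    """Recover the original bytes from a <format> element."""
    return base64.b64decode(fmt.text)


# The 8-byte PNG file signature, used here as a stand-in for image data.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
element = embed_binary("image/png", PNG_SIGNATURE)
```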
Metadata
Metadata will follow the Dublin
Core, although some additional ELN-specific elements may be needed, and will
be defined as part of CENSML.
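As a sketch of what producing the metadata block could look like, the Python below builds a metadata element with three Dublin Core children. The namespace URIs are the ones used in the example packet earlier in this document; the helper `build_metadata` and the choice of title, creator and date as the fields are assumptions, since the final element set is not yet defined.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as used in the example packet.
CENSML_NS = "http://www.censml.org/censml-v1"
DC_NS = "http://purl.org/dc/"

# Ask ElementTree to use readable prefixes when serializing.
ET.register_namespace("censml", CENSML_NS)
ET.register_namespace("dc", DC_NS)


def build_metadata(title, creator, date):
    """Build a censml:metadata element holding three Dublin Core fields."""
    metadata = ET.Element(f"{{{CENSML_NS}}}metadata")
    for name, value in (("title", title), ("creator", creator), ("date", date)):
        child = ET.SubElement(metadata, f"{{{DC_NS}}}{name}")
        child.text = value
    return metadata


metadata = build_metadata("My experiment", "Simon Coles", "2001-08-29")
serialized = ET.tostring(metadata, encoding="unicode")
```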
Experimental write up
Implementations may choose to structure the experiment write up in a variety
of ways; CENSML expects the write up to be stored in one or more fields.
Other related files
Parts of an experiment
Sources
of inspiration
In planning CENSML, we
found the following particularly helpful:
- W3C
Standards
- Other initiatives &
projects
- Liquent's
CreateXML service
and software offerings. The Liquent software claims to convert data into
XML, and the really interesting thing is they have avoided creating their
own XML formats, instead using existing, well defined formats.
- Books
- "XML in a Nutshell"
- Related standards
and initiatives
- METS
**Notes**
[a] PNG is preferred to GIF or JPEG because it is a popular,
open standard unencumbered by the legal difficulties which plague formats like
GIF. See http://www.w3.org/Graphics/PNG/.
[b] XHTML is preferred to other formats for marking up text
because:
- it is very easy to create,
parse, and otherwise manipulate XHTML documents
- HTML is a well understood
and popular format supported by many tools
- it is an XML-based format
Changelog
- 29-8-01 Initial spec
released
- 22-9-01 Added base64
as the encoding mechanism for inline attachments. It is implied by the use
of XML, but included for clarity.