CENSML Concept Paper

This document is an initial exploration of the concepts around CENSML and gives some impressions of what it may look like. We are very interested in your feedback, either on the censml-discuss list or directly to simonc@amphora-research.com.

Please remember that this is just a start; substantial changes are very likely as a result of the comments we receive. :-)

We hope to show some initial CENSML work at the next CENSA meeting.

Basic requirements

The main aim of CENSML is to support the transfer of ELN data between ELNs. Generally, this will be from one ELN directly into another, but the design choices required to ensure wide adoption by today's ELN vendors also make CENSML an ideal archive format for ELN data.

We would like CENSML to be used as widely as possible. Therefore, the standard:

  • assumes as little structure as possible, because individual ELNs will embody different concepts of what an "Experiment" is
  • is flexible, to accommodate the needs of a diversity of users
  • is pragmatic, avoiding features which might look good on paper but would be an impediment to widespread adoption

Standards such as CENSML are a balance between what would be desirable from an architectural perspective, and what is needed for the project to be a success in the real world. Hopefully, we've found the right "trade-off" between these considerations.

Quick overview of CENSML

Within CENSML, "Experiments" consist of:

  • some metadata
  • one or more fields containing the experiment write up
  • zero or more related files

An example CENSML packet (for the impatient):  

<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>

  <censml:experiment xmlns:censml="http://www.censml.org/censml-v1"
                     xmlns:dc="http://purl.org/dc/">

    <!-- first, the metadata -->

    <censml:metadata>
      <dc:title> My experiment </dc:title>
      <dc:creator> Simon Coles </dc:creator>
      <dc:description> A little experiment to show censml </dc:description>
      <dc:date>2001-08-29</dc:date>
    </censml:metadata>

    <!-- now, the write up -->

    <censml:narrative title="aim">
      <censml:representation type="text/xhtml" preference="10" fidelity="10">
        <!-- either an encoded binary file or an XLink to the file -->
      </censml:representation>

    </censml:narrative>

    <!-- finally, some related files -->

    <censml:data>
      <dc:title> An Excel file</dc:title>
      <dc:creator> Terry Bavins </dc:creator>
      <censml:representation type="application/excel" preference="10" fidelity="10">
        <!-- either an encoded binary file or an XLink to the file -->
      </censml:representation>

      <censml:representation type="image/png" preference="5" fidelity="3">
        <!-- either an encoded binary file or an XLink to the file -->
      </censml:representation>

    </censml:data>

  </censml:experiment>
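As a rough illustration of how an importing ELN might read such a packet, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. It assumes the namespace URIs exactly as declared in the example above, and it uses a trimmed copy of that packet with the placeholder payloads left empty; the function name `summarise` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs exactly as declared in the example packet.
NS = {
    "censml": "http://www.censml.org/censml-v1",
    "dc": "http://purl.org/dc/",
}

# Trimmed copy of the example packet; the representation payloads are left empty.
PACKET = b"""<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<censml:experiment xmlns:censml="http://www.censml.org/censml-v1"
                   xmlns:dc="http://purl.org/dc/">
  <censml:metadata>
    <dc:title> My experiment </dc:title>
    <dc:creator> Simon Coles </dc:creator>
    <dc:date>2001-08-29</dc:date>
  </censml:metadata>
  <censml:narrative title="aim">
    <censml:representation type="text/xhtml" preference="10" fidelity="10"/>
  </censml:narrative>
  <censml:data>
    <dc:title> An Excel file</dc:title>
    <censml:representation type="application/excel" preference="10" fidelity="10"/>
    <censml:representation type="image/png" preference="5" fidelity="3"/>
  </censml:data>
</censml:experiment>
"""


def summarise(packet: bytes) -> dict:
    """Return the Dublin Core metadata and every representation in the packet."""
    root = ET.fromstring(packet)
    meta = root.find("censml:metadata", NS)
    metadata = {
        # "{http://purl.org/dc/}title" -> "title"
        child.tag.split("}", 1)[1]: (child.text or "").strip()
        for child in meta
    }
    representations = []
    for item in root:  # metadata, narrative and data elements
        for rep in item.findall("censml:representation", NS):
            representations.append(
                (rep.get("type"), int(rep.get("preference")), int(rep.get("fidelity")))
            )
    return {"metadata": metadata, "representations": representations}
```

Here `summarise(PACKET)["metadata"]` yields the title, creator and date with surrounding whitespace stripped, and the representation list holds one `(type, preference, fidelity)` tuple per `censml:representation` element.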

Basic architecture

  • Contents of a CENSML packet
  • General principles for data storage & representation
    • Attribution
    • Types of data stored
    • Including data
  • Metadata
  • The experiment write up
  • Other, related files

Contents of a CENSML packet

A CENSML packet describes an experiment. Experiments consist of:

  • some metadata (for example, the principal investigator, date created, etc.)
  • the experiment write up (possibly split into parts, like aim, method, results, conclusion).
  • other, related files, which may be linked to from a specific part of the narrative, or may just be attached to the experiment. For example, Excel spreadsheets, instrument data files, graphs and other images, etc.
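One way an ELN might hold these three parts in memory before writing a packet is sketched below with Python dataclasses. The class and field names (`Representation`, `Item`, `Experiment`) are hypothetical and not part of CENSML; the only rule enforced is the overview's requirement of one or more write-up fields.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Representation:
    mime_type: str          # MIME type, e.g. "image/png"
    preference: int
    fidelity: int
    payload: bytes = b""    # inline content; an XLink to the file is the alternative


@dataclass
class Item:
    title: str
    representations: list[Representation] = field(default_factory=list)


@dataclass
class Experiment:
    metadata: dict[str, str]            # Dublin Core element name -> value
    narrative: list[Item]               # the write up, e.g. aim, method, results
    related_files: list[Item] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The overview requires one or more fields containing the write up.
        if not self.narrative:
            raise ValueError("an experiment needs at least one narrative field")
```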

General principles for data storage & representation

Attribution
An experiment is the responsibility of one person, the principal investigator. However, the data within the experiment may have been generated by someone else, so individual data elements can have an "author" attribute containing the common name of that element's author.
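The paper does not yet fix where the "author" attribute goes, so the following Python sketch assumes it sits directly on `censml:narrative` and `censml:data` elements. It also assumes, as a hypothetical convention, that an element without an author is credited to the experiment-level `dc:creator`, taken to be the principal investigator. The function `attributions` is illustrative only.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as declared in the example packet.
NS = {"censml": "http://www.censml.org/censml-v1", "dc": "http://purl.org/dc/"}

SAMPLE = b"""<censml:experiment xmlns:censml="http://www.censml.org/censml-v1"
                   xmlns:dc="http://purl.org/dc/">
  <censml:metadata><dc:creator> Simon Coles </dc:creator></censml:metadata>
  <censml:narrative title="aim"/>
  <censml:data author="Terry Bavins"><dc:title>An Excel file</dc:title></censml:data>
</censml:experiment>"""


def attributions(packet: bytes) -> list[tuple[str, str]]:
    """List (element label, author) pairs for narrative and data elements."""
    root = ET.fromstring(packet)
    # Assumption: the experiment-level dc:creator names the principal investigator.
    pi = root.findtext("censml:metadata/dc:creator", default="", namespaces=NS).strip()
    pairs = []
    for path in ("censml:narrative", "censml:data"):
        for elem in root.findall(path, NS):
            # Narratives carry a title attribute; data elements a dc:title child.
            label = elem.get("title") or elem.findtext("dc:title", default="", namespaces=NS).strip()
            # Assumption: with no "author" attribute, credit the principal investigator.
            pairs.append((label, elem.get("author", pi)))
    return pairs
```

With `SAMPLE`, the narrative is credited to Simon Coles and the Excel file to Terry Bavins.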

Data format
As a rule, the exporting ELN will store the information in a preferred format. For example, if an Excel spreadsheet is used during an experiment, then the preferred file format for that data is a .xls file. For the purposes of long term accessibility and interoperability between ELNs, it is preferable for data to be available in more than one format.

Therefore, CENSML has the concept of a "Format" for all data stored in a packet. An Excel spreadsheet, for instance, could be represented as a .xls file, a .txt file, a graphic, a .pdf, etc., each with its own trade-offs. For example:

<data>
 <format type="application/excel" preference="10" fidelity="10">
  <!-- data here, or an XLink to the data file -->
 </format>
</data>

Or, for some text that is part of the experiment write up:

<narrative title="Aim">
 <format type="text/html" preference="1" fidelity="10">
  <p>This is the aim of the experiment. We want to blow things up.</p>
 </format>
</narrative>

Notice that the format's "type" attribute holds a MIME type; the full list of registered MIME types is maintained by IANA.
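When a data item arrives in several formats, the importing ELN has to pick one. The paper does not yet define the preference and fidelity scales, so this Python sketch assumes that a higher number means more preferred (or more faithful), which matches the Excel-versus-PNG example earlier. The `Rep` tuple and the `choose` function are hypothetical names.

```python
from typing import NamedTuple, Optional


class Rep(NamedTuple):
    mime_type: str
    preference: int
    fidelity: int


def choose(reps: list[Rep], supported: set[str]) -> Optional[Rep]:
    """Pick the representation an importing ELN should use.

    Assumption: higher preference and fidelity values are better.
    """
    usable = [r for r in reps if r.mime_type in supported]
    if not usable:
        return None
    # Highest preference wins; fidelity breaks ties.
    return max(usable, key=lambda r: (r.preference, r.fidelity))
```

An ELN that understands Excel files would take the `.xls` version; one that only handles images would fall back to the PNG.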

Within an experiment, data will be of three broad types:

  • Textual information, either as plain text, or more likely in some format which represents style information and other rich attributes
  • Image information, for example bitmaps or SVGs
  • Other information which falls outside the above categories. For example data from an instrument, a proprietary binary data file format from another application, or an XML file (e.g. GAML).
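Since every format carries a MIME type, an implementation could sort data into these three broad types from the "type" attribute alone. The Python sketch below is one hypothetical mapping: the set of textual types is an assumption, chosen so that XML payloads such as GAML land in "other" as the list above suggests.

```python
# Assumption: which MIME types count as "textual" is an illustrative choice.
TEXT_TYPES = {"text/plain", "text/html", "text/xhtml", "application/xhtml+xml", "text/rtf"}


def broad_category(mime_type: str) -> str:
    """Classify a MIME type as "text", "image" or "other"."""
    base = mime_type.split(";", 1)[0].strip().lower()  # drop parameters such as charset
    if base in TEXT_TYPES:
        return "text"
    if base.startswith("image/"):
        return "image"   # bitmaps and SVG (image/svg+xml) alike
    # Instrument data, proprietary binaries and XML such as GAML end up here.
    return "other"
```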

To allow the data to be moved into CENSML, and re-imported without loss of fidelity, most ELNs will choose to represent the data in CENSML in as rich a format as possible. However, to provide for interoperability and long term access, implementations *must* provide a representation of the data as one of:

  • a bitmap image, as a PNG binary [a]
  • text, as XHTML [b]
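An exporter could check this rule before writing a packet. The Python sketch below takes a mapping from item name to the MIME types it is available in and reports the items that lack a PNG or XHTML representation. It accepts "text/xhtml" (the type used in this paper's examples) and, as an assumption, the registered XHTML type "application/xhtml+xml"; `missing_fallback` is a hypothetical name.

```python
# Types that satisfy the mandatory fallback rule. "text/xhtml" follows this paper's
# examples; accepting "application/xhtml+xml" as well is an assumption.
FALLBACK_TYPES = {"image/png", "text/xhtml", "application/xhtml+xml"}


def missing_fallback(items: dict[str, list[str]]) -> list[str]:
    """Return the names of items with no PNG or XHTML representation."""
    return [
        name
        for name, types in items.items()
        if not FALLBACK_TYPES.intersection(t.lower() for t in types)
    ]
```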

Where a binary file is stored in CENSML, base64 will be used to encode it.
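A minimal Python sketch of inlining a binary file with base64, using the standard `base64` module and `xml.etree.ElementTree`. The element and attribute names follow the example packet; the helper names `inline_representation` and `decode_representation` are hypothetical, and the paper does not yet say how an inline payload is distinguished from an XLink, so no marker attribute is added.

```python
import base64
import xml.etree.ElementTree as ET

CENSML = "http://www.censml.org/censml-v1"
ET.register_namespace("censml", CENSML)  # serialise with the censml: prefix


def inline_representation(payload: bytes, mime_type: str,
                          preference: int, fidelity: int) -> ET.Element:
    """Wrap raw bytes in a censml:representation element as base64 text."""
    rep = ET.Element(f"{{{CENSML}}}representation", {
        "type": mime_type,
        "preference": str(preference),
        "fidelity": str(fidelity),
    })
    # base64 restricts arbitrary bytes to characters XML text can always hold.
    rep.text = base64.b64encode(payload).decode("ascii")
    return rep


def decode_representation(rep: ET.Element) -> bytes:
    """Recover the original bytes from an inline representation."""
    return base64.b64decode(rep.text or "")
```

In practice the payload would come from something like `open(path, "rb").read()` on the spreadsheet or image being exported.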

Metadata
Metadata will follow the Dublin Core, although some additional ELN-specific elements may be needed; these will be defined as part of CENSML.
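The following Python sketch builds a `censml:metadata` element like the one in the example packet, restricted to the fifteen elements of the Dublin Core Metadata Element Set, since the ELN-specific additions are not defined yet. It uses the namespace URI exactly as written in the example (`http://purl.org/dc/`); `metadata_element` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

CENSML = "http://www.censml.org/censml-v1"
DC = "http://purl.org/dc/"  # as written in the example packet
ET.register_namespace("censml", CENSML)
ET.register_namespace("dc", DC)

# The fifteen Dublin Core elements; ELN-specific additions are still to be defined.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def metadata_element(values: dict[str, str]) -> ET.Element:
    """Build censml:metadata from a mapping of Dublin Core names to values."""
    meta = ET.Element(f"{{{CENSML}}}metadata")
    for name, value in values.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        ET.SubElement(meta, f"{{{DC}}}{name}").text = value
    return meta
```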

Experimental write up
Implementations may choose to structure the experiment write up in a variety of ways; CENSML expects the write up to be stored in one or more fields.
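As a sketch of how an ELN's own sections might map onto these fields, the Python below turns a dictionary of named sections (aim, method, and so on) into `censml:narrative` elements, each with a single XHTML representation holding one `<p>` per blank-line-separated paragraph. The unqualified `<p>` follows this paper's narrative example; the function name and the preference and fidelity values of 10 are placeholders, not part of CENSML.

```python
import xml.etree.ElementTree as ET

CENSML = "http://www.censml.org/censml-v1"
ET.register_namespace("censml", CENSML)


def narrative_fields(sections: dict[str, str]) -> list[ET.Element]:
    """Turn named write-up sections into censml:narrative elements."""
    if not sections:
        raise ValueError("the write up needs at least one field")
    fields = []
    for title, text in sections.items():
        narrative = ET.Element(f"{{{CENSML}}}narrative", {"title": title})
        # Placeholder preference/fidelity values; the scales are not yet defined.
        rep = ET.SubElement(narrative, f"{{{CENSML}}}representation",
                            {"type": "text/xhtml", "preference": "10", "fidelity": "10"})
        # One paragraph per blank-line-separated block; ElementTree escapes & and <.
        for para in text.split("\n\n"):
            ET.SubElement(rep, "p").text = para.strip()
        fields.append(narrative)
    return fields
```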

Other related files
Parts of an experiment

Sources of inspiration

In planning CENSML, we found the following particularly helpful:


Notes

[a] PNG is preferred to GIF or JPEG because it is a popular, open standard unencumbered by the legal difficulties which plague formats like GIF. See http://www.w3.org/Graphics/PNG/.

[b] XHTML is preferred to other formats for marking up text because:

  • it is very easy to create, parse, and otherwise manipulate XHTML documents
  • HTML is a well understood and popular format supported by many tools
  • it is an XML-based format

Changelog

  • 29-8-01 Initial spec released
  • 22-9-01 Added base64 as the encoding mechanism for inline attachments. It is implied by the use of XML, but included for clarity.

Copyright © Amphora Research Systems Ltd., 2001-2