HOW IT WORKS


DINOFLAJ2 is based on the original version of DINOFLAJ, which was implemented as a series of custom Perl CGI scripts with an Oracle database back end. Its content was produced by parsing the original taxonomic documents into approximately 10 000 taxonomic records and 2000 bibliographic records, connected by 60 000 hypertext links. This data was then structured into a relational database format and loaded into Oracle. The CGI scripts retrieved the data and presented it to the user as HTML pages for display in a web browser. The appearance of those pages was controlled by a series of HTML templates that were merged with the database content.
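
The original design can be pictured as two tables, one for records and one for the links between them, plus a step that merges a record into an HTML template. The sketch below is only an illustration under stated assumptions: Python's built-in `sqlite3` stands in for Oracle, `string.Template` stands in for the Perl templates, and the table names, columns, and page template are hypothetical rather than the actual DINOFLAJ schema.

```python
import html
import sqlite3
from string import Template

# Hypothetical schema: SQLite stands in for Oracle, and these table and
# column names are illustrative, not the original DINOFLAJ tables.
SCHEMA = """
CREATE TABLE record (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    kind TEXT NOT NULL CHECK (kind IN ('taxon', 'reference')),
    body TEXT NOT NULL
);
CREATE TABLE link (
    source_id INTEGER NOT NULL REFERENCES record(id),
    target_id INTEGER NOT NULL REFERENCES record(id)
);
"""

# Hypothetical page template; $name, $body and $links are filled per request.
PAGE = Template("<html><body><h1>$name</h1><p>$body</p><ul>$links</ul></body></html>")


def render_page(conn: sqlite3.Connection, record_id: int) -> str:
    """Merge one record and its outgoing links into the HTML template."""
    name, body = conn.execute(
        "SELECT name, body FROM record WHERE id = ?", (record_id,)
    ).fetchone()
    targets = conn.execute(
        "SELECT r.id, r.name FROM link AS l JOIN record AS r ON r.id = l.target_id "
        "WHERE l.source_id = ? ORDER BY r.name",
        (record_id,),
    ).fetchall()
    links = "".join(
        f'<li><a href="?id={tid}">{html.escape(tname)}</a></li>' for tid, tname in targets
    )
    # The body is assumed to be pre-formatted HTML, so only names are escaped.
    return PAGE.substitute(name=html.escape(name), body=body, links=links)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO record VALUES (?, ?, ?, ?)",
        [
            (1, "Gonyaulacysta", "taxon", "Genus entry."),
            (2, "Gonyaulacysta jurassica", "taxon", "Species entry."),
        ],
    )
    conn.execute("INSERT INTO link VALUES (1, 2)")
    print(render_page(conn, 1))
```

Keeping links in their own table means every page can list its outgoing links with a single join, which is also what makes editing hard: changing the text of a record does not automatically update the link rows that describe it.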


Major hurdles for the system, or for any system designed to replace it, are:

  • handling accented characters;
  • formatting text styles (italics, bold) in an intuitive way;
  • supporting many links within individual fields, where the links themselves may contain styles or accented characters;
  • establishing some 60 000 hypertext links automatically rather than manually.

The last item is complicated by, for example, homonyms, orthographic variants, abbreviated citations in some contexts (e.g., "Gonyaulacysta" for "Gonyaulacysta jurassica"), and the various ways that references are cited (e.g., with the author name inside or outside parentheses). Although all of these challenges were met to get the initial system running, enabling in-place editing of the resulting database content proved more difficult. The text and its formatting were easily managed, but keeping the links between records consistent while also making editing easy for the user was not. Attempts to build a fully functional, robust, and fast editing system were unsuccessful, and at the time (1998) tools for managing this kind of content either did not exist or had unacceptable limitations in performance, cost, or other respects.
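
The complications listed above (homonyms, orthographic variants, context-dependent abbreviations, and two citation styles) can be illustrated with a small resolver. This is a hypothetical sketch, not the DINOFLAJ linking code: the `LinkResolver` class, its lookup tables, and the rule that homonyms are left unlinked are all assumptions, and the citation patterns only cover single-author forms such as "Smith (1990)" and "(Smith, 1990)".

```python
import re
from typing import Dict, List, Optional, Tuple

# Two citation styles, e.g. "Smith (1990)" and "(Smith, 1990)".
# [^\W\d_] matches any Unicode letter, so accented author names also work.
CITATION_PATTERNS = [
    re.compile(r"^(?P<author>[^\W\d_][\w'-]*)\s*\((?P<year>\d{4})\)$"),
    re.compile(r"^\((?P<author>[^\W\d_][\w'-]*),\s*(?P<year>\d{4})\)$"),
]


def citation_key(text: str) -> Optional[Tuple[str, str]]:
    """Reduce either citation style to the same (author, year) key."""
    for pattern in CITATION_PATTERNS:
        match = pattern.match(text.strip())
        if match:
            return match.group("author").lower(), match.group("year")
    return None


class LinkResolver:
    """Hypothetical resolver; the real DINOFLAJ linking rules are not documented here."""

    def __init__(
        self,
        taxa: Dict[str, List[int]],
        variants: Dict[str, str],
        references: Dict[Tuple[str, str], int],
    ) -> None:
        self.taxa = taxa  # canonical name -> record ids (more than one id = homonym)
        self.variants = variants  # orthographic variant -> canonical name
        self.references = references  # (author, year) -> record id

    def resolve_taxon(
        self, name: str, expansions: Optional[Dict[str, str]] = None
    ) -> Optional[int]:
        # Context-specific abbreviations, e.g. {"Gonyaulacysta": "Gonyaulacysta jurassica"}.
        name = (expansions or {}).get(name, name)
        name = self.variants.get(name, name)
        ids = self.taxa.get(name, [])
        # Unknown names and homonyms are not linked automatically.
        return ids[0] if len(ids) == 1 else None

    def resolve_reference(self, text: str) -> Optional[int]:
        """Link a citation in either style to its bibliographic record, if known."""
        key = citation_key(text)
        return self.references.get(key) if key else None
```

Returning no link for a homonym, rather than guessing, is one reasonable design choice: a wrong link is harder for an editor to notice than a missing one.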

By 2004, web-based tools for managing this type of data had advanced considerably, and DINOFLAJ2 is the result. It uses the same front-end document parsing and linking system as the original DINOFLAJ, but the resulting data is managed in a "wiki", a website that can be modified interactively and collaboratively with a web browser. We chose MediaWiki, the software behind the online Wikipedia, because it supports international character sets, large numbers of entries, and arbitrarily formatted links within the text; it has been stress-tested on large datasets; and its source code is available for modification.
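
In MediaWiki markup, italics are written as `''text''`, bold as `'''text'''`, and a link with its own label as `[[Target|label]]`, where the label may itself carry styling; pages are stored as UTF-8, so accented characters need no special encoding. The sketch below shows how parsed text could be emitted in that form. The `Span` structure is a hypothetical intermediate format, not the actual output of the DINOFLAJ parser.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    """One run of text from a parsed field (hypothetical intermediate format)."""
    text: str
    italic: bool = False
    bold: bool = False
    target: Optional[str] = None  # title of the linked wiki page, if any


def to_wikitext(spans: List[Span]) -> str:
    """Emit MediaWiki markup: ''italic'', '''bold''', and [[Target|label]] links."""
    parts = []
    for span in spans:
        label = span.text  # accented characters pass through unchanged (UTF-8)
        if span.italic:
            label = f"''{label}''"
        if span.bold:
            label = f"'''{label}'''"  # italic + bold nests to '''''text'''''
        if span.target is None:
            parts.append(label)
        elif label == span.target:
            parts.append(f"[[{span.target}]]")  # plain link, no label needed
        else:
            parts.append(f"[[{span.target}|{label}]]")  # styled label, plain title
    return "".join(parts)
```

For example, an italicized species name that links to its own entry becomes `[[Gonyaulacysta jurassica|''Gonyaulacysta jurassica'']]`: the page title stays plain while the displayed label keeps its formatting.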

The system has been adapted by disabling some features and adding others, as follows:

  1. Only the editors can modify the wiki content. Many of the collaborative features of a typical wiki are disabled for reasons of security and editorial consistency;
  2. MediaWiki Categories were defined to represent attributes of taxonomic entries, such as their Linnean rank (e.g., family, genus, species), and to separate taxonomic records from bibliographic ones. The program code formats entries differently depending on these attributes;
  3. A special alphabetical index page was designed to manage Category lists that might be thousands of entries long (the user is initially presented with an A-Z listing);
  4. A special "parent" Category link was added to represent the tree structure of the Linnean classification. For example, the parent of a species is the genus to which it is assigned, the genus's parent is a family or subfamily, and so on. When a parent is displayed, a list of its "children" (e.g., all the species assigned to a genus) can be selected and presented.
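
The Category membership, A-Z index, and parent/child features in the list above can be modelled with plain data structures. This is a minimal sketch, assuming each entry records its Categories and the title of its parent; the `Entry` type, the helper functions, and Category names such as "Taxon" or "Genus" are hypothetical and do not come from the DINOFLAJ2 code.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class Entry:
    title: str
    categories: List[str] = field(default_factory=list)  # e.g. ["Taxon", "Genus"]
    parent: Optional[str] = None  # title of the parent taxon, if any


def members(entries: Iterable[Entry], category: str) -> List[str]:
    """All entry titles in one Category, sorted."""
    return sorted(e.title for e in entries if category in e.categories)


def alphabetical_index(titles: Iterable[str]) -> Dict[str, List[str]]:
    """Group titles by first letter so a long Category opens as an A-Z listing."""
    index: Dict[str, List[str]] = defaultdict(list)
    for title in sorted(titles):
        index[title[:1].upper()].append(title)
    return dict(index)


def build_children(entries: Iterable[Entry]) -> Dict[str, List[str]]:
    """Invert the parent links: parent title -> sorted child titles."""
    children: Dict[str, List[str]] = defaultdict(list)
    for e in entries:
        if e.parent is not None:
            children[e.parent].append(e.title)
    return {parent: sorted(kids) for parent, kids in children.items()}
```

Storing only the upward "parent" link on each entry and deriving the downward "children" lists keeps a single source of truth: reassigning a species to another genus means editing one field, and the genus pages' child lists follow automatically.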

09/2015: Most of these custom code modifications have since been eliminated by moving to a newer version of MediaWiki combined with the Semantic MediaWiki extension. By modifying a few templates and adding markup to the imported data, we have reproduced most of the custom functionality of the earlier software.
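
With Semantic MediaWiki, a relationship such as "parent" can be stored as a property annotation written `[[Property::Value]]` in the page text, and an inline `{{#ask: ...}}` query can list every page holding a given value, which replaces custom code for child lists. The sketch below shows the kind of markup that could be added to imported data; the property name "Has parent" and both helper functions are hypothetical, since the actual DINOFLAJ2 templates and properties are not described here, and in practice such annotations could equally be placed inside a template.

```python
from typing import Optional

# "Has parent" is a hypothetical property name; the real DINOFLAJ2 templates
# and properties are not described here.
PARENT_PROPERTY = "Has parent"


def annotate(text: str, rank: str, parent: Optional[str]) -> str:
    """Append a rank Category and a Semantic MediaWiki parent annotation to imported text."""
    lines = [text.rstrip(), "", f"[[Category:{rank}]]"]
    if parent:
        lines.append(f"[[{PARENT_PROPERTY}::{parent}]]")
    return "\n".join(lines) + "\n"


def children_query(title: str) -> str:
    """An inline #ask query listing every page whose parent is `title`."""
    return "{{#ask: [[" + PARENT_PROPERTY + "::" + title + "]] |format=ul }}"
```

A genus page would then carry `children_query` output for its own title, so its species list is computed by the extension at display time rather than by custom PHP code.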

MediaWiki depends upon the MySQL database and Apache web server to store data and serve web pages, respectively.