HOW IT WORKS

From dinoflaj3

Latest revision as of 14:12, 6 January 2017

DINOFLAJ2 is based on the original version of DINOFLAJ. The original DINOFLAJ was implemented as a series of custom Perl CGI scripts with an Oracle database back-end. Content was produced by parsing the original taxonomic documents into approximately 10000 taxonomic records and 2000 bibliographic records, with 60000 hypertext links between them. This data was then structured into a relational database format and loaded into Oracle. The CGI scripts retrieved this data and presented it to the user as HTML pages for display in a web browser. Appearance was managed with a series of HTML templates that were merged with the database content.
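The retrieve-and-merge pattern described above can be illustrated with a minimal sketch. It is not the original code: the original used Perl and Oracle, whereas this sketch substitutes Python with an in-memory SQLite database, and the table name `taxon`, its columns, the template, and the placeholder description are all hypothetical.

```python
import sqlite3
from html import escape
from string import Template

# Hypothetical HTML template; the original templates are not shown in the source.
PAGE_TEMPLATE = Template(
    "<html><head><title>$title</title></head>"
    "<body><h1>$title</h1><p>$body</p></body></html>"
)


def build_demo_db():
    """Create an in-memory database with one illustrative record."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE taxon (name TEXT PRIMARY KEY, description TEXT)")
    conn.execute(
        "INSERT INTO taxon VALUES (?, ?)",
        ("Gonyaulacysta jurassica", "Placeholder description text."),
    )
    return conn


def render_taxon_page(conn, name):
    """Fetch one record and merge it into the template; None if not found."""
    row = conn.execute(
        "SELECT name, description FROM taxon WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        return None
    title, body = row
    # Escape database content so it cannot break the surrounding HTML.
    return PAGE_TEMPLATE.substitute(title=escape(title), body=escape(body))
```

Keeping the template separate from the query is what lets appearance change without touching the stored content, which is the separation the paragraph above describes.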

Major hurdles for the system, or any system designed to replace it, are:

  • the need to handle accented characters and the requirement to format text styles (italics, bold) in an intuitive way;
  • the necessity to have many links within individual fields (links that might also contain styles or accented characters);
  • and the need to automatically establish some 60000 hypertext links rather than to establish them manually.

The last item is complicated by, for example, homonyms, orthographic variants, abbreviated citations in some contexts (e.g., "Originally Gonyaulax" in a "sequence" list for the Gonyaulacysta jurassica entry actually refers to "Gonyaulax jurassica Appendix B"), and the various ways that references are cited (e.g., author name inside or outside parentheses). Though all these challenges were met to get the initial system running, enabling in-place editing of the resulting database content proved more difficult. The text itself and its formatting were easily managed, but it was difficult to maintain the consistency of the links between records at the same time and to keep editing easy for the user. Attempts to achieve a fully functional, robust, and fast editing system were unsuccessful and, at the time (1998), tools to manage this kind of content were non-existent or had significant performance, cost, or other unacceptable limitations.
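The abbreviated-citation case mentioned above can be sketched as follows. This is an illustrative guess at one way to resolve such a link, not the actual DINOFLAJ linking code: the function name, its parameters, and the rule "cited genus plus the current entry's species epithet" are assumptions made for the example.

```python
def resolve_abbreviated_citation(cited_genus, entry_name, known_titles):
    """Expand a bare genus name cited inside a species entry.

    Assumes the abbreviated citation keeps the species epithet of the entry
    in which it appears. Returns every matching record title, sorted; an empty
    list means no link could be made, and more than one result signals an
    ambiguity (e.g. homonyms) that needs another rule or a human decision.
    """
    parts = entry_name.split()
    if len(parts) < 2:
        return []  # the entry is not a binomial, so there is no epithet to reuse
    epithet = parts[1]
    candidate = f"{cited_genus} {epithet}"
    return sorted(
        title
        for title in known_titles
        if title == candidate or title.startswith(candidate + " ")
    )


# Record titles taken from the example in the text above.
titles = {
    "Gonyaulacysta jurassica",
    "Gonyaulax jurassica Appendix B",
}
matches = resolve_abbreviated_citation("Gonyaulax", "Gonyaulacysta jurassica", titles)
```

A rule this simple would not cope with orthographic variants (for instance, an epithet whose ending changes with the genus), which is part of why the linking step is described as difficult.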

By 2004, there had been many advances in the management of this type of data using web interfaces. DINOFLAJ2 was the result. It was implemented with the same front-end document parsing and linking system as the original DINOFLAJ, but the resulting data was managed using a "wiki", a website system that can be interactively and collaboratively modified with a web browser. We chose to use MediaWiki, the same system used by the on-line Wikipedia, because it supports international character sets, large numbers of entries, and arbitrarily formatted links within the text; it has been stress-tested on large datasets; and its source code is available for modification.

The system has been adapted by disabling some features and adding others, as follows:

  1. Modification of the wiki content is controlled only by the editors. Many of the collaborative features of a typical wiki are disabled for reasons of security and editorial consistency;
  2. MediaWiki Categories were defined to represent attributes of taxonomic entries, such as their Linnean rank (e.g., genus, species, family), and to separate records that are taxonomic from those that are bibliographic. The program code formats entries differently depending on these attributes;
  3. A special alphabetical index page was designed to manage Category lists that might be thousands of entries long (the user is initially presented with an A-Z listing);
  4. A special "parent" Category link was added. This allows representation of the tree structure of the Linnean classification. For example, the parent of a species is the genus to which it is assigned, the parent of that genus is a family or subfamily, and so on. When the parent is displayed, this link allows its "children" to be selected and listed (e.g., all the species assigned to a genus).
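The "parent" link and the alphabetical index in the list above can be illustrated with a small sketch. It shows only the underlying data structures, assuming each entry stores the title of its parent; the function names are hypothetical, and the real system implements these features inside MediaWiki rather than in standalone code like this.

```python
from collections import defaultdict


def children_by_parent(parent_of):
    """Invert a child -> parent mapping into parent -> sorted list of children."""
    children = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)
    return {parent: sorted(kids) for parent, kids in children.items()}


def lineage(name, parent_of):
    """Walk the parent links upward from an entry to the top of the tree."""
    chain = [name]
    seen = {name}
    while chain[-1] in parent_of:
        parent = parent_of[chain[-1]]
        if parent in seen:
            raise ValueError(f"cycle in parent links at {parent!r}")
        chain.append(parent)
        seen.add(parent)
    return chain


def alphabetical_index(titles):
    """Group titles under their first letter, as in an A-Z listing."""
    index = defaultdict(list)
    for title in sorted(titles, key=str.casefold):
        if title:  # skip empty titles, which have no first letter
            index[title[0].upper()].append(title)
    return dict(index)
```

Storing only the parent on each entry keeps editing simple, since reassigning a species to another genus changes one link, and the list of children can always be derived by inverting the mapping.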

As of September 2015, most of these custom code modifications had been eliminated by using a newer version of MediaWiki combined with the Semantic MediaWiki extension. By modifying a few templates and altering the imported data with additional markup, most of the custom functionality of the prior software has been duplicated and several new features have been added. DINOFLAJ2 content was moved to the newer software and to new hardware at the same time as these software changes were made.

MediaWiki depends upon the MySQL database and Apache web server to store data and serve web pages, respectively.

DINOFLAJ3 continues to use essentially the same software configuration as the 2015 version of DINOFLAJ2 with minor changes.