Reinventing the Wheel or Prudently Hitching a Ride?

[Note: Copyright 1997 by Gail E. Kampmeier and Michael E. Irwin. This paper was submitted to the Entomological Collections Network for its abstracts of presentations made at the 1996 meeting in Louisville, KY, December 1996. Although the references to the condition of the therevid databases were current at the time of writing, the Mandala database system has evolved significantly since then. Please request a demo from Gail Kampmeier (gkamp@uiuc.edu) or see the Mandala website.]

“Rolling our own” database design using an off-the-shelf database engine such as Claris’ FileMaker Pro(tm) was not our original intention when we applied for a National Science Foundation PEET (Partnerships for Enhancing Expertise in Taxonomy) grant in early 1995. We were beta-testing Rob Colwell’s Biota, which showed promise, but the released version, which might have given us the flexibility we needed, came out too late: we were already off and running.

The illusion that YOU can’t ever design a database for your data, that you need to rely on a professional programmer or a dedicated management system that may cost big $$$, scares many biologists into unnecessary procrastination and inactivity. Databases for your data don’t need to be complex, or even as complex as our ever-evolving databases (which, by the way, started off relatively simply). They just need to let you enter your data easily AND get it out again, whether for your own analyses and interpretations, for when that greater-than-sliced-bread OTHER system comes along, or for when there is a move afoot to pull everybody’s data together into one megadatabase system.

For us, FileMaker Pro presented such a solution. Although best known among Macintosh users, the database engine has been cross-platform for two versions (2.1, which is quasi-relational, and 3.0, which is relational) and was recently cited as the second most popular database on the PC. It is most famous for its easy-to-use interface and for letting ordinary people (not just techno-geeks) quickly set up a framework for their data and begin the real task of entering it (more on features of FMP).

This is not to say that FileMaker Pro does not have its complexities, or that you will remain satisfied with the first way you organize your data (or even the second or third). But with some forethought and research into the kinds of information you should be documenting, you should always be able to take your data and reorganize it, add fields, change the way it is presented, or ship it to another platform, application, or database. Some of the important questions to ask are:

  • What kind of data do I have? What makes it unique? Is it a specimen? A taxon? A collecting event? All of the above?
  • What kinds of outputs (queries) will I want from my data? Does my method of organization make it reasonable to search for the kinds of knowledge I hope to gain by using a database in the first place?
  • What are the necessary fields (categories) for my data? Can I safely combine pieces of information into one field that I know I’ll never want to see in any other combination (e.g., “location”, rather than breaking it into umpteen smaller fields for “country”, “state/province”, “county”, … and “microsite”)? The answer is NO! You need to break up the locality information into separate fields; you can always write a calculation that puts them together again.
  • Who will be using this database? Do you assume that whoever uses the database will be as competent with it as you are? Or do you make its operation as explicit and as foolproof as possible? Where is the line between utility and beauty, and how much time should you spend going beyond utility?
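The advice about locality fields can be made concrete with a minimal sketch. It is written in Python rather than FileMaker Pro, so it only illustrates the idea: the `Locality` record and its field names are hypothetical, not those of the therevid databases, and `full_locality` stands in for a calculation field that puts the parts together again for display.

```python
from dataclasses import dataclass, fields


@dataclass
class Locality:
    # Hypothetical field names: each piece of locality information gets its
    # own field so it can be searched, sorted, or exported on its own.
    country: str = ""
    state_province: str = ""
    county: str = ""
    microsite: str = ""


def full_locality(loc: Locality, sep: str = ", ") -> str:
    """Join the separate fields back into one display string.

    This plays the role of a calculation field: nothing is stored twice,
    and empty fields are skipped so no stray separators appear.
    """
    parts = (getattr(loc, f.name) for f in fields(loc))
    return sep.join(p for p in parts if p)


# Illustrative records only, not real specimen data.
records = [
    Locality("USA", "Illinois", "Champaign", "oak woodland edge"),
    Locality("USA", "Kentucky", "Jefferson"),
    Locality("Australia", "Queensland"),
]

print(full_locality(records[0]))  # USA, Illinois, Champaign, oak woodland edge
print(full_locality(records[2]))  # Australia, Queensland

# Querying one component is trivial because it was never merged away.
print(sum(1 for r in records if r.country == "USA"))  # 2
```

Because the components stay separate, sorting or searching by country or county alone needs no text parsing; going the other way, splitting a combined “location” string reliably, is much harder.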

Capturing Insect Specimen Data: The Case of the Therevidae (Diptera)

Our therevid databases center around the management of specimen-based information. Each specimen is given a unique number with a 3-letter prefix. If specimens already have barcodes or other unique numbers attached, these numbers are used, with an appropriate 3-letter prefix. The label information accompanying each specimen is then entered into a series of related databases for

  • label information as it appears with the specimen;
  • lots, defined as collecting events, including locality of collection (political divisions from country to smallest political unit; named geographic features; elevation; longitude and latitude), collectors, date of collection, and method;
  • taxon name and authority;
  • determination history (determiner, year, determined as);
  • loan history (unlike in many museum-based databases, this is not the emphasis of our management system);
  • tracking of illustrations made;
  • atmospheric and substrate conditions at the time of collection; and
  • any accompanying biological or ecological information that may be included about the specimen on the labels, including associations with other specimens.
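The specimen-centred layout above can be sketched as a few linked record types. This is an illustration under stated assumptions, not the actual therevid schema: the `ABC-000123` ID format (prefix, hyphen, digits), the field names, and the rule that the most recent determination gives the current name are all hypothetical. What it shows is that each related record holds only a key back to its specimen or lot rather than a copy of that data.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

PREFIX = re.compile(r"[A-Z]{3}")


def make_specimen_id(prefix: str, number: str) -> str:
    """Build a specimen ID from a 3-letter prefix and a new or existing number.

    Hypothetical format "ABC-000123"; a pre-existing barcode number can be
    passed in as `number` so the specimen keeps the number already on it.
    """
    if not PREFIX.fullmatch(prefix):
        raise ValueError(f"prefix must be 3 uppercase letters: {prefix!r}")
    if not number.isdigit():
        raise ValueError(f"number must be digits only: {number!r}")
    return f"{prefix}-{number}"


@dataclass
class Lot:
    lot_id: str       # one collecting event
    country: str
    collectors: str
    date: str         # ISO text, for simplicity
    method: str


@dataclass
class Specimen:
    specimen_id: str
    lot_id: str       # key pointing to a Lot; the lot itself is not copied
    label_text: str   # label information as it appears with the specimen


@dataclass
class Determination:
    specimen_id: str  # key pointing to a Specimen
    determiner: str
    year: int
    determined_as: str


def current_name(specimen_id: str, history: List[Determination]) -> Optional[str]:
    """Latest determination for a specimen, or None if never determined."""
    dets = [d for d in history if d.specimen_id == specimen_id]
    if not dets:
        return None
    return max(dets, key=lambda d: d.year).determined_as


# Placeholder values throughout; none of this is real specimen data.
sid = make_specimen_id("ABC", "000123")
lot = Lot("LOT-0001", "USA", "Collector A", "1995-07-15", "Malaise trap")
spec = Specimen(sid, lot.lot_id, "label text as transcribed")
history = [
    Determination(sid, "Determiner X", 1990, "Taxon A"),
    Determination(sid, "Determiner Y", 1996, "Taxon B"),
]
print(sid)                         # ABC-000123
print(current_name(sid, history))  # Taxon B
```

Keeping the full determination history, rather than overwriting a single name field, is what makes the “determined as” trail in the list above recoverable.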

In addition, information is recorded about the

  • sex (male, female, unknown)
  • type (e.g., holotype, paratype, specimen, etc.)
  • condition of the specimen (is it missing body parts?)
  • dissections (what parts have been dissected?)
  • preservation method (pinned? pointed? 70% ethanol? etc.)
  • if the specimen was used in molecular studies, how was it preserved? Has the molecular biology lab given it another tracking number?
  • GenBank accession number?
  • stage collected
  • stage(s) in collection
  • pupation and emergence dates for reared specimens
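Fields such as sex and type status work best when restricted to a fixed list of values (a value list, in FileMaker terms), so that searches are not defeated by spelling variants. A minimal Python sketch, assuming value lists taken only from the examples in the text (the real lists are likely longer) and a hypothetical consistency rule for pupation and emergence dates:

```python
from datetime import date
from enum import Enum


class Sex(Enum):
    MALE = "male"
    FEMALE = "female"
    UNKNOWN = "unknown"


class TypeStatus(Enum):
    # Partial list, taken from the examples in the text.
    HOLOTYPE = "holotype"
    PARATYPE = "paratype"
    SPECIMEN = "specimen"  # ordinary, non-type material


def parse_choice(enum_cls, raw: str):
    """Match typed input against a controlled list, ignoring case and spaces.

    Unknown values raise ValueError rather than quietly creating a new
    spelling that later searches would miss.
    """
    try:
        return enum_cls(raw.strip().lower())
    except ValueError:
        allowed = ", ".join(m.value for m in enum_cls)
        raise ValueError(f"{raw!r} is not one of: {allowed}") from None


def check_rearing(pupation: date, emergence: date) -> None:
    """Hypothetical rule for reared specimens: emergence cannot precede pupation."""
    if emergence < pupation:
        raise ValueError("emergence date is earlier than pupation date")


print(parse_choice(Sex, " Female ").value)         # female
print(parse_choice(TypeStatus, "HOLOTYPE").value)  # holotype
check_rearing(date(1996, 5, 2), date(1996, 5, 20))  # passes silently
```

Rejecting “femal” at entry time is cheaper than discovering, years later, that a search for females missed a batch of records.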

All of the databases

  • have context-sensitive (field-specific), database-specific, and general help developed using ClickWare’s ClickHelp(tm). Expectations of form and content of each field and actions of buttons are detailed for the user.
  • feature electronic tracking of questions and problems with a specialized database that lets users enter unanswered questions, lets others respond with answers, and records when each problem has been resolved. This eliminates the pieces of paper that accumulate and then vanish into a black hole before they are answered. It also allows tracking of the types of questions asked, which sometimes leads to better database design, and lets users see whether certain kinds of questions have been dealt with in the past.
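The question-tracking database can be sketched as one record type plus a few queries. The field names, the `topic` category used to spot recurring problems, and the sample questions are hypothetical; this shows the idea rather than the layout actually used.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Question:
    asked_by: str
    database: str   # which database the question concerns
    topic: str      # hypothetical category, used to spot recurring problems
    text: str
    asked_on: date
    answer: Optional[str] = None
    answered_by: Optional[str] = None
    resolved_on: Optional[date] = None

    @property
    def is_open(self) -> bool:
        return self.resolved_on is None

    def resolve(self, answer: str, answered_by: str, on: date) -> None:
        self.answer = answer
        self.answered_by = answered_by
        self.resolved_on = on


def open_questions(log: List[Question]) -> List[Question]:
    return [q for q in log if q.is_open]


def topic_counts(log: List[Question]) -> Counter:
    """How often each topic comes up; frequent ones may point to a design fix."""
    return Counter(q.topic for q in log)


def past_answers(log: List[Question], keyword: str) -> List[Question]:
    """Resolved questions mentioning a keyword, so old answers can be reused."""
    kw = keyword.lower()
    return [q for q in log if not q.is_open and kw in q.text.lower()]


# Placeholder entries for illustration.
log = [
    Question("student 1", "lots", "elevation", "Label elevation in feet?", date(1996, 3, 1)),
    Question("student 2", "lots", "elevation", "Elevation given as a range?", date(1996, 3, 4)),
    Question("student 1", "specimens", "labels", "Illegible label date", date(1996, 3, 5)),
]
log[0].resolve("(answer text)", "curator", date(1996, 3, 2))

print(len(open_questions(log)))              # 2
print(topic_counts(log)["elevation"])        # 2
print(len(past_answers(log, "ELEVATION")))  # 1
```

Two elevation questions in one week is the kind of pattern that the text says can lead to a better database design.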

Curious about…

  • how many specimens were trapped during a collecting event? Note that if specimens have not been named, this field is blank in the list.
  • where specimens of a particular taxon have been collected?

View this information via “portals”, windows that display data from a related database without storing it again in the current one. This keeps file sizes down, which helps keep operations fast once files become large.
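Both questions above come down to following keys between related tables, which is also what a portal does when it displays related records. A minimal sketch with hypothetical in-memory tables (lot IDs, taxon names, and localities are placeholders): locality is stored once, in the lot record, and each specimen carries only its lot key. An unnamed specimen has a blank taxon, as in the list described above.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical tables. Locality lives only in the lot (collecting event) row;
# each specimen stores just the key of its lot.
lots: Dict[str, Dict[str, str]] = {
    "LOT-1": {"country": "USA", "state": "Illinois"},
    "LOT-2": {"country": "Mexico", "state": "Sonora"},
}
specimens: List[Dict[str, str]] = [
    {"id": "ABC-1", "lot": "LOT-1", "taxon": "Taxon A"},
    {"id": "ABC-2", "lot": "LOT-1", "taxon": ""},  # not yet named: left blank
    {"id": "ABC-3", "lot": "LOT-2", "taxon": "Taxon A"},
    {"id": "ABC-4", "lot": "LOT-2", "taxon": "Taxon B"},
]


def portal(lot_id: str, specimens: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Specimens belonging to one lot, found through the key at display time."""
    return [s for s in specimens if s["lot"] == lot_id]


def specimens_per_lot(specimens: List[Dict[str, str]]) -> Counter:
    return Counter(s["lot"] for s in specimens)


def where_collected(taxon: str, specimens, lots) -> List[Tuple[str, str]]:
    """Distinct (country, state) pairs for a taxon, read through each lot key."""
    places = {
        (lots[s["lot"]]["country"], lots[s["lot"]]["state"])
        for s in specimens
        if s["taxon"] == taxon
    }
    return sorted(places)


print([s["taxon"] or "(blank)" for s in portal("LOT-1", specimens)])
# ['Taxon A', '(blank)']
print(specimens_per_lot(specimens)["LOT-1"])  # 2
print(where_collected("Taxon A", specimens, lots))
# [('Mexico', 'Sonora'), ('USA', 'Illinois')]
```

If a locality in `lots` is corrected, every specimen from that lot picks up the correction automatically, which would not happen if the locality had been copied into each specimen record.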

Needs

  • to aid in geocoding collecting locations, worldwide gazetteers listing major and minor geographical features with latitude, longitude, and elevation are essential. Information on named features in the United States is readily available via the WWW, but world locations are difficult or impossible to find. Affordable cross-platform CD-ROMs would be ideal.
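Until such a gazetteer is available, the lookup step itself can be sketched. The `Gazetteer` class, its fields, and the entries below are hypothetical (the feature names are made up); the sketch shows normalizing label spellings, narrowing repeated place names by country or state, and converting degree-minute-second label coordinates to the decimal degrees a database would store.

```python
from typing import Dict, List, NamedTuple, Optional


class Feature(NamedTuple):
    name: str
    kind: str                      # e.g. "stream", "peak", "town"
    country: str
    admin1: str                    # state or province
    lat: float                     # decimal degrees, negative = south
    lon: float                     # decimal degrees, negative = west
    elevation_m: Optional[float]


def _norm(text: str) -> str:
    """Case- and whitespace-insensitive key for matching label spellings."""
    return " ".join(text.lower().split())


class Gazetteer:
    def __init__(self, features: List[Feature]) -> None:
        self._by_name: Dict[str, List[Feature]] = {}
        for f in features:
            self._by_name.setdefault(_norm(f.name), []).append(f)

    def lookup(self, name: str, country: str = "", admin1: str = "") -> List[Feature]:
        """All features with this name, optionally narrowed by country/state.

        More than one hit means the label is ambiguous and needs a human decision.
        """
        hits = self._by_name.get(_norm(name), [])
        if country:
            hits = [f for f in hits if _norm(f.country) == _norm(country)]
        if admin1:
            hits = [f for f in hits if _norm(f.admin1) == _norm(admin1)]
        return hits


def dms_to_decimal(deg: int, minutes: int, seconds: float, hemi: str) -> float:
    """Convert a label coordinate such as 40 deg 06' 30" N to decimal degrees."""
    value = deg + minutes / 60 + seconds / 3600
    return -value if hemi.upper() in ("S", "W") else value


# Made-up entries for illustration; real use would load a published gazetteer.
gaz = Gazetteer([
    Feature("Example Creek", "stream", "USA", "Illinois", 40.1, -88.2, 220.0),
    Feature("Example Creek", "stream", "USA", "Kentucky", 38.2, -85.7, None),
])
print(len(gaz.lookup("example  creek")))                      # 2 (ambiguous)
print(gaz.lookup("Example Creek", admin1="Kentucky")[0].lat)  # 38.2
print(dms_to_decimal(88, 15, 0, "W"))                         # -88.25
```

Returning every match, instead of silently picking the first, matters because the same feature name often recurs across states and countries.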

Data input began in July 1995, with undergraduate students processing specimens from collections around the world. Information on over 34,000 specimens has been entered into the databases.

The databases were presented four times at various meetings in 1996. Feedback from users and discussions at those meetings have helped the databases evolve to the form seen today.