The Coastal & Oceanic Plankton Ecology, Production & Observation Database (COPEPOD) is an online, global-coverage database of zooplankton abundance, phytoplankton abundance, and zooplankton biomass data. Based on over ten years of plankton data management and database development, the COPEPOD project focuses on providing a plankton-tailored data access interface, integrated plankton data products, and clear acknowledgment of the original plankton investigators. While new COPEPOD data are available online each month, this document summarizes the content of the database as of December 2007. This document also introduces an advanced technique for quality control and range checking of plankton data, now in use with all COPEPOD data content.
One of the biggest challenges in building a database of plankton sampled with a wide variety of methods and gear is checking the quality of the data. One approach is to do an intensive review of the entire database every few years and release it only after that review is completed (e.g., World Ocean Database 1994, 1998, 2001, 2005). While this approach is thorough, it is time intensive, must be repeated "from scratch" for each new review, and greatly reduces the frequency of new data releases. Because COPEPOD releases data monthly, this approach was not an option. What was needed was a value range checking system that could quickly compare new plankton data to the thousands of plankton data values already present in the main database. After that automated review, the new data could be released immediately and the new values added to the range checking "data pool" to improve future range checks. Because the data pool keeps improving as new data are added to the collection, older data sets benefit from being periodically re-checked against the improved ranges. This automated system is now used in COPEPOD, with advanced ranging flags present in all COPEPOD-2007 data collections.
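The pooled range check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the COPEPOD implementation: the `RangePool` class, the flag names, the percentile bounds, the log10 scale, and the one-order-of-magnitude tolerance are all hypothetical choices made for this example.

```python
import math
from collections import defaultdict
from statistics import quantiles


class RangePool:
    """Hypothetical pool of previously accepted values, keyed by data type."""

    def __init__(self, min_samples=20, tolerance=1.0):
        self.pool = defaultdict(list)   # key -> list of log10(values)
        self.min_samples = min_samples  # below this, no range is trusted
        self.tolerance = tolerance      # extra orders of magnitude allowed

    def check(self, key, value):
        """Return an illustrative flag for one new value."""
        logs = self.pool.get(key, [])
        if value <= 0 or len(logs) < self.min_samples:
            return "not_checked"
        cuts = quantiles(logs, n=100)   # 99 percentile cut points
        low, high = cuts[0], cuts[-1]   # 1st and 99th percentiles
        x = math.log10(value)
        if x < low - self.tolerance:
            return "below_range"
        if x > high + self.tolerance:
            return "above_range"
        return "in_range"

    def add(self, key, value):
        """Add a released value to the pool to refine future checks."""
        if value > 0:
            self.pool[key].append(math.log10(value))


# Assumed key: (data type, unit). Only like is compared with like.
key = ("zooplankton_biomass", "mg/m3")
pool = RangePool()
for v in range(1, 101):                 # made-up pool values, 1 to 100
    pool.add(key, float(v))

flag_ok = pool.check(key, 5.6)          # same magnitude as the pool
flag_high = pool.check(key, 5600.0)     # roughly 1000x too large
```

Because `add` is called after each release, the bounds tighten or widen as the pool grows, which is why re-running `check` on older data sets can change their flags.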
For plankton data, the main purpose of "quality control" is to check for errors introduced during incorporation into the database, rather than to judge the quality of the original data. In general, plankton data are "correct" in their original source media, and any anomalous values found in these data are due to natural processes (e.g., blooms, swarms, patchiness) or mechanical sampling issues (e.g., gear failure or clogged nets). The original authors often annotated these mechanical or bloom events within the original data documentation or data tables, but these annotations may not have been passed along when the data were later digitized and/or added to other databases. The process of entering the data into the database is typically the largest source of errors, including metadata mistranslation (e.g., in a foreign-language document, did the author mean "millimeters" or "micrometers" with the label "mm"?), mislabeling of data types (e.g., the data tables say "per m3" but the documentation says "per 1000 m3"), and a variety of numeric ambiguities (e.g., is the comma in "1,234" a thousands separator or a decimal separator?). In each of these examples, the value ranging question is not "Is this value 5.6 or 5.7?" but rather "Is this value 5.6 or 5600?". These large differences are fairly easy to detect with automated range checks, provided the system is comparing equivalent data types (i.e., "comparing apples to apples, and oranges to oranges").
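As a rough illustration of why these incorporation errors differ by orders of magnitude, the sketch below lists the possible readings of an ambiguous string such as "1,234" and normalizes a "per 1000 m3" value to "per m3". The function names, the `VOLUME_M3` table, and the plausible range used at the end are assumptions made for this example only; they do not come from COPEPOD.

```python
def candidate_readings(text):
    """Return the possible numeric readings of a string that may contain commas."""
    text = text.strip()
    # Reading 1: every comma is a thousands separator.
    readings = [float(text.replace(",", ""))]
    # Reading 2: a single comma, with no period present, may be a decimal mark.
    if text.count(",") == 1 and "." not in text:
        readings.append(float(text.replace(",", ".")))
    return readings


# Hypothetical lookup: sampled volume (in m3) implied by each unit label.
VOLUME_M3 = {"per m3": 1.0, "per 1000 m3": 1000.0}


def to_per_m3(value, unit_label):
    """Convert a count to 'per m3' so that only equivalent types are compared."""
    return value / VOLUME_M3[unit_label]


def plausible(readings, low, high):
    """Keep only the readings that fall inside an expected range."""
    return [r for r in readings if low <= r <= high]


readings = candidate_readings("1,234")      # [1234.0, 1.234]
kept = plausible(readings, 0.1, 50.0)       # assumed range for this data type
normalized = to_per_m3(5600.0, "per 1000 m3")
```

The two readings of "1,234" differ by a factor of 1000, as do the two unit labels, so a range check against equivalent data can usually tell them apart even when the expected range is only loosely known.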
Figure 1: An example of a COPEPOD-2007 QC-Visualization plot, showing North Pacific zooplankton biomass data.
The entire COPEPOD-2007 publication is available for download here ...