COPEPODITE (data format guide)

The Time-Series Toolkit

An online plankton time-series analysis and visualization toolkit.

How to make a COPEPODITE-readable CSV data file

Data Format Instructions:
Click on any topic below to jump to that specific section, or scroll down this web page to view them in order.

General Data Layout and Format Concept

Example of Format (table)

About DATE (sampling date) Columns

About DATA VARIABLE (abundance, biomass, temperature, nutrients)

Special Grouping Options (taxa groups, nutrient groups)

FAQ and Known Problems

General Data Layout and Format Concept:

COPEPODITE can only read your uploaded time-series data if it is in a simple, keyword-labeled, comma-delimited ("comma separated values") CSV format. This format can be generated by Excel and many other database/statistical/spreadsheet programs. However, COPEPODITE will be confused if you also use commas in your column headers or as a decimal separator in the numbers (e.g., the value of "1/4" should be represented as "0.25" and not "0,25").

The COPEPODITE data format consists of a single row of "column headers", the pink-colored first row in the table shown below, followed by multiple rows of date and/or data values. In this column header row, the bold text marks the recognized Date or Variable Indicators.

Format Rule #1: The very first row must contain the column labels (e.g., DATE-YMD, ABND=, BIOM=, TEMP=, ...).

Format Rule #2: The very first column must be a date column (e.g., DATE-YMD, DATE-MDY, or DATE-DMY).

Formatting Hints, Tips, and Tricks:

It is okay to leave empty data value cells in the spreadsheet (e.g., like the yellow boxes below). These do not need to be filled in with anything, and will be ignored. Empty cells are assumed to mean "no data / not sampled". Any zero value (indicating "absent") must be specified with a "0" value.

If you place a "#" character in the very first column of a row, the software will ignore that entire row. (See the light blue "# Comments" and "#2005-11-05" rows below.) Note that this causes every other column of data found in that row to be ignored as well. If you need to "turn off" a value in the first few columns, place the "#" in the value cell itself and NOT in the very first date column.

If you place a "#" as the first character in a data value cell, the software will ignore just that one individual measurement (e.g., the light green "# lost sample" and "# -100.25 (??)" examples below).

The purpose of the "#" ignore option is to allow you to quickly "remove/restore" anomalous values from a data file without permanently removing them. If you remove the value instead, it will be harder to restore it in the future. This on/off toggle lets you play around with large/small values to see how they influence the analysis results.
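
The "#" and empty-cell rules above can be sketched in a few lines of Python. This is only an illustration of the rules as described here, not COPEPODITE's actual code; the function name apply_ignore_rules is hypothetical, and trimming spaces around each cell is an assumption.

```python
def apply_ignore_rules(rows):
    """Apply the "#" and empty-cell rules to data rows (header row excluded).

    rows: list of rows, each a list of cell strings.
    Returns the kept rows, with ignored or empty cells replaced by None.
    """
    kept = []
    for row in rows:
        # Assumption: spaces around a cell value are not significant.
        cells = [cell.strip() for cell in row]
        if cells and cells[0].startswith("#"):
            continue  # "#" in the very first column: skip the entire row
        kept.append(
            [None if cell == "" or cell.startswith("#") else cell for cell in cells]
        )
    return kept


rows = [
    ["# Comments", "notes for myself"],
    ["2005-10-28", "400", "Oct/27/2005", "", "400"],
    ["#2005-11-05", "1010", "Oct/29/2005", "173", "532"],
    ["2005-11-07", "# lost sample", "Oct/29/2005", "173", "# -100.25 (??)"],
]
cleaned = apply_ignore_rules(rows)
for row in cleaned:
    print(row)
```

Only the 2005-10-28 and 2005-11-07 rows survive, and the empty or "#"-marked cells in them become None ("no data"), while a literal "0" would be kept as a real value.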

(Data Format instructions continue after the table below.)

DATE-YMD	BIOM= Total Wet Weight (mg/m3)	DATE-MDY	ABND1= Calanus finmarchicus adults (#/m3)	ABND1= Calanus finmarchicus C3-C5 (#/m3)	DATE-DMY	TEMP= Temperature (C) at 50 meters
# Comments	Because the very first character on this line is "#", the ENTIRE row will be ignored by COPEPODITE. You can use this row for comments and notes.
2005-10-27	532	Oct/26/2005	105	352	21_10_2005	19
2005-10-28	400	Oct/27/2005		400	25_10_2005	21
2005-10-29		Oct/28/2005	217	183	26_10_2005	20
2005-11-01	319	Oct/29/2005	50	117	03_11_2005	17
#2005-11-05 Do not do this!	entire row ignored 1010	Oct/29/2005	entire row ignored 173	entire row ignored 532	05_11_2005	entire row ignored 19
2005-11-07	# lost sample	Oct/29/2005	173	# -100.25 (??)	05_11_2005	19
2005-11-17	971	Nov/03/2005	140	817	07_11_2005	18

DATE columns and Date Formats:

The base time unit in the current version of COPEPODITE is the "monthly mean". This means that if you provide weekly or daily data, it will be binned and averaged into a single monthly value for that month of that year.

It is quite common to have different sampling intervals (dates) for the plankton than for the other variables. You might have monthly or seasonal zooplankton data, weekly chlorophyll data, and perhaps daily temperature data. COPEPODITE will read all of these and correctly match up and synchronize the data for you (currently to monthly bins).

In the example table above, the biomass data and abundance data and temperature data all have different sampling dates (and formats). The COPEPODITE software will bin them into monthly means and then synchronize them into matching month+year date sets. (This means the temperature data, although taken on different days, will sync with the corresponding biomass and abundance data for that month!).
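
As a rough illustration of the binning and synchronizing described above (a sketch only, not the toolkit's own code), the Python below averages each variable into (year, month) bins and then lines the bins up. The function name monthly_means is hypothetical, and the sample values are taken from the biomass and temperature columns of the example table.

```python
from collections import defaultdict
from datetime import date


def monthly_means(samples):
    """Average (date, value) samples into one mean per (year, month) bin."""
    bins = defaultdict(list)
    for day, value in samples:
        bins[(day.year, day.month)].append(value)
    return {key: sum(values) / len(values) for key, values in bins.items()}


# Each variable keeps its own sampling dates.
biomass = [
    (date(2005, 10, 27), 532.0),
    (date(2005, 10, 28), 400.0),
    (date(2005, 11, 1), 319.0),
]
temperature = [
    (date(2005, 10, 21), 19.0),
    (date(2005, 10, 25), 21.0),
    (date(2005, 10, 26), 20.0),
    (date(2005, 11, 3), 17.0),
]

biom = monthly_means(biomass)
temp = monthly_means(temperature)

# Synchronize on matching month+year keys; a missing month shows up as None.
synced = {key: (biom.get(key), temp.get(key)) for key in sorted(set(biom) | set(temp))}
for (year, month), pair in synced.items():
    print(year, month, pair)
```

October 2005 ends up with a biomass mean of 466.0 and a temperature mean of 20.0, even though the two variables were sampled on different days.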

Format Rule #3: Each column of data values uses the nearest date column found to its left. You can have a single date column with 15 data columns following it, or you can have 15 paired "date + data" column pairs, or any mixture you desire. (These date columns can also have different date formats.) The general idea is to make the setup of your data file as easy as possible.
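
Format Rule #3 can be pictured with a short Python sketch that walks the header row from left to right and remembers the most recent date column. The function name pair_columns is hypothetical and this is not COPEPODITE's own implementation; the header used is the one from the example table.

```python
def pair_columns(header):
    """Map each data column index to the index of the date column it uses."""
    pairs = {}
    current_date = None
    for index, label in enumerate(header):
        if label.strip().startswith("DATE-"):
            current_date = index  # later data columns use this date column
        elif current_date is None:
            # Format Rule #2: the very first column must be a date column.
            raise ValueError("data column found before any date column")
        else:
            pairs[index] = current_date
    return pairs


header = [
    "DATE-YMD",
    "BIOM= Total Wet Weight (mg/m3)",
    "DATE-MDY",
    "ABND1= Calanus finmarchicus adults (#/m3)",
    "ABND1= Calanus finmarchicus C3-C5 (#/m3)",
    "DATE-DMY",
    "TEMP= Temperature (C) at 50 meters",
]
pairs = pair_columns(header)
print(pairs)  # {1: 0, 3: 2, 4: 2, 6: 5}
```

Both ABND1= columns (indexes 3 and 4) share the DATE-MDY column at index 2, while the biomass and temperature columns each use their own date column.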

Format Rule #4: COPEPODITE currently only recognizes three general date formats:

Keyword	Order	Examples
DATE-YMD	Year + Month + Day	2010-Mar-15 2010/03/15 2010_03_15 2010.Mar.15
DATE-DMY	Day + Month + Year	15-Mar-2010 15/03/2010 15_03_2010 15.Mar.2010
DATE-MDY	Month + Day + Year	Mar-15-2010 03/15/2010 03_15_2010 Mar.15.2010

The date-component (e.g., year or month or day) delimiter can be any of the following: "-", "_", ".", "/".

Three-character (English) months are also recognized (e.g., "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"), with any capitalization ignored (e.g., "JAN" = "jan" = "Jan" = "JaN" = "jAn"). The system does not yet recognize non-English month names.
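
To make the keyword, delimiter, and month-name rules concrete, here is a minimal Python sketch of a date reader that follows them. It is an illustration under the rules stated above, not the parser COPEPODITE actually uses, and the function name parse_date is hypothetical.

```python
import re
from datetime import date

MONTHS = {
    name: number
    for number, name in enumerate(
        ["jan", "feb", "mar", "apr", "may", "jun",
         "jul", "aug", "sep", "oct", "nov", "dec"],
        start=1,
    )
}
KEYWORDS = ("DATE-YMD", "DATE-DMY", "DATE-MDY")


def parse_date(text, keyword):
    """Parse one date cell according to its column keyword."""
    if keyword not in KEYWORDS:
        raise ValueError(f"unrecognized date keyword: {keyword!r}")
    order = keyword.split("-")[1]  # e.g. "YMD"
    parts = re.split(r"[-_./]", text.strip())  # any of the four delimiters
    if len(parts) != 3:
        raise ValueError(f"expected three date components in {text!r}")
    fields = dict(zip(order, parts))
    month_text = fields["M"].lower()  # capitalization is ignored
    month = MONTHS[month_text] if month_text in MONTHS else int(month_text)
    return date(int(fields["Y"]), month, int(fields["D"]))


print(parse_date("2010-Mar-15", "DATE-YMD"))  # 2010-03-15
print(parse_date("15/03/2010", "DATE-DMY"))   # 2010-03-15
print(parse_date("MAR.15.2010", "DATE-MDY"))  # 2010-03-15
```

Because the component order comes only from the column keyword, a value such as "05/12/2017" is read as 5 December under DATE-DMY and as 12 May under DATE-MDY.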

Your date values must include a month and day for each data value. If you have monthly data, with no day, use a day value of "15" (e.g., "15-mar-2010"). If you have annual or "once per year" data, you can use June 15th (e.g., 2015_06_15) as the month and day for each year. For "once per season" data, select a month that best represents the season or sampling period.

Format Rule #5: Your date format must remain consistent within each individual date column (i.e., do not switch from YMD to MDY or DMY within a single date column). You may have multiple date COLUMNS with different date formats in the spreadsheet (like the table above).

One of the biggest problems with Excel occurs when you open a CSV data file and it tries to automatically reformat the dates in a date column.

For example, it may load "Year-Month-Day" data and change it to "Month-Day-Year" format. COPEPODITE will give you a date error if you save the CSV file with this change but do not also change the "DATE-YMD" column header to "DATE-MDY".

In some really annoying cases, Excel will convert only some of the dates. For example, "05/12/2017" can be interpreted (by Excel) as 12-May-2017 or 05-December-2017. One way to avoid this problem is to use the "_" character in your date values (e.g., "05_12_2017" will not be auto-changed by Excel).

How to make an "Excel Safe Date": If your date value is in cell "A5", the formula below will grab each separate date element and combine them with a "_" to create a YEAR_MONTH_DAY format. You can later reverse this by simply selecting the entire date column and replacing every "_" with a "/" character.

=year(A5)&"_"&month(A5)&"_"&day(A5)
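
If you prepare your file with a script instead of Excel, the same "_"-delimited YEAR_MONTH_DAY text can be produced directly. The Python sketch below mirrors the formula above (including the lack of zero padding); the function name excel_safe_date is hypothetical.

```python
from datetime import date


def excel_safe_date(day):
    """Format a date as YEAR_MONTH_DAY with "_" delimiters, like the formula."""
    return f"{day.year}_{day.month}_{day.day}"


print(excel_safe_date(date(2017, 12, 5)))  # 2017_12_5
```

A column written this way would be labeled "DATE-YMD".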

DATA (variable) Column Types:

All COPEPODITE data columns must be labeled with a recognized variable indicator (e.g., "BIOM=", "ABND=", "ABND5=", ...).
The text to the right of the "=" symbol can be whatever you desire, and that text will be used to describe the variable in that column (e.g., "BIOM= Total Wet Mass (mg/m3)", "BIOM= Total Dry Mass (mg/m3)"). This text will also be used in all of the plots and figures showing the variable (e.g., "TCOP" or "Total Copepods (#/m3)" will be shown in the plot titles).

Please make sure that your variable text does not contain any commas (e.g., do not use "ABND= Calanus finmarchicus, adults, , female"). These extra commas will corrupt the comma-separated-values data format and cause your data to fail the preview step. Also, at this time please do not include any Greek or mathematical symbols (e.g., the degrees symbol or the "u" ("mu" or micro) symbol), as they will cause the header text to get cut short (the rest of the description text will not be shown in the plots). (This latter issue is a problem with the web server not liking non-standard-ASCII characters.)
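
Before uploading, you could screen your column labels for these two problems yourself. The Python below is a minimal sketch of such a check (the function name header_label_problems is hypothetical, and it is not part of COPEPODITE); it flags commas and any character outside standard ASCII.

```python
def header_label_problems(label):
    """Return a list of problems found in one column header label."""
    problems = []
    if "," in label:
        problems.append("contains a comma")
    non_ascii = sorted({ch for ch in label if ord(ch) > 127})
    if non_ascii:
        problems.append("contains non-ASCII characters: " + " ".join(non_ascii))
    return problems


for label in [
    "TEMP= Temperature (C) at 50 meters",
    "ABND= Calanus finmarchicus, adults",
    "TEMP= Temperature (\u00b0C)",  # \u00b0 is the degrees symbol
]:
    print(label, "->", header_label_problems(label) or "ok")
```

Run the check on each label before joining the labels into the header row, since a comma cannot be told apart from a column separator once the row has been written.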

Format Rule #6: COPEPODITE currently only recognizes these Variable Indicators:

BIOM=	Use this for total [zooplankton] biomass values (ignoring zero values). If you have individual taxa biomass values and want to include zeros in the processing, use the "ABDT=" label.
ABDT=	Use this for zooplankton or phytoplankton abundance or individual biomass values, and INCLUDE any zero values in the calculations. Note that empty spreadsheet cells are not treated as a "zero" value.
ABND=	Use this for zooplankton or phytoplankton abundance or individual biomass values, and ignore any zero values.
CHLA=	Use this for chlorophyll (or pigment) values.
TEMP=	Use this for temperature values.
PSAL=	Use this for salinity values.
LOTH=	Use this for NUTRIENTS and other miscellaneous values. (If using a log10 analysis, these values will go through the log10 processing method.)
OTHR=	Use this for Oxygen and other miscellaneous values. (If using a log10 analysis, these values will NOT be log10 transformed.)
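
The difference between the zero-ignoring labels (BIOM=, ABND=) and the zero-including label (ABDT=) can be sketched as a small filter applied before averaging. This is an illustration only: the function name values_for_averaging is hypothetical, empty cells are represented as None, and leaving the other variable types unchanged is an assumption, since the table above does not say how they treat zeros.

```python
ZEROS_IGNORED = {"BIOM", "ABND"}  # per the table above; ABDT keeps zeros


def values_for_averaging(indicator, cells):
    """Select the values of one column that would enter a mean.

    cells: numbers, with None standing for an empty ("not sampled") cell.
    """
    values = [value for value in cells if value is not None]
    if indicator in ZEROS_IGNORED:
        values = [value for value in values if value != 0]
    # Assumption: all other indicators keep their values unchanged.
    return values


cells = [0, 10, None, 20]
print(values_for_averaging("ABND", cells))  # [10, 20]
print(values_for_averaging("ABDT", cells))  # [0, 10, 20]
```

With these four cells, an ABND= column would average to 15, while the same values under ABDT= would average to 10; the empty cell is never counted as a zero in either case.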

Format Rule #7: Your data values can NOT already be "log transformed". (If they are, convert the values back to non-transformed.) The Toolkit will log-transform (or not) depending on the value type and/or the analysis you have selected. The WGZE/WG125/WGPME plots use log-transformed biological and nutrient values, while the "PSR-2018" plots do not.

Special Grouping Options:

Currently the COPEPODITE "group plot" is focused on allowing a user to plot multiple years of taxa (ABND=) or nutrient variables (LOTH=) together in a single plot. This is done by identifying the group membership with a one-digit number (e.g., the "ABND1=" column header in the example table earlier in this document indicates "Group 1"). Any taxa data associated with "ABND1=" will be plotted in a separate graph from "ABND2=" (e.g., you could plot "copepod species" using ABND1= and "diatom species" using ABND2=). Currently, different variable types do not cross-group (e.g., "ABND1=" and "ABDT1=" and "BIOM1=" and "TEMP1=" have the same number but are actually treated and plotted as completely different groups).
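
One way to picture how a column header splits into indicator, group digit, and description is the Python sketch below. It is a hypothetical reading of the rules in this guide (the function name parse_variable_header and the regular expression are not taken from COPEPODITE itself).

```python
import re

INDICATORS = ("BIOM", "ABDT", "ABND", "CHLA", "TEMP", "PSAL", "LOTH", "OTHR")
HEADER_PATTERN = re.compile(r"^(" + "|".join(INDICATORS) + r")(\d?)=\s*(.*)$")


def parse_variable_header(label):
    """Split a data column header into (indicator, group, description)."""
    match = HEADER_PATTERN.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized variable indicator: {label!r}")
    indicator, digit, description = match.groups()
    group = int(digit) if digit else None  # no digit means no group
    return indicator, group, description


print(parse_variable_header("ABND1= Calanus finmarchicus adults (#/m3)"))
print(parse_variable_header("TEMP= Temperature (C) at 50 meters"))
```

Since different variable types do not cross-group, a plotting step would key each group on the pair (indicator, group), so ("ABND", 1) and ("ABDT", 1) stay separate.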

For biological and nutrient values, the Group Plots are shown in both "log10" and "raw" value format (see examples below), because the results and usefulness of either plot depends on the data, the distribution of values, and what you are looking for within the data. Below are examples of the same zooplankton data plotted in both "log10" and "raw" format.

This "raw data" plot lets you compare relative abundance between the taxa. Notice the large ~1985 abundance peaks in the three dominant taxa. This plot does not let you see the two least dominant taxa very well (they are almost flat and near the zero line). A log10 plot (the second example below) may better handle variables with large numerical differences and/or near-zero values.

Raw (non-transformed) Values

linear grouping example

This "log10 transformed" plot lets you compare all of the taxa even though their value ranges are quite different. This type of plot is also useful for nutrients, where the variables also have large value ranges (e.g., silicate approaching 100 while other nutrients stay below 5). Note that this plot does not highlight the ~1985 peak seen in the raw-value plot. Log-transforming can be good for noisy data, but it can also hide important events.

Log10 Transformed Values

log10 grouping example
