How to fill in the IRBAS database templates
There are three types of data-entry templates that will help you format your data for compatibility with the IRBAS database: Site, Biota and Environment. The template you use depends on the nature of the data you wish to insert:
Data-entry templates for download:
- Site: to insert geo-locational data and general summary information about the sampling sites and locations: download Site Template
- Biota: to insert taxonomic inventories of fauna and/or flora along with the sampling methods: download Biota Template
- Environment: to insert physical and chemical data along with the sampling methods: download Environment Template
Note: these files are all tab-delimited text files. Be careful when opening/saving them as some cells may contain commas.
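If you check or prepare your files with a script rather than a spreadsheet, reading and writing them as tab-separated data keeps commas inside cells intact. A minimal sketch using Python's standard csv module; the function names and the placeholder author names are illustrative only:

```python
import csv
import io


def read_template(text: str) -> list[list[str]]:
    """Parse tab-delimited template text into rows of cells.

    With delimiter="\t", a comma inside a cell (e.g. a list of authors)
    stays part of that cell instead of splitting it in two.
    """
    return list(csv.reader(io.StringIO(text), delimiter="\t"))


def write_template(rows: list[list[str]]) -> str:
    """Serialise rows back to tab-delimited text, one line per row."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(rows)
    return buffer.getvalue()


rows = read_template("author_s\tAuthor A, Author B\n")
print(rows)  # the comma-separated author list is still a single cell
```

To read a saved template from disk, open it with newline="" (as the csv documentation recommends) and pass the file object to csv.reader with delimiter="\t". The guide does not state the file encoding, so any encoding you pass to open() is an assumption.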
Each template you fill in must be associated with a sampling campaign and/or time period that you can identify as a particular dataset. The dataset is identified by information (e.g. data provider, dataset name, journal article, etc.) that you enter as 'general metadata' within each template. These general metadata must be the same across any Site, Biota and Environment file that is part of the same dataset. The general metadata links each Site, Biota and/or Environment file to the right dataset.
The top 17 rows of each template are for your 'general metadata', followed by two empty rows, and all subsequent rows are for the actual 'data matrix' (which contains your sampling data and associated metadata).
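As a sketch of this layout, the function below splits already-parsed rows into the general metadata and the data matrix. It assumes each metadata row holds the header in its first column and the value(s) in the following columns, and that the first row of the data matrix holds its headers. The names split_template, METADATA_ROWS and BLANK_ROWS are hypothetical.

```python
METADATA_ROWS = 17  # rows 1-17: general metadata
BLANK_ROWS = 2      # rows 18-19: left empty


def split_template(
    rows: list[list[str]],
) -> tuple[dict[str, list[str]], list[dict[str, str]]]:
    """Split template rows into (metadata, data-matrix records).

    Assumed layout (hypothetical helper): header in column 1 of each
    metadata row with its value(s) to the right; the first data-matrix
    row holds the column headers.
    """
    metadata: dict[str, list[str]] = {}
    for row in rows[:METADATA_ROWS]:
        if row and row[0].strip():
            values = list(row[1:])
            while values and not values[-1].strip():
                values.pop()  # drop trailing empty cells
            metadata[row[0].strip()] = values

    for row in rows[METADATA_ROWS:METADATA_ROWS + BLANK_ROWS]:
        if any(cell.strip() for cell in row):
            raise ValueError("rows 18-19 should be empty")

    matrix = rows[METADATA_ROWS + BLANK_ROWS:]
    if not matrix:
        return metadata, []
    headers = matrix[0]
    records = [
        dict(zip(headers, row))
        for row in matrix[1:]
        if any(cell.strip() for cell in row)  # skip fully empty rows
    ]
    return metadata, records
```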
Mandatory templates and data-fields
Some templates and data-fields are mandatory: they must be completed or your dataset will not get inserted into the database.
The Site template is mandatory: a completed Site template must be inserted with any accompanying Biota and/or Environment template from the same dataset. Certain fields within each template are also mandatory.
In each template, the name of each data-field (known as a ‘Header’) is provided and you fill in your relevant data alongside the name (in the case of the general metadata) or under the name (in the case of the data matrix).
The definitions, units of measurement, format and any other important information about what each header means and how you should fill each field are all available here in handy tabular form:
Tables for download:
- 'General metadata' field definitions and data-entry criteria
- Site template 'data matrix' field definitions and data-entry criteria
- Biota template 'data matrix' field definitions and data-entry criteria
- Environment template 'data matrix' field definitions and data-entry criteria
Note that some fields are mandatory, i.e. you must fill them in (the headers for mandatory fields are in bold typeface in the tables above), and some become mandatory if you fill in certain other related fields (the headers for these ‘can become mandatory’ fields are in red typeface in the tables above).
As mentioned above, mandatory fields must be filled in. However, we realise that in a few cases these fields may not apply to your particular dataset, and in those specific cases you are allowed to leave them blank. Check the ‘Naming Convention/Comments/Additional Instructions’ entry for each field in the tables above to see where this applies.
The allowable data you enter (regardless of whether mandatory or not) is then determined by each field’s Universe, Unit, Definition and, in some cases, Naming Convention/Comments/Additional Instructions and Examples (see the tables above).
Please follow the directions, and respect the universes and units we outline. This will ensure your data is entered correctly into the database and that users will interpret your data (if made openly available) correctly too. THANK YOU!!
Here is an example of a completed Site template:
Examples of downloadable, completed templates for all three types (site, biota and environment) are also available for your reference:
- Leigh_26092014_LeighC_GulfRivers_Site
- Leigh_26092014_LeighC_GulfRivers_Biota
- Leigh_26092014_LeighC_GulfRivers_Environment
Like the templates, these example files are tab-delimited text files, so take the same care when opening and saving them, as some cells may contain commas.
These example files provide a handy reference and supplement to this guide when filling in your own data in the templates. Please do consult them.
Please note: these files contain real data that have been uploaded into the IRBAS database (provided by Dr Catherine Leigh). You are welcome to access and use these data provided you accept the terms and conditions of use of the IRBAS database and give appropriate acknowledgement and citation to the dataset, its source and provider as per the metadata contained in the files.
The top 17 rows of each template before the data matrix will contain your general metadata. These metadata must be exactly the same in all the templates (for site, environment and biota) you use to insert data that come from a single source (= the same dataset).
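Because the general metadata must match exactly across the Site, Biota and Environment files of one dataset, a quick automated comparison can catch typos before submission. A minimal sketch; it assumes each file's metadata has already been read into a dictionary mapping each header to its list of cell values, and the function name is hypothetical:

```python
def metadata_mismatches(
    metadata_by_file: dict[str, dict[str, list[str]]],
) -> dict[str, dict[str, list[str]]]:
    """Return {header: {file: values}} for every header whose values differ.

    The comparison is exact (case, spaces and punctuation all count),
    because the metadata must be identical in every template of a dataset.
    A header missing from one file is compared as an empty list.
    """
    headers: set[str] = set()
    for metadata in metadata_by_file.values():
        headers.update(metadata)

    mismatches: dict[str, dict[str, list[str]]] = {}
    for header in sorted(headers):
        values = {name: metadata.get(header, []) for name, metadata in metadata_by_file.items()}
        if len({tuple(v) for v in values.values()}) > 1:
            mismatches[header] = values
    return mismatches
```

Treating a missing header as an empty list means accidentally deleted metadata rows are flagged too.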
Fill the metadata fields in accordance with the format and universes given for each field (see the tables above). Sometimes no universe or format is specified for a field (but examples may be given); feel free to fill such fields with the content you want (i.e. alphanumeric content).
You must fill in all the mandatory fields in this general metadata section. You can also fill in the non-mandatory fields, but do not delete the rows of any non-mandatory fields you leave empty. Here is an example:
You also need to follow a naming convention for your dataset, so that it can be identified with the metadata and any associated site, biota and environment data you enter, and so that it provides an identifier that is meaningful to you. The name consists of the data provider's last name (with the first letter in capitals) followed by their initials (in capitals), an underscore, and then a meaningful name (e.g. RedRiverData, GulfRivers, PhD). Spaces are not allowed. For example, the name of the example dataset is LeighC_GulfRivers. If you are providing more than one dataset, please take care to use a unique dataset name for each.
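A simplified check of this convention might look like the sketch below. The regular expression is our own approximation, not the database's validation rule. It expects a last name written as a capital followed by lower-case letters (so names such as McDonald, or hyphenated names, would need a looser pattern), then upper-case initials, one underscore and a label made of letters and digits.

```python
import re
from collections import Counter

# Approximation of the dataset naming convention (assumption, not the official rule):
# Lastname + INITIALS + "_" + meaningful label, with no spaces.
DATASET_NAME = re.compile(r"[A-Z][a-z]+[A-Z]+_[A-Za-z0-9]+")


def is_valid_dataset_name(name: str) -> bool:
    """True if the whole name matches the simplified pattern."""
    return DATASET_NAME.fullmatch(name) is not None


def duplicate_names(names: list[str]) -> set[str]:
    """Dataset names used more than once (each dataset needs a unique name)."""
    return {name for name, count in Counter(names).items() if count > 1}
```

For example, is_valid_dataset_name("LeighC_GulfRivers") is True, while "LeighC_Gulf Rivers" (contains a space) and "Leigh_GulfRivers" (no initials) are rejected.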
If you have more than one reference (source) for the dataset, enter the information for each reference into consecutive columns. For example, in the ‘author_s’ field, enter the author(s) of the first reference in the first available column (separating individual authors of the same reference with commas), then use the next available column to enter the author(s) of the second reference, and so on. Then, in the ‘publication_year’ field, enter the year of publication of each reference in the same column as the relevant author(s); i.e. keep the order of references the same for every field.
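When the file is read by a script, this column-wise layout means the n-th value of every reference field belongs to the n-th reference. A minimal sketch, assuming the metadata has been read into a dictionary of header to list of cell values. Only author_s and publication_year come from this guide; the function name is hypothetical.

```python
def group_references(
    metadata: dict[str, list[str]],
    fields: tuple[str, ...] = ("author_s", "publication_year"),
) -> list[dict[str, str]]:
    """Turn column-wise reference metadata into one record per reference.

    A field with fewer values than the others yields "" for the missing
    references, which usually means a value was entered in the wrong column.
    """
    columns = {field: metadata.get(field, []) for field in fields}
    count = max((len(values) for values in columns.values()), default=0)
    return [
        {field: (values[i] if i < len(values) else "") for field, values in columns.items()}
        for i in range(count)
    ]
```

For example, with placeholder values author_s = ["Author A, Author B", "Author C"] and publication_year = ["2014", "2010"], the function returns two records, pairing the first column's authors with 2014 and the second column's author with 2010.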
The cells in the rows and columns below the metadata are for your sampling data and are filled in the form of a data matrix.
In this section of the template, when a field header is mandatory you must not delete it. Mandatory fields in general must be filled, but in a few cases, the mandatory fields can be left blank (unfilled) if they do not apply to your particular dataset. The cases this applies to are indicated in the 'Naming convention/Comments/Additional instructions' description of the field by the phrase “Leave blank if this is same date as the start date BUT DO NOT DELETE THE COLUMN” (see the tables above).
When a non-mandatory field is empty (i.e. you have not filled in any data in the field cells) you are allowed to delete the column as long as you don't change the order of other columns. But we recommend that you don't delete any columns to be on the safe side (to avoid accidental deletion of mandatory columns).
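If you do delete empty non-mandatory columns, a check like the sketch below can confirm that no mandatory column went missing and that the remaining template columns are still in their original order. It is a hypothetical helper: you would supply the template's own header row and the list of mandatory headers from the tables above.

```python
def check_columns(
    template_headers: list[str],
    file_headers: list[str],
    mandatory: list[str],
) -> list[str]:
    """Return a list of problems (empty if the column layout looks fine)."""
    problems = [f"missing mandatory column: {h}" for h in mandatory if h not in file_headers]

    # Template columns that were kept must appear in the template's order.
    # Headers not in the template (e.g. taxon columns added to the right of a
    # Biota file) are ignored by this check.
    position = {header: i for i, header in enumerate(template_headers)}
    kept = [position[h] for h in file_headers if h in position]
    if kept != sorted(kept):
        problems.append("template columns are out of order")
    return problems
```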
Fill the fields in accordance with the format and universes and units given for each field (see the tables above). If no universe or format is specified for a field header, please use one of the 'Examples' if it is applicable (see tables above), but feel free to fill the field with the content you want (as long as it is alphanumeric content without special characters).
A cautionary note on decimal point formatting
To ensure your data is inserted correctly, please only use ‘points’ (.) for decimal points in your files, not commas (,). The database will not recognise commas as decimal points.
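Decimal commas often creep in from spreadsheet software set to a locale that uses them. A minimal sketch that flags (rather than silently converts) data-matrix cells that look like decimal commas. It is deliberately conservative: a value such as 1,250 might be a thousands separator, so each hit should be checked by hand. The records are assumed to be dictionaries of header to cell text, and the function name is hypothetical.

```python
import re

# A whole cell that looks like a number written with a comma, e.g. "0,35".
DECIMAL_COMMA = re.compile(r"[+-]?\d+,\d+")


def decimal_comma_cells(records: list[dict[str, str]]) -> list[tuple[int, str, str]]:
    """Return (row index, header, cell) for every cell that looks like a decimal comma."""
    hits = []
    for row_index, record in enumerate(records):
        for header, cell in record.items():
            if DECIMAL_COMMA.fullmatch(cell.strip()):
                hits.append((row_index, header, cell))
    return hits
```

Text cells that merely contain a comma (for example "left bank, pool") are not flagged, because the whole cell must look like a number.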
The 'Site' template
The Site file contains all your sampling locations within all your sites (and their mandatory measures) in which biota and environment samples were collected, and can also include information on associated gauging stations for flow (i.e. discharge), rainfall and/or temperature data if available. You can have multiple (different) locations per site, e.g. you may have sampled the left bank and the right bank of a site, with each bank being a separate ‘location’. Measures (e.g. discharge) are not considered metadata, but other information you provide (e.g. location latitude and longitude, river names etc.) is considered metadata along with the ‘general metadata’ provided in the top 17 rows of the template.
Each row of the data matrix in the Site template corresponds to a unique location or gauging station.
Use separate rows for sampling location information/data and gauging station information/data.
The sampling location rows should only contain information derived from those sampling locations. If you wish to include gauging stations, you must enter them as separate, new rows. The new gauging station rows should only contain information about those gauging stations (i.e. their geo-locational information and the sampling location each is associated with) and the summary statistics derived from the gauging stations.
Here is an example of a completed Site template with locations and a flow-gauge station:
Site template: naming conventions
To ensure each individual site and location entered into the database is unique, you must follow certain naming conventions, regardless of what name or code you personally have given your site and sampling locations. For example:
- Site: Dataset name, underscore, your site name; all with no spaces between characters (e.g. LeighC_GulfRivers_FUM)
- Location: site name, underscore, your location name; all with no spaces between characters (e.g. LeighC_GulfRivers_FUM_1). The location name cannot be the same as the site name, even if only one location was sampled per site: in such cases, we recommend you use a dummy integer for your location name (e.g. 1, as in the above example).
- Countries must be named in a consistent fashion. Only use names listed under the Header 'country' in the tables above.
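Building the names programmatically from your own site and location codes avoids stray spaces. A minimal sketch with hypothetical helper names; it rejects any whitespace and follows the guide's advice to use a dummy integer (1 by default here) when a site has a single location:

```python
def _join(*parts: str) -> str:
    """Join name parts with underscores, refusing any whitespace."""
    name = "_".join(parts)
    if any(ch.isspace() for ch in name):
        raise ValueError(f"spaces are not allowed in names: {name!r}")
    return name


def site_name(dataset: str, site: str) -> str:
    # Dataset name, underscore, your site name.
    return _join(dataset, site)


def location_name(dataset: str, site: str, location: str = "1") -> str:
    # Site name, underscore, your location name (never the same as the site name).
    if not location or location == site:
        raise ValueError("the location name must differ from the site name")
    return _join(site_name(dataset, site), location)
```

With these helpers, site_name("LeighC_GulfRivers", "FUM") gives LeighC_GulfRivers_FUM and location_name("LeighC_GulfRivers", "FUM") gives LeighC_GulfRivers_FUM_1, matching the examples above.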
Site template: sampling location and gauging station coordinates
Coordinates (latitudes and longitudes) of sampling locations and gauging stations must be expressed in the WGS84 datum. You can use Google Earth to find the coordinates of your sampling locations and stations in the WGS84 datum, and you can choose the display format (decimal degrees; degrees, minutes, seconds; or degrees, decimal minutes) in the window that appears after clicking on Tools, Options, 3D View.
The IRBAS database accepts the following display formats (see also the tables above): degrees, minutes, seconds (the default formatting); decimal degrees; or degrees, decimal minutes. You must always indicate the cardinal point (N or S for latitude; E or W for longitude) and you must not use the ‘minus’ sign to indicate S or W. Decimals must always be given to at least two places. Spaces or * must be used to indicate the degrees, minutes and seconds units. Do not include special characters such as the degree symbol (°), because they will not be read into the database correctly (replace them with spaces or *, following the instructions below):
- The allowable formatting for degrees, minutes and seconds is: 00*00*00.00*X (this is the default formatting) or 00 00 00.00X, where you must replace the X with the relevant cardinal point.
- The allowable formatting for decimal degrees is: 00.00*X or 00.00X, where you must replace X with the relevant cardinal point.
- The allowable formatting for degrees, decimal minutes is: 00* 00.00*X or 00 00.00X, where you must replace X with the relevant cardinal point.
We realise these rules are rather annoying but the formatting ensures the coordinates will be entered correctly into the database. Thank you for following the formatting rules!!
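If your coordinates are stored as signed decimal degrees (e.g. from a GPS export), a conversion like the sketch below produces the asterisk-separated formats listed above, with an explicit cardinal point instead of a minus sign. It is a hypothetical helper, not part of the IRBAS tools. It assumes the input is already in the WGS84 datum, rounds seconds to two decimal places, and writes decimal degrees to five places by default (the guide requires at least two).

```python
def _cardinal(value: float, is_lat: bool) -> str:
    if is_lat:
        return "N" if value >= 0 else "S"
    return "E" if value >= 0 else "W"


def _check_range(value: float, is_lat: bool) -> None:
    limit = 90 if is_lat else 180
    if abs(value) > limit:
        raise ValueError(f"{value} is outside +/-{limit} degrees")


def to_dms(value: float, is_lat: bool) -> str:
    """Signed decimal degrees -> degrees, minutes, seconds as 00*00*00.00*X."""
    _check_range(value, is_lat)
    # Work in whole hundredths of a second so rounding never yields 60.00 seconds.
    hundredths = round(abs(value) * 360_000)
    degrees, rest = divmod(hundredths, 360_000)
    minutes, rest = divmod(rest, 6_000)
    return f"{degrees:02d}*{minutes:02d}*{rest / 100:05.2f}*{_cardinal(value, is_lat)}"


def to_decimal_degrees(value: float, is_lat: bool, places: int = 5) -> str:
    """Signed decimal degrees -> 00.00*X (at least two decimal places)."""
    _check_range(value, is_lat)
    if places < 2:
        raise ValueError("decimals must be given to at least two places")
    return f"{abs(value):.{places}f}*{_cardinal(value, is_lat)}"
```

For example, to_dms(-17.5, is_lat=True) returns 17*30*00.00*S and to_decimal_degrees(141.25, is_lat=False) returns 141.25000*E.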
Use the ‘lat_location’ and ‘long_location’ fields in the Site template to provide coordinates for your study locations only; use the ‘lat_station’ and ‘long_station’ fields to provide coordinates for gauging stations (e.g. a flow-, rainfall- or temperature-station) associated with a sampling location. Use a different, new row to enter the coordinates for each flow-, rainfall- and temperature-station associated with the same sampling location.
Site template: timespan qualifiers
In the Site template, once data is entered into certain fields, other fields become mandatory. This applies to fields whose data are time-dependent (e.g. land use, or summary statistics for discharge) and so must be qualified with additional information. That is, you must fill in another field (following certain protocols) to qualify the relevant timespan and, for gauging station statistics (i.e. derived from a flow-, rainfall- or temperature-station), the relevant timestep of the raw data from which the statistics were calculated.
For example, the field ‘discharge_mean_annual’ needs to be qualified according to the time period over which the mean annual discharge applies (e.g. start year to end year) and the time step of the raw discharge data on which the mean is based (15 minute, hourly, daily, monthly, or quarterly).
For a field like land use and land cover (‘LULC’), you only need to supply the time period over which the relevant information was observed or applicable. In most cases, this will be equivalent to the time span of your study (start year to end year).
Fields that need to be qualified with timespan (and timestep) are indicated in the tables above by red-coloured text in the relevant timespan fields (i.e. the header names ending with ‘timespan’). E.g. the ‘discharge_mean_annual’ field has an associated field called ‘discharge_mean_annual_timespan’. If you fill in the ‘discharge_mean_annual’ field, you must also fill in the ‘discharge_mean_annual_timespan’ field.
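A check for this rule can be scripted over each data-matrix row. This sketch assumes, as in the discharge_mean_annual example, that the qualifier's header is the field's header plus the suffix _timespan. It only tests that the qualifier is filled in, because the required content of each timespan field is defined in the tables above. The function name is hypothetical.

```python
TIMESPAN_SUFFIX = "_timespan"


def missing_timespans(record: dict[str, str]) -> list[str]:
    """Fields that are filled in but whose '<field>_timespan' qualifier is empty."""
    missing = []
    for header, value in record.items():
        qualifier = header + TIMESPAN_SUFFIX
        if qualifier in record and value.strip() and not record[qualifier].strip():
            missing.append(header)
    return missing
```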
Site template: zero-flow and dry periods - what is the difference?
In the IRBAS database, “zero flow” refers to zero discharge and does not make a distinction as to whether this describes standing surface water or no surface water: both circumstances are possible under the term “zero flow”. By contrast, “dry” in the database defines the specific case of zero flow when there is also no surface water (i.e. the streambed surface is dry).
For example, if you have discharge data recorded for a site indicating there was zero flow in July-August 2006, enter this into the database under the “zero flow” fields NOT under the “dry” fields UNLESS you also know that all surface water was lost from the site during the entire zero-flow period, in which case you would enter the relevant data in both the “zero-flow” and “dry” fields. However, if discharge ceased in July but surface water was not lost at the site until August, you would enter the relevant “zero-flow” data for the July-August period, but the relevant “dry” data for August only.
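The distinction can be expressed as two filters over daily observations: every day with zero discharge is a zero-flow day, and only those zero-flow days with no surface water at all are dry days. A minimal sketch; the DailyObservation record and the function names are hypothetical, and the guide does not prescribe this daily format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DailyObservation:
    day: date
    discharge: float      # any discharge unit; only zero vs non-zero matters here
    surface_water: bool   # True if any standing or flowing water remained at the site


def zero_flow_days(observations: list[DailyObservation]) -> list[date]:
    # "Zero flow": no discharge, whether or not surface water is still present.
    return [o.day for o in observations if o.discharge == 0]


def dry_days(observations: list[DailyObservation]) -> list[date]:
    # "Dry": zero flow AND no surface water, so always a subset of the zero-flow days.
    return [o.day for o in observations if o.discharge == 0 and not o.surface_water]
```

For the July-August 2006 example, a mid-July day with water still standing counts only as zero flow, while a mid-August day with a dry bed counts as both.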
Biota and Environment templates
The Biota and Environment templates contain the sampling methods and biological and environmental data associated with samples collected at each location indicated in the Site template. Measures in these templates (e.g. number of taxa collected; total nitrogen concentration) are not considered metadata, but other information you provide (e.g. sampling procedures) are considered metadata along with the ‘general metadata’ provided in the top 17 rows of the templates.
Biota and environment templates: sampling strategies and protocols
In both the Biota and Environment templates there are several columns in the data-matrix component that are used to describe your sampling procedures. These columns are all mandatory and come before the actual data columns (i.e. sample measurements). Some of these sampling procedure columns can be left blank if they do not apply to your particular dataset (consult the ‘Naming Convention/Comments/Additional Instructions’ descriptions of these fields in the tables above for details), but please DO NOT DELETE any mandatory column (even if it is empty).
Biota and environment templates: naming conventions
Sample names must follow a naming convention to identify them as unique samples within the database, regardless of the sample name you personally use to identify individual samples.
Biota sample: location_name, underscore, type of template ('biota' = flora or fauna), underscore, type of biota (AI = aquatic invertebrates; TI = terrestrial invertebrates; TV = terrestrial vertebrates; NFAV = non-fish aquatic vertebrates; PP = phytoplankton; BA = benthic algae; B = bacteria; AMP = aquatic macrophytes; TP = terrestrial plants; other), underscore, date of sample in generic code, underscore, sample identifier (replicate number or letter, e.g. 1, 2, 3, a, b, c), all with no spaces between characters. For example: LeighC_GulfRivers_FUM_1_biota_AI_38585_1
Environment sample: location_name, underscore, type of template (env = environment), underscore, type of sample (dimension = waterbody dimension; cover = cover descriptor; physchem = physical and chemical), underscore, date of sample in generic code, underscore, sample identifier (replicate number or letter, e.g. 1, 2, 3, a, b, c) all with no spaces between characters. For example: LeighC_GulfRivers_FUM_1_env_cover_38585_1
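A sketch of building these names follows. It assumes that 'date of sample in generic code' means the serial date number a spreadsheet shows for a date cell in General format (days counted from 30 December 1899). Under that assumption, the 38585 in the examples corresponds to 21 August 2005. The helper names are hypothetical. Biota type codes are not validated, because the guide also allows "other" types.

```python
from datetime import date

SERIAL_EPOCH = date(1899, 12, 30)  # assumption: spreadsheet serial-date origin
ENV_SAMPLE_TYPES = {"dimension", "cover", "physchem"}


def date_code(sampled: date) -> int:
    """Serial date number, e.g. 38585 for 21 August 2005."""
    return (sampled - SERIAL_EPOCH).days


def _join(*parts: object) -> str:
    name = "_".join(str(p) for p in parts)
    if any(ch.isspace() for ch in name):
        raise ValueError(f"spaces are not allowed in sample names: {name!r}")
    return name


def biota_sample_name(location: str, biota_type: str, sampled: date, replicate: object) -> str:
    # location_name _ biota _ type of biota _ date code _ replicate
    return _join(location, "biota", biota_type, date_code(sampled), replicate)


def env_sample_name(location: str, sample_type: str, sampled: date, replicate: object) -> str:
    # location_name _ env _ dimension|cover|physchem _ date code _ replicate
    if sample_type not in ENV_SAMPLE_TYPES:
        raise ValueError(f"unknown environment sample type: {sample_type!r}")
    return _join(location, "env", sample_type, date_code(sampled), replicate)
```

For example, biota_sample_name("LeighC_GulfRivers_FUM_1", "AI", date(2005, 8, 21), 1) returns LeighC_GulfRivers_FUM_1_biota_AI_38585_1.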
Biota template: taxonomy and abundance data
In the Biota template, the abundance (or biomass, density, presence-absence) data for each taxon found and identified in your sample is entered in columns to the right of all the columns that describe sampling methods. You do not need to indicate the taxonomic resolution of identification, as this will be automated via the database (it will recognise phylum, class, order, family, genus and species names as such). But you will need to enter the header names (i.e. the taxon names) as new columns in the templates yourself, and then enter the relevant abundance data under each header for each sample (row): essentially, this means you add a taxa-by-samples matrix of data to the right of all the other columns already in the template.
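If your counts are stored in long form (one line per sample and taxon), they can be pivoted into this taxa-by-samples layout. A minimal sketch; the 'sample' column label and the function names are hypothetical. Taxa not recorded in a sample are left as empty cells rather than zeros, in line with the guide's rule that zeros mean known absences only.

```python
def taxa_by_samples(
    observations: list[tuple[str, str, float]],
) -> tuple[list[str], dict[str, dict[str, float]]]:
    """(sample, taxon, abundance) triples -> (taxon headers, {sample: {taxon: abundance}})."""
    taxa: dict[str, None] = {}  # insertion-ordered set of taxon headers
    rows: dict[str, dict[str, float]] = {}
    for sample, taxon, abundance in observations:
        taxa.setdefault(taxon, None)
        row = rows.setdefault(sample, {})
        if taxon in row:
            raise ValueError(f"duplicate entry for {taxon} in {sample}")
        row[taxon] = abundance
    return list(taxa), rows


def matrix_lines(taxa: list[str], rows: dict[str, dict[str, float]]) -> list[str]:
    """Tab-separated lines: a header line, then one line per sample (blank = no record)."""
    lines = ["\t".join(["sample", *taxa])]
    for sample, values in rows.items():
        lines.append("\t".join([sample, *(str(values.get(t, "")) for t in taxa)]))
    return lines
```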
Also, if you have data for different types of biota (e.g. fish and aquatic invertebrates; aquatic and terrestrial invertebrates) and/or you used different methods to collect these biota, you need to enter them on separate lines so that the biota types and sampling methods can be identified with the relevant abundance data.
If you have controlled-count data (i.e. the abundance reflects the count up to a maximum predefined number of individuals per sample) indicate this for a sample as “controlled count$integer”, replacing "integer" with the relevant predefined number of individuals (e.g. 200).
Here is an example of a completed Biota template (two images from the one template):
Biota template: abundance data
In the IRBAS database, zeros indicate known absences only. Please do not enter zeros to fill in empty cells: this slows down the database and the data are not meaningful. Only enter zeros for known absences.
If you have density or volumetric data, ensure the data you provide are in the required units: number of individuals per square meter or per cubic meter, respectively. For example, if you have counted the number of individuals within the area covered by a Surber or Hess sampler (e.g. 0.25 m²) and you indicate that the data are “density” in the ‘type_of_abundance_data’ field, then you must multiply this raw count data by 4. If you want to avoid having to do these conversions, write “count” instead of “density” in the ‘type_of_abundance_data’ field and then enter the raw count data as is.
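The conversion is a division by the sampled area or volume. For a 0.25 m² Surber or Hess sampler, dividing by 0.25 is the same as multiplying by 4. A minimal sketch with hypothetical function names:

```python
def to_density(count: float, area_m2: float) -> float:
    """Individuals per square metre from a raw count over the sampled area."""
    if area_m2 <= 0:
        raise ValueError("sampled area must be positive")
    return count / area_m2


def to_volumetric(count: float, volume_m3: float) -> float:
    """Individuals per cubic metre from a raw count over the sampled volume."""
    if volume_m3 <= 0:
        raise ValueError("sampled volume must be positive")
    return count / volume_m3
```

For example, to_density(13, 0.25) returns 52.0 individuals per m².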
Biota template: naming conventions for taxa
You need to create the headers for your columns of abundance data (i.e. the taxon names) yourself. These are essentially the scientific names of the taxa for which you have some sort of 'abundance' data (counts, densities, biomasses, presence/absences etc.).
In cases of unnamed/unclassified identifications and in terms of levels of taxonomic resolution (e.g. you have a taxon or taxa identified and counted as Allodessus sp. or Allodessus spp.) you must only use the known and accepted name. That is, for the above example, enter the taxon at the generic level of resolution using the name Allodessus only (delete any sp. or spp. terms in your dataset), followed by an optional life stage qualifier if known (see below).
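Stripping the sp./spp. qualifiers can be automated. A minimal sketch with a hypothetical function name. It removes only a trailing sp. or spp. (with or without the full stop), so labels such as 'Allodessus sp. A' are left unchanged for you to decide on by hand.

```python
import re

# A trailing " sp." or " spp." (full stop optional) at the end of a name.
_SP_SUFFIX = re.compile(r"\s+spp?\.?$")


def accepted_name(raw: str) -> str:
    """'Allodessus sp.' or 'Allodessus spp.' -> 'Allodessus'."""
    return _SP_SUFFIX.sub("", raw.strip())
```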
Qualifying the life stages of taxa
If you have information on the life stage of any insect or fish taxon identified in your dataset, you can add this information to the database using the special character $ placed at the end of the taxon name followed by the relevant life stage. If you have more than one life stage identified for the same insect taxon, you will need to enter each in separate columns.
- Use the term “larva” for all non-adult, non-pupal stages of insects (i.e. larvae or nymphs) or nauplii of Copepoda.
- Use “pupa” for pupal stages of insects.
- Use “adult” for adult stages of insects.
- Use “YOY” for young-of-the-year fishes.
For example, say you counted separately the number of larval and adult individuals of the dytiscid genus Allodessus. Insert one column for the Allodessus$larva count data and one column for the Allodessus$adult count data. You also counted all individuals of the dytiscid genus Tiporus, and you know that they were all adults. If you wish, you can call this column Tiporus$adult. Even though it is not mandatory to indicate the life stage, the extra information will make the database more useful. We recommend you add life stage information into your templates whenever you have it available.
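Column headers with a life-stage qualifier can be generated, and checked against the four allowed terms, with a small helper. A minimal sketch; the function name is hypothetical. It does not check that 'YOY' is used only for fishes, or that the other terms are used only for insects (and copepod nauplii).

```python
from typing import Optional

LIFE_STAGES = {"larva", "pupa", "adult", "YOY"}


def taxon_header(name: str, stage: Optional[str] = None) -> str:
    """Build a Biota column header such as 'Allodessus$larva'."""
    if stage is None:
        return name  # life stage unknown or not recorded
    if stage not in LIFE_STAGES:
        raise ValueError(f"life stage must be one of {sorted(LIFE_STAGES)}, not {stage!r}")
    return f"{name}${stage}"
```

For example, taxon_header("Allodessus", "larva"), taxon_header("Allodessus", "adult") and taxon_header("Tiporus", "adult") give the three headers used in the example above.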
Environment template: environmental data
These data are grouped into three types: those that describe
- waterbody dimensions (e.g. width and depth of the stream sampling location)
- physicochemical data (e.g. nutrient concentrations, water quality data)
- cover descriptors (e.g. percentage cover of different substrate types on the streambed).
These types of data are associated with different sampling methods and need to be entered as separate lines in the Environment template.
Here is an example of a completed Environment template (two images from the one template):
Environment template: nutrient and other chemical data
If you enter nutrient or other chemical data (e.g. nitrogen, phosphorus or carbon concentrations in the water column or sediment; even if it is below the detection limit), then you also need to enter the lower detection limit applicable to each nutrient species. Nutrient concentrations that are below detection limit must be entered as BDL (not, for example, < 0.005).
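Laboratory results often arrive with a '<' prefix for values below the detection limit. A minimal sketch that rewrites such cells as BDL and otherwise insists on a plain number with a decimal point. It assumes '<' is how your laboratory marks below-detection values, and the function name is hypothetical. The detection limit itself still goes in its own field.

```python
def chem_cell(raw: str) -> str:
    """Normalise one nutrient/chemical result for the Environment template."""
    text = raw.strip()
    if text.startswith("<") or text.upper() == "BDL":
        return "BDL"
    float(text)  # raises ValueError for non-numbers, including decimal commas such as "0,12"
    return text
```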
I've filled in my templates and now I want to save and then insert them into the database