This document describes how to use the aquamapsdata R data package to access curated AquaMaps data through a static database assembled from data sourced from https://aquamaps.org.

Immediately after installing the package, a one-time step is needed to download the AquaMaps data and create the local SQLite database that contains it. A minified variant of the database (< 1 MB) is bundled with the package to simplify package development; it can be activated with default_db("extdata").

Approximately 10 GB of local disk space is needed for the downloaded database. The download itself is around 2 GB compressed, so a fast Internet connection is recommended for this initial step.

# install aquamapsdata from GitHub using devtools

install.packages("devtools") 
library("devtools")
install_github("raquamaps/aquamapsdata", dependencies = TRUE)

# initial run-once step required to install remote db locally

library(aquamapsdata)
download_db(force = TRUE)
default_db("sqlite")

Once the database is available locally, it can be queried with the functions provided by the package.

Please remember to begin each session by activating the connection to the downloaded database with default_db("sqlite").

Examples of usage

This package provides data that can be queried with tidyverse tools such as dplyr.

It also requires some spatial tools (sp and raster) to be installed.

library(aquamapsdata)
library(dplyr)

# This vignette is built using a minified offline db bundled into the package,
# so vignettes can be built in the cloud without requiring a 
# full install and download of the database (time saver)

invisible(default_db("extdata"))

# NB: normally download_db() would be used first, followed by 
# default_db("sqlite")

Taxonomy can be searched and queried using fuzzy and exact name searches, returning keys with the internal identifiers used in the database.

Those keys could be said to represent species lists that can be used to retrieve other information, such as environmental envelopes etc.

# fuzzy search allows full text search operators AND, OR, NOT and +
# see https://www.sqlitetutorial.net/sqlite-full-text-search/
am_search_fuzzy(search_term = "trevally") %>% pull(key)
## [1] "Fis-29757"
# exact search without parameters returns all results
nrow(am_search_exact())
## [1] 1
# exact search giving NULL params shows examples of existing values
# here we see what combinations are present in the dataset for 
# angling, diving, dangerous, highseas, deepwater organisms
am_search_exact(
  angling = NULL, diving = NULL, dangerous = NULL, 
  deepwater = NULL, highseas = NULL, m_invertebrates = NULL)
## # A tibble: 1 x 6
##   deepwater angling diving dangerous m_invertebrates highseas
##       <int>   <int>  <int>     <int>           <int>    <int>
## 1         0       1      1         0               0        0
# exact search without NULL params, specifying values
hits <- 
  am_search_exact(angling = 1, diving = 1, dangerous = 0)

# display results
display <- 
  hits %>% mutate(binomen = paste(Genus, Species)) %>%
  select(SpeciesID, binomen, SpecCode, FBname)

knitr::kable(display)
|SpeciesID |binomen |SpecCode |FBname |
|:---------|:-------|--------:|:------|
|Fis-29757 |Caranx bucculentus |1896 |Bluespotted trevally |

Species maps

With a species identifier, the probability of occurrence within the known native distribution of a species can either be retrieved in raster format or be displayed on a map.

Here we display the computer-generated native map for the Bluespotted trevally:

library(leaflet)
library(raster)
library(aquamapsdata)

# get the identifier for the species
key <- am_search_fuzzy("bluespotted")$key
ras <- am_raster(key)

# show the native habitat map
am_map_leaflet(ras, title = "Bluespotted trevally") %>%
  leaflet::fitBounds(lng1 = 100, lat1 = -46, lng2 = 172, lat2 = -2)
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded ellps WGS 84 in Proj4 definition: +proj=merc +a=6378137
## +b=6378137 +lat_ts=0 +lon_0=0 +x_0=0 +y_0=0 +k=1 +units=m +nadgrids=@null
## +wktext +no_defs +type=crs
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum World Geodetic System 1984 in Proj4 definition
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded ellps WGS 84 in Proj4 definition: +proj=merc +a=6378137
## +b=6378137 +lat_ts=0 +lon_0=0 +x_0=0 +y_0=0 +k=1 +units=m +nadgrids=@null
## +wktext +no_defs +type=crs
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum World Geodetic System 1984 in Proj4 definition

Source: Kaschner, K., K. Kesner-Reyes, C. Garilao, J. Segschneider, J. Rius-Barile, T. Rees, and R. Froese. 2019. AquaMaps: Predicted range maps for aquatic species. World wide web electronic publication, www.aquamaps.org, Version 10/2019.

Content from AquaMaps as provided in this R package is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License, see http://creativecommons.org/licenses/by-nc/3.0/

Including attribution

Attribution with citation information and copyright disclaimer should be included. There is a function called am_citation() which provides this information in “text” or “md” format (suitable for use inside Rmarkdown documents).

# include a citation in text format

am_citation()
## [1] "Kaschner, K., K. Kesner-Reyes, C. Garilao, J. Segschneider, J. Rius-Barile, T. Rees, and R. Froese. 2019. AquaMaps: Predicted range maps for aquatic species. World wide web electronic publication, www.aquamaps.org, Version 10/2019.\nContent from AquaMaps as provided in this R package is licensed under a Creative Commons Attribution-NonCommercial 3.0 UnportedLicense, please see http://creativecommons.org/licenses/by-nc/3.0/"

Biodiversity Maps

We can also display a map using several identifiers, for example those associated with the genus “Caranx”, and should then provide an aggregation function such as “count”.

keys <- am_search_exact(Genus = "Caranx")$SpeciesID

ras <- am_raster(keys, fun = "count")

am_map_leaflet(ras, title = "Caranx") %>%
  leaflet::fitBounds(lng1 = 100, lat1 = -46, lng2 = 172, lat2 = -2)

am_citation("md")

For biodiversity maps within a specific bounding box, see the function am_csc_from_extent, which provides biodiversity map data for all species, optionally filtered by a user-defined probability threshold. The default threshold used at aquamaps.org is 0.5; see also the help for the functions am_species_in_csc and am_species_per_csc.

Other usage examples

The examples below show how the data in the database can be queried using the functions provided in the package.

Locations in cells and the half degree cell “authority file” table

Likely occurrences for a species are provided for locations in a half-degree cell grid. Individual cells have associated characteristics, stored in a Half-degree Cell Authority File (HCAF), with data available through the am_hcaf() function.

Information about this table is available in the help with fields explained in the am_meta dataset.
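To make the half-degree grid concrete, here is a minimal base R sketch that snaps a point to the centre of its cell and numbers the cell according to the LOICZID rule quoted in the am_meta field descriptions (cells numbered 1 to 259200, west to east, starting at the cell centred on 89.75 N, 179.75 W, with 720 cells per row). The helper names cell_center() and loicz_id() are hypothetical and are not part of aquamapsdata; the clamping of points on the outer grid edge is a simplification of the boundary rules described for NLimit, Slimit, WLimit and ELimit.

```r
# Sketch only: hypothetical helpers, not functions from aquamapsdata.

# centre of the 0.5-degree interval containing x; a point on the upper
# edge of the grid (x == limit) is clamped into the last cell
cell_center <- function(x, limit) {
  pmin(floor(x / 0.5) * 0.5 + 0.25, limit - 0.25)
}

# LOICZID as described in am_meta: row-wise from the north-west corner
loicz_id <- function(center_lat, center_long) {
  row <- round((89.75 - center_lat) / 0.5)    # 0 = northernmost row
  col <- round((center_long + 179.75) / 0.5)  # 0 = westernmost column
  row * 720 + col + 1
}

lat <- cell_center(-8.6, limit = 90)
lon <- cell_center(116.9, limit = 180)
c(CenterLat = lat, CenterLong = lon)

loicz_id(lat, lon)
loicz_id(89.75, -179.75)   # first cell of the grid
loicz_id(-89.75, 179.75)   # last cell of the grid
```

The two corner calls are a quick sanity check: they should give the first and last identifiers of the range stated in the field description.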

A subset of HCAF records can be retrieved based on a criterion or field in that table.

am_hcaf() %>% head(1) %>% collect() %>% names()
##  [1] "ID"               "CsquareCode"      "LOICZID"          "NLimit"          
##  [5] "Slimit"           "WLimit"           "ELimit"           "CenterLat"       
##  [9] "CenterLong"       "CellArea"         "OceanArea"        "PWater"          
## [13] "ClimZoneCode"     "FAOAreaM"         "FAOAreaIn"        "CountryMain"     
## [17] "CountrySecond"    "CountryThird"     "CountrySubMain"   "CountrySubSecond"
## [21] "CountrySubThird"  "EEZ"              "LME"              "LMEBorder"       
## [25] "MEOW"             "OceanBasin"       "IslandsNo"        "Area0_20"        
## [29] "Area20_40"        "Area40_60"        "Area60_80"        "Area80_100"      
## [33] "AreaBelow100"     "ElevationMin"     "ElevationMax"     "ElevationMean"   
## [37] "ElevationSD"      "DepthMin"         "DepthMax"         "DepthMean"       
## [41] "DepthSD"          "SSTAnMean"        "SBTAnMean"        "SalinityMean"    
## [45] "SalinityBMean"    "PrimProdMean"     "IceConAnn"        "OxyMean"         
## [49] "OxyBMean"         "LandDist"         "Shelf"            "Slope"           
## [53] "Abyssal"          "TidalRange"       "Coral"            "Estuary"         
## [57] "Seamount"         "MPA"
# compute the mean depth across all cells
am_hcaf() %>% 
  summarize(depth = mean(DepthMean, na.rm = TRUE)) %>% 
  collect() %>% 
  pull(depth)
## [1] 438.2737
# cells with a depth value larger than 4000
deepwater <- 
  am_hcaf() %>% filter(DepthMean > 4000) %>% pull(CsquareCode)

# some of the deepest locations, by mean depth
deepwater
## [1] "3017:488:1" "3110:205:4"

The cell location identifier CsquareCode can be used to look up which species are likely to occur there.

# species likely to occur in deepwater location(s)
deepwater_species <- am_species_in_csc(deepwater, min_prob = 0.5)
deepwater_species
## # A tibble: 1 x 2
##   SpeciesID     n
##   <chr>     <int>
## 1 Fis-29757     1
key <- deepwater_species$SpeciesID
am_search_exact(SpeciesID = key)
## # A tibble: 1 x 25
##   SpeciesID SpecCode Genus Species FBname OccurRecs OccurCells StockDefs Kingdom
##   <chr>        <int> <chr> <chr>   <chr>      <int>      <int> <chr>     <chr>  
## 1 Fis-29757     1896 Cara… buccul… Blues…       314        295 Southwes… Animal…
## # … with 16 more variables: Phylum <chr>, Class <chr>, Order <chr>,
## #   Family <chr>, deepwater <int>, angling <int>, diving <int>,
## #   dangerous <int>, m_invertebrates <int>, highseas <int>, invasive <int>,
## #   resilience <chr>, iucn_id <int>, iucn_code <chr>, iucn_version <chr>,
## #   provider <chr>
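The CsquareCode values used above are not opaque keys: they encode the position of the cell. The following base R sketch decodes a half-degree code into its cell centre, assuming the usual c-squares digit layout (a global quadrant digit with 1 = NE, 3 = SE, 5 = SW, 7 = NW, followed by tens of latitude and longitude, then a one-degree block and a half-degree quadrant). decode_csquare() is a hypothetical helper written for illustration and is not part of aquamapsdata.

```r
# Sketch only: decode_csquare() is a hypothetical helper, not a package function.
decode_csquare <- function(code) {
  parts <- strsplit(code, ":", fixed = TRUE)[[1]]
  stopifnot(length(parts) == 3, nchar(parts[1]) == 4, nchar(parts[2]) == 3)

  digits <- function(s) as.integer(strsplit(s, "")[[1]])
  g <- digits(parts[1])       # global quadrant, tens of latitude, tens of longitude (2 digits)
  m <- digits(parts[2])       # intermediate quadrant, latitude unit, longitude unit
  q <- as.integer(parts[3])   # half-degree quadrant, 1 to 4

  # global quadrant: 1 = NE, 3 = SE, 5 = SW, 7 = NW
  lat_sign <- if (g[1] %in% c(1, 7)) 1 else -1
  lon_sign <- if (g[1] %in% c(1, 3)) 1 else -1

  # absolute coordinates of the one-degree corner nearest the origin
  lat <- g[2] * 10 + m[2]
  lon <- (g[3] * 10 + g[4]) * 10 + m[3]

  # quadrants 3 and 4 add half a degree of latitude, 2 and 4 of longitude
  lat <- lat + if (q %in% c(3, 4)) 0.5 else 0
  lon <- lon + if (q %in% c(2, 4)) 0.5 else 0

  c(CenterLat = lat_sign * (lat + 0.25), CenterLong = lon_sign * (lon + 0.25))
}

decode_csquare("3414:227:3")   # the example code given in am_meta
decode_csquare("3110:205:4")   # one of the deepwater cells above
```

The decoded centre can be compared with the CenterLat and CenterLong columns returned by am_hcaf() for the same cell.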

Species preferences or environmental envelope

The am_hspen() function provides the input data used to generate a species' environmental envelopes, as well as the envelopes themselves. Information about this table is available in the help with fields explained in the am_meta dataset.

HSPEN data can be queried for example based on taxonomy for a single species or higher taxa associated with that species, or any other relevant species identifiers.

# use one or more keys for species
key <- am_species_in_csc(deepwater, min_prob = 0.5)$SpeciesID
am_hspen() %>% filter(SpeciesID %in% key) %>% head(1) %>% collapse() %>% glimpse()
## Rows: ??
## Columns: 56
## Database: sqlite 3.34.1 [/Users/runner/work/_temp/Library/aquamapsdata/extdata/am.db]
## $ SpeciesID       <chr> "Fis-29757"
## $ Speccode        <int> 1896
## $ LifeStage       <chr> "adults"
## $ FAOAreas        <chr> "57, 61, 71"
## $ FAOComplete     <int> NA
## $ NMostLat        <dbl> -7
## $ SMostLat        <dbl> -23
## $ WMostLong       <dbl> NA
## $ EMostLong       <dbl> NA
## $ DepthYN         <int> 1
## $ DepthMin        <int> 7
## $ DepthPrefMin    <int> 12
## $ DepthPrefMax    <int> 36
## $ DepthMax        <int> 63
## $ MeanDepth       <int> 0
## $ Pelagic         <int> 0
## $ TempYN          <int> 1
## $ TempMin         <dbl> 23.97
## $ TempPrefMin     <dbl> 25.84
## $ TempPrefMax     <dbl> 28.45
## $ TempMax         <dbl> 32.65
## $ SalinityYN      <int> 1
## $ SalinityMin     <dbl> 29.93
## $ SalinityPrefMin <dbl> 33.82
## $ SalinityPrefMax <dbl> 35.1
## $ SalinityMax     <dbl> 35.61
## $ PrimProdYN      <int> 1
## $ PrimProdMin     <dbl> 1.48
## $ PrimProdPrefMin <dbl> 3.67
## $ PrimProdPrefMax <dbl> 12.88
## $ PrimProdMax     <dbl> 21.71
## $ IceConYN        <int> 1
## $ IceConMin       <dbl> -1
## $ IceConPrefMin   <dbl> 0
## $ IceConPrefMax   <dbl> 0
## $ IceConMax       <dbl> 0
## $ OxyYN           <int> 0
## $ OxyMin          <dbl> 100.38
## $ OxyPrefMin      <dbl> 196.51
## $ OxyPrefMax      <dbl> 209.1
## $ OxyMax          <dbl> 213.7
## $ LandDistYN      <int> 0
## $ LandDistMin     <dbl> 3
## $ LandDistPrefMin <dbl> 16
## $ LandDistPrefMax <dbl> 210
## $ LandDistMax     <dbl> 421
## $ Remark          <chr> NA
## $ DateCreated     <chr> "2019-06-24 00:00:00"
## $ DateModified    <chr> NA
## $ expert_id       <int> NA
## $ DateExpert      <chr> NA
## $ Layer           <chr> "s"
## $ Rank            <int> 1
## $ MapOpt          <int> 1
## $ ExtnRuleYN      <int> 1
## $ Reviewed        <int> NA
# for higher taxa - find the keys associated with higher taxa 
# am_search_exact(Family = am_search_exact(SpeciesID = key)$Family)

Species and locations

At the beginning of the vignette, the taxonomy name search functions were illustrated with some examples. These allow the user to get a list of available mapped species based on a criterion (e.g. taxonomic group), so that biodiversity (or species richness) for that group can be mapped.

We can also use a function to list the species occurring in an area, that is, a location identified by one or more CsquareCode identifiers. Another function allows querying for species diversity or richness across a set of cells. Both functions allow a probability threshold to be specified.
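The two kinds of summary can be pictured with a small base R sketch on made-up data. It is not the package's implementation: the rows, the identifiers and the column name Probability are assumptions for illustration, as is the choice of keeping rows at or above the threshold.

```r
# Sketch only, with made-up rows: one row per species and cell, plus a
# probability of occurrence. The column name "Probability" is an assumption.
predictions <- data.frame(
  SpeciesID   = c("sp-A", "sp-A", "sp-B", "sp-B", "sp-C"),
  CsquareCode = c("c1", "c2", "c1", "c2", "c1"),
  Probability = c(0.9, 0.4, 0.6, 0.85, 0.3),
  stringsAsFactors = FALSE
)

# keep only predictions at or above the chosen threshold
min_prob <- 0.5
likely <- predictions[predictions$Probability >= min_prob, ]

# species list for the area: in how many cells does each species appear?
cells_per_species <- table(likely$SpeciesID)
cells_per_species

# richness: how many distinct species are likely in each cell?
species_per_cell <- tapply(likely$SpeciesID, likely$CsquareCode,
                           function(s) length(unique(s)))
species_per_cell
```

Raising min_prob shrinks both summaries, which is why the threshold should be chosen to suit the question being asked.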

A location can be determined based on a user defined bounding box or extent (given by four coordinates). Using am_hcaf() a set of cells can be determined based on other criteria, allowing retrieval of CsquareCode cell identifiers that belong to a specific LME, for example.

# get cell identifiers for a bounding box or extent
csc <- am_csc_from_extent(100, 120, -22, -7)$CsquareCode

# within this area, the following species appear, each in n cells
am_species_in_csc(csc)
## # A tibble: 1 x 2
##   SpeciesID     n
##   <chr>     <int>
## 1 Fis-29757    86
# number of distinct species likely to occur in each cell location,
# a measure of species diversity or "richness"
am_species_per_csc(csc, min_prob = 0.8)
## # A tibble: 53 x 2
##    CsquareCode n_species
##    <chr>           <int>
##  1 3010:475:2          1
##  2 3011:486:4          1
##  3 3011:487:3          1
##  4 3011:487:4          1
##  5 3011:488:3          1
##  6 3011:488:4          1
##  7 3011:489:3          1
##  8 3011:489:4          1
##  9 3011:496:2          1
## 10 3011:497:1          1
## # … with 43 more rows
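As a rough picture of what a bounding-box lookup involves, the base R sketch below enumerates the centres of all half-degree cells inside an extent. It is not how am_csc_from_extent is implemented: the helper cells_in_extent() is hypothetical, the argument order (west, east, south, north) is an assumption, the extent is assumed to be aligned to half degrees, and the real function can only return cells that exist in the database.

```r
# Sketch only: cells_in_extent() is a hypothetical helper; it assumes the
# extent limits fall on half-degree boundaries.
cells_in_extent <- function(west, east, south, north) {
  lon <- seq(west + 0.25, east - 0.25, by = 0.5)
  lat <- seq(south + 0.25, north - 0.25, by = 0.5)
  expand.grid(CenterLong = lon, CenterLat = lat)
}

grid <- cells_in_extent(100, 120, -22, -7)
nrow(grid)              # 40 columns x 30 rows of half-degree cells
range(grid$CenterLong)
range(grid$CenterLat)
```

The count from this full grid is an upper bound on the number of cells a database lookup could return for the same extent.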

Data scope and content

Scope

Please note that the database file differs from the complete version available online at https://aquamaps.org in the following respects:

  1. The database is incomplete in terms of species mapped (26,399 of 33,518). It is based on AquaMaps' conservative rule of generating envelopes and predictions only for species with >= 10 'good cells', and it excludes records of data-poor species (i.e. endemic and/or rare species). Please contact the AquaMaps team directly if you want access to the complete dataset.

  2. The map data provided (hcaf_species_native table) give computer-generated predictions. Please contact the AquaMaps team directly if you need to access the latest reviewed/improved species maps.

  3. Map data showing future species distributions for 2050 and 2100 (under different RCP scenarios) are excluded. Please contact the AquaMaps team directly if you are interested in these datasets.

We strongly encourage partnering with the AquaMaps team for larger research projects or publications that would make intensive use of AquaMaps to ensure that you have access to the latest version and/or reviewed maps, that the limitations of the data set are clearly understood and addressed, and that critical maps and/or unlikely results are recognized as such and double-checked for correctness prior to drawing conclusions and/or subsequent publication.

The AquaMaps team can be contacted through Rainer Froese or Kristin Kaschner.

Content

The dplyr package can be used to query the various tables available in the database.

Here is a description of tables and fields which are included.

knitr::kable(am_meta)
table field description type_mysql type
hcaf_r ID Unique HCAF ID, for internal use only. int int
hcaf_r CsquareCode A unique identifier for every half-degree cell in the global map based on the c-square method - a hierarchical cell labelling system developed at CSIRO Oceans and Atmosphere (then CSIRO Marine Research). Example: 3414:227:3 varchar chr
hcaf_r LOICZID LOICZ ID numbers are long integers from 1 to 259200. They begin with the cell centered at 89.75 degrees N latitude and 179.75 degrees W latitude and proceed from West to East. When a full circle of 720 cells is completed, the numbering steps one cell south along the -180 meridian and continues sequentially west to east. int int
hcaf_r NLimit Northern boundary of cell in decimal degrees latitude (positive in N hemisphere, negative in S hemisphere). Points falling on this line are considered inside the cell in the S hemisphere (exception: cells adjoining the equator, i.e., where N_limit = 0). Also (polar case), points on this line are “inside” in the N hemisphere when N_limit = 90. double dbl
hcaf_r Slimit Southern boundary of cell in decimal degrees latitude (positive in N hemisphere, negative in S hemisphere). Points falling on this line are considered inside the cell in the N hemisphere. Also (polar case), points on this line are “inside” in the S hemisphere when S_limit = -90. double dbl
hcaf_r WLimit Western boundary of cell in decimal degrees latitude (positive in E hemisphere, negative in W hemisphere). Points falling on this line are considered inside the cell in the E hemisphere. Also (boundary case), points on this line are “inside” in the W hemisphere when W_limit = -180. double dbl
hcaf_r ELimit Eastern boundary of cell in decimal degrees latitude (positive in E hemisphere, negative in W hemisphere). Points falling on this line are considered inside the cell in the W hemisphere (exception: cells adjoining the Greenwich Meridian, i.e., where E_limit = 0). Also (boundary case), points on this line are “inside” in the E hemisphere when E_limit = 180. double dbl
hcaf_r CenterLat The center point of the cell in decimal degrees latitude. double dbl
hcaf_r CenterLong The center point of the cell in decimal degrees longitude. double dbl
hcaf_r CellArea The total area inside the cell in square kilometers, using WGS84 and Miller cylindrical projection (KGS description). double dbl
hcaf_r OceanArea The area in the cell that is normally covered by sea water or permanent ice, in square kilometers (KGS description). double dbl
hcaf_r PWater Proportion of water in each cell. double dbl
hcaf_r ClimZoneCode Climate zone to which the cell belongs based on climate zone shape file in SAU database. varchar chr
hcaf_r FAOAreaM Code number of FAO statistical area to which the cell belongs, for all oceanic and coastal cells. int int
hcaf_r FAOAreaIn Code number of FAO statistical area to which the cell belongs, for all inland and coastal cells. int int
hcaf_r CountryMain UN code number of country, island or area to which the largest land area of the cell belongs, for all inland and coastal cells. varchar chr
hcaf_r CountrySecond UN code number of country, island or area to which the second largest land area of the cell belongs, for all inland and coastal cells. varchar chr
hcaf_r CountryThird UN code number of country, island or area to which the third largest land area of the cell belongs, for all inland and coastal cells. varchar chr
hcaf_r CountrySubMain ISO code number of state, province, region to which the largest land area of the cell belongs. varchar chr
hcaf_r CountrySubSecond ISO code number of state, province, region to which the second largest land area of the cell belongs. varchar chr
hcaf_r CountrySubThird ISO code number of state, province, region to which the third largest land area of the cell belongs. varchar chr
hcaf_r EEZ Code number of country, island or area to which the EEZ area in the cell belongs, for all coastal and oceanic cells. int int
hcaf_r LME Code number of the large marine ecosystem to which the cell belongs, as given by NOAA (http://www.lme.noaa.gov), for all coastal and oceanic cells. int int
hcaf_r LMEBorder Tags whether or not cell lies along the border of an LME. 0=No, 1=Yes tinyint int
hcaf_r MEOW 5-digit code (ECO_Code) refering to the marine ecoregion the cell belongs to, as assigned by MEOW, a biogeographic classification of the world’s coasts and shelves. int int
hcaf_r OceanBasin Major ocean basins of world with north and south sub-basins separated by latitudinal data from literature. int int
hcaf_r IslandsNo Number of coastal or oceanic islands contained in cell, as provided by the World Vector Shoreline database. int int
hcaf_r Area0_20 Area in cell from 0-20 m depth, in square kilometers, as provided by Smith and Sandwell: Bathymetry and Elevation (currently only 70 N to 70 S). double dbl
hcaf_r Area20_40 Area in cell from 20-40 m depth, in square kilometers, as provided by Smith and Sandwell: Bathymetry and Elevation. double dbl
hcaf_r Area40_60 Area in cell from 40-60 m depth, in square kilometers, as provided by Smith and Sandwell: Bathymetry and Elevation. double dbl
hcaf_r Area60_80 Area in cell 60-80 m depth, in square kilometers, as provided by Smith and Sandwell: Bathymetry and Elevation. double dbl
hcaf_r Area80_100 Area in cell from 80-100 m depth, in square kilometers, as provided by Smith and Sandwell: Bathymetry and Elevation. double dbl
hcaf_r AreaBelow100 Area in cell below 100 m depth, in square kilometers, as provided by Smith and Sandwell: Bathymetry and Elevation. double dbl
hcaf_r ElevationMin Minimum elevation above sea level in meters, as provided by ETOPO2. double dbl
hcaf_r ElevationMax Maximum elevation above sea level in cell in meters, as provided by ETOPO2. double dbl
hcaf_r ElevationMean Mean elevation above sea level in meters, as provided by ETOPO2. double dbl
hcaf_r ElevationSD Standard deviation of mean elevation above sea level in meters, as provided by ETOPO2. double dbl
hcaf_r DepthMin Minimum ETOPO 2min bathymetry (negative) elevation in 30min cell. double dbl
hcaf_r DepthMax Maximum ETOPO 2min bathymetry (negative) elevation in 30min cell. double dbl
hcaf_r DepthMean Mean ETOPO 2min bathymetry (negative) elevation in 30min cell. double dbl
hcaf_r DepthSD Standard deviation of mean bottom depth below sea level in meters, as provided by ETOPO2. double dbl
hcaf_r SSTAnMean Mean annual sea surface temperature in degree Celsius (2000-2014), as derived from Bio-ORACLE, for all coastal and oceanic cells, from 90 N to 78.5 S. double dbl
hcaf_r SBTAnMean Mean annual sea bottom temperature in degree Celsius, as derived from Bio-ORACLE (2000-2014), for all coastal and oceanic cells, from 90 N to 78.5 S. double dbl
hcaf_r SalinityMean Mean annual surface salinity in practical salinity scale (PSS), as derived from Bio-ORACLE (2000-2014), for all coastal and oceanic cells, from 90 N to 78.5 S. double dbl
hcaf_r SalinityBMean Mean annual bottom salinity in practical salinity scale (PSS), as derived from Bio-ORACLE (2000-2014), for all coastal and oceanic cells, from 90 N to 78.5 S. double dbl
hcaf_r PrimProdMean Proportion of annual surface primary production in a cell in mgC·m-3·day -1, for all coastal and oceanic cells, from 90 N to 78.5 S. double dbl
hcaf_r IceConAnn Mean annual sea ice concentration in percent (or fraction from 0-1), as derived from Bio-ORACLE (2000-2014), for all coastal and oceanic cells, from 90 N to 78.5 S. double dbl
hcaf_r OxyMean Mean annual dissolved molecular oxygen at the surface, in millimole per cubic meter, as derived from Bio-ORACLE (2000-2014), for all coastal and oceanic cells, from 90 N to 78.5 S. double dbl
hcaf_r OxyBMean Mean annual dissolved molecular oxygen at the surface, in millimole per cubic meter, as derived from Bio-ORACLE (2000-2014), for all coastal and oceanic cells, from 90 N to 78.5 S. double dbl
hcaf_r LandDist Distance (km) to the nearest coastal cell (water cells only). int int
hcaf_r Shelf The water area of the cell that lies within the shelf zone (0 - 200m depth); based on min/max elevation and proportion in depth zone. double dbl
hcaf_r Slope The water area of the cell that lies within the slope zone (>200 - 4000m depth); based on min/max elevation and proportion in depth zone. double dbl
hcaf_r Abyssal The water area of the cell that lies within the abyssal zone (> 4000m depth); based on min/max elevation and proportion in depth zone. double dbl
hcaf_r TidalRange Extent of tides in scaled discrete classes as provided by the original LOICZ Database, for all coastal and oceanic cells. int int
hcaf_r Coral Proportion of whole (even non-water) cell covered by coral WCMC pixelclassify - NOT corrected to 284,300 sq km globally World Atlas of Coral Reefs UNEP WCMC 2001. double dbl
hcaf_r Estuary Area covered by estuaries in the cell. double dbl
hcaf_r Seamount Number of known seamounts attributed to the cell. int int
hcaf_r MPA Proportion of cell covered by a Marine Protected Area. double dbl
speciesoccursum_r SpeciesID AquaMaps’ unique identifier for a valid species used by the Catalogue of Life Annual Checklist (www.catalogueoflife.org). Example for the whale shark: Fis-30583 varchar chr
speciesoccursum_r SpecCode Species identifier used in FishBase or SeaLifeBase int int
speciesoccursum_r Genus Genus name of the species varchar chr
speciesoccursum_r Species Specific epithet of the species varchar chr
speciesoccursum_r FBname Common name suggested by FishBase or SeaLifeBase varchar chr
speciesoccursum_r OccurRecs Number of point records used to generate good cells int int
speciesoccursum_r OccurCells Number of good cells used to generate species envelope int int
speciesoccursum_r StockDefs Distribution of the species as recorded in FishBase or SeaLifeBase longtext chr
speciesoccursum_r Kingdom Kingdom to which the species belongs varchar chr
speciesoccursum_r Phylum Phylum to which the species belongs varchar chr
speciesoccursum_r Class Class to which the species belongs varchar chr
speciesoccursum_r Order Order to which the species belongs varchar chr
speciesoccursum_r Family Family to which the species belongs varchar chr
speciesoccursum_r deepwater Does the species occur in the deep-sea (i.e. tagged bathypelagic or bathydemersal in FishBase or SeaLifeBase)? 0=No, 1=Yes tinyint int
speciesoccursum_r angling Is the species a sport fish (i.e. tagged as a GameFish in FishBase)? 0=No, 1=Yes tinyint int
speciesoccursum_r diving Is the species found on a dive (i.e. where DepthPrefMin in HSPEN < 20 meters)? 0=No, 1=Yes tinyint int
speciesoccursum_r dangerous Is the species dangerous (i.e. tagged as ‘traumatogenic or venonous’ in FishBase or SeaLifeBase)? 0=No, 1=Yes tinyint int
speciesoccursum_r m_invertebrates Is the species a marine invertebrate? 0=No, 1=Yes tinyint int
speciesoccursum_r highseas Is the species an open ocean fish species (i.e. tagged as pelagic-oceanic in FishBase)? 0=No, 1=Yes tinyint int
speciesoccursum_r invasive Is the species recorded to be invasive (i.e. in FishBase or SeaLifeBase)? 0=No, 1=Yes tinyint int
speciesoccursum_r resilience Resilience of the species (i.e. as recorded in FishBase/SeaLifeBase) varchar chr
speciesoccursum_r iucn_id IUCN species identifier int int
speciesoccursum_r iucn_code IUCN Red list classification assigned to the species varchar chr
speciesoccursum_r iucn_version IUCN version varchar chr
speciesoccursum_r provider FishBase (FB) or SeaLifeBase (SLB)? varchar chr
occurrencecells_r RecordID Unique occurrencecells ID, for internal use only. int int
occurrencecells_r CsquareCode A unique identifier for every half-degree cell in the global map based on the c-square method - a hierarchical cell labelling system developed at CSIRO Oceans and Atmosphere (then CSIRO Marine Research). Example: 3414:227:3 varchar chr
occurrencecells_r SpeciesID AquaMaps’ unique identifier for a valid species used by the Catalogue of Life Annual Checklist (www.catalogueoflife.org). Example for the whale shark: Fis-30583 varchar chr
occurrencecells_r SpecCode Species identifier used in FishBase/SeaLifeBase. int int
occurrencecells_r GoodCell Is the cell a good cell (following the AquaMaps’ definition of a good cell i.e. cCell falls inside the known bounding box and/or FAO areas where the species is reported to occur)? 0=No, 1=Yes tinyint int
occurrencecells_r InFAOArea Does the cell occur within the FAO areas where the species is reported to occur? 0=No, 1=Yes tinyint int
occurrencecells_r InBoundBox Does the cell occur within the bounding box where the species is reported to occur? 0=No, 1=Yes tinyint int
occurrencecells_r GBIF_YN Is the cell partially/completely based on GBIF point data? null=No, 1=Yes tinyint int
occurrencecells_r OBIS_YN Is the cell partially/completely based on OBIS point data? null=No, 1=Yes tinyint int
occurrencecells_r FBSLB_YN Is the cell partially/completely based on FishBase/SeaLifeBase occurrence records? null=No, 1=Yes tinyint int
occurrencecells_r CountryPoint_YN Is the cell partially/completely based on FishBase/SeaLifeBase country records? null=No, 1=Yes tinyint int
occurrencecells_r AWI_YN Is the cell partially/completely based on AWI point data? null=No, 1=Yes tinyint int
occurrencecells_r IATTC_YN Is the cell partially/completely based on IATTC point data? null=No, 1=Yes tinyint int
occurrencecells_r UWA_YN Is the cell partially/completely based on UWA point data? null=No, 1=Yes tinyint int
occurrencecells_r CenterLat The center point of the cell in decimal degrees latitude. Example: 89.75 double dbl
occurrencecells_r CenterLong NA NA dbl
occurrencecells_r FAOAreaM FAO area to which the cell belongs. tinyint int
hspen_r SpeciesID AquaMaps’ unique identifier for a valid species used by the Catalogue of Life Annual Checklist (www.catalogueoflife.org). Example for the whale shark: Fis-30583 varchar chr
hspen_r Speccode Species identifier used in FishBase or SeaLifeBase. int int
hspen_r LifeStage Life stage of the species. Currently all envelopes refer to adult environmental preferences. varchar chr
hspen_r FAOAreas Comma-delimited string containing the FAO area codes where native occurrence of the species has been reported in the literature. Example: 5, 7, 18, 27, 37 varchar chr
hspen_r FAOComplete Are the FAO areas listed in FAOAreas complete for this species? 0=No, 1=Yes tinyint int
hspen_r NMostLat Northern-most latitude of distributional range of this species, in decimal degrees. Example: 55.5 double dbl
hspen_r SMostLat Southern-most latitude of distributional range of this species, in decimal degrees. Example: -15 double dbl
hspen_r WMostLong Western-most longitude of distributional range of this species in decimal degrees. Example: -130 double dbl
hspen_r EMostLong Eastern-most longitude of distributional range of this species in decimal degrees. Example: -80 double dbl
hspen_r DepthYN Is the depth parameter used in computing map data? 0=No, 1=Yes tinyint int
hspen_r DepthMin Minimum depth where the species has been found (in meters). Example: 20 int int
hspen_r DepthPrefMin Minimum depth PREFERRED by the species (in meters). Example: 30 int int
hspen_r DepthPrefMax Maximum depth PREFERRED by the species (in meters). Example: 60 int int
hspen_r DepthMax Maximum depth range where this species has been found (in meters). Example: 120 int int
hspen_r MeanDepth Is mean depth used to fit the depth envelope? By default, marine mammals use mean depth. 0=No, 1=Yes tinyint int
hspen_r Pelagic Does the species occurs in the water column well above and largely independent of the bottom? 0=No, 1=Yes tinyint int
hspen_r TempYN Is the temperature parameter used in computing map data? 0=No, 1=Yes tinyint int
hspen_r TempMin Minimum temperature tolerated by the species (in deg C). Example: 16 double dbl
hspen_r TempPrefMin Minimum temperature PREFERRED by the species (in deg C). Example: 20.0 double dbl
hspen_r TempPrefMax Maximum temperature PREFERRED by the species (in deg C). Example: 27.0 double dbl
hspen_r TempMax Maximum temperature tolerated by the species (in deg C). Example: 31 double dbl
hspen_r SalinityYN Is the salinity parameter used in generating map data? 0=No, 1=Yes tinyint int
hspen_r SalinityMin Minimum salinity tolerated by the species (in psu). Example: 20 double dbl
hspen_r SalinityPrefMin Minimum salinity PREFERRED by the species (in psu). Example: 33.4 double dbl
hspen_r SalinityPrefMax Maximum salinity PREFERRED by the species (in psu). Example: 35.7 double dbl
hspen_r SalinityMax Maximum salinity tolerated by the species (in psu). Example: 38 double dbl
hspen_r PrimProdYN Is the primary production parameter used in computing map data? 0=No, 1=Yes tinyint int
hspen_r PrimProdMin Minimum amount of primary production tolerated by the species (in mgC·m-3·day-1). Example: 0 double dbl
hspen_r PrimProdPrefMin Minimum amount of primary production PREFERRED by the species (in mgC·m-3·day-1). Example: 579 double dbl
hspen_r PrimProdPrefMax Maximum amount of primary production PREFERRED by the species (in mgC·m-3·day-1). Example: 1754 double dbl
hspen_r PrimProdMax Maximum amount of primary production tolerated by the species (in mgC·m-3·day-1). Example: 2935 double dbl
hspen_r IceConYN Is the ice concentration parameter used in computing map data? 0=No, 1=Yes tinyint int
hspen_r IceConMin Minimum sea ice concentration tolerated by the species (0-1 fraction). double dbl
hspen_r IceConPrefMin Minimum sea ice concentration PREFERRED by the species (0-1 fraction). double dbl
hspen_r IceConPrefMax Maximum sea ice concentration PREFERRED by the species (0-1 fraction). double dbl
hspen_r IceConMax Maximum sea ice concentration tolerated by the species (0-1 fraction). double dbl
hspen_r OxyYN Is the dissolved bottom oxygen parameter used in computing map data? 0=No, 1=Yes tinyint int
hspen_r OxyMin Minimum dissolved bottom oxygen tolerated by the species (in mmol·m-3). Example: 1.33 double dbl
hspen_r OxyPrefMin Minimum dissolved bottom oxygen PREFERRED by the species (in mmol·m-3). Example: 231.42 double dbl
hspen_r OxyPrefMax Maximum dissolved bottom oxygen PREFERRED by the species (in mmol·m-3). Example: 327.77 double dbl
hspen_r OxyMax Maximum dissolved bottom oxygen tolerated by the species (in mmol·m-3). Example: 408.99 double dbl
hspen_r LandDistYN Is the distance to land parameter used in computing map data? 0=No, 1=Yes tinyint int
hspen_r LandDistMin Minimum distance to land tolerated by the species (in km). Example: 20 double dbl
hspen_r LandDistPrefMin Minimum distance to land PREFERRED by the species (in km). Example: 33 double dbl
hspen_r LandDistPrefMax Maximum distance to land PREFERRED by the species (in km). Example: 35 double dbl
hspen_r LandDistMax Maximum distance to land tolerated by the species (in km). Example: 38 double dbl
hspen_r Remark Text field to accommodate any remarks relevant to this record. longtext chr
hspen_r DateCreated Date and time when this record was first created. Example: 2019-06-24 00:00:00 datetime chr
hspen_r DateModified Date and time when this record was last modified. If the record has not been modified, field is empty. Example: 2019-08-19 00:00:00 datetime chr
hspen_r expert_id ID of the expert who last reviewed the envelope. int int
hspen_r DateExpert Date and time when this record was last edited by an expert. Example: 2019-08-29 00:00:00 datetime chr
hspen_r Layer Indicates whether the temperature and salinity parameters are based on bottom (=b) or surface (=s) values of half-degree cells used to compute the envelope. char chr
hspen_r Rank Internal code for basis of computation for environmental envelope (1 = with >10 good cells; 2 = with 3-9 good cells only; 3 = restricted range, one known point, new species). tinyint int
hspen_r MapOpt Indicates how native map (predicted probabilities) is plotted: 1 = area covered by both species’ bounding box and FAO areas, 2 = area covered by species’ FAO areas only, 3 = area covered by species’ bounding box only. tinyint int
hspen_r ExtnRuleYN Was the FAO extension rule applied in the generation of the species envelope? 0=No, 1=Yes, null tinyint int
hspen_r Reviewed Is this a reviewed envelope? 0=No, 1=Yes, null tinyint int
hcaf_species_native SpeciesID AquaMaps’ unique identifier for a valid species used by the Catalogue of Life Annual Checklist (www.catalogueoflife.org). Example for the whale shark: Fis-30583 varchar chr
hcaf_species_native CsquareCode A unique identifier for every half-degree cell in the global map based on the c-square method - a hierarchical cell labelling system developed at CSIRO Oceans and Atmosphere (then CSIRO Marine Research). Example: 3414:227:3 varchar chr
hcaf_species_native CenterLat The center point of the cell in decimal degrees latitude. Example: 89.75 double dbl
hcaf_species_native CenterLong The center point of the cell in decimal degrees longitude. Example: -179.75 double dbl
hcaf_species_native Probability Overall probability of occurrence of the species in the cell (ranging from 0.01 to 1). Example: 0.71 float dbl
hcaf_species_native FAOAreaYN Does this cell fall within an FAO area where the species is known to occur (endemic/native)? 0=No, 1=Yes tinyint int
hcaf_species_native BoundBoxYN Does this cell fall within the geographical bounding box known for the species? 0=No, 1=Yes tinyint int
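
As a minimal sketch of how the fields above might be used, the following base R snippet filters a toy table shaped like hcaf_species_native and splits a comma-delimited FAOAreas string. The rows, the c-square codes and the 0.5 probability threshold are made up for illustration and are not taken from the database.

```r
# Toy rows shaped like hcaf_species_native (all values are illustrative only)
cells <- data.frame(
  SpeciesID   = rep("Fis-30583", 4),
  CsquareCode = c("3414:227:3", "3414:227:4", "3414:228:1", "3414:228:2"),
  Probability = c(0.71, 0.12, 0.95, 0.50),
  FAOAreaYN   = c(1L, 1L, 0L, 1L),
  BoundBoxYN  = c(1L, 0L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Keep cells with probability >= 0.5 (an arbitrary threshold) that fall
# inside both a known FAO area and the species' bounding box
likely <- subset(cells, Probability >= 0.5 & FAOAreaYN == 1L & BoundBoxYN == 1L)

# FAOAreas in hspen_r is a comma-delimited string; split it into integer codes
fao_areas <- as.integer(strsplit("5, 7, 18, 27, 37", ",\\s*")[[1]])
```

Here likely keeps two of the four toy cells, and fao_areas is an integer vector that can be matched against FAO area codes.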

Data management

This section describes how the data was prepared for use in this R package. It is aimed less at package users than at readers who want to understand the data preparation steps.

Preparation involves several steps, which move the relevant parts of the source data from its primary source into the local SQLite3 database that the package uses.

Local replication of source database

The source data lives in a MySQL/MariaDB database. If it is made available as a backup of a raw datadir or, preferably, as a data dump, it can be loaded into a local MariaDB database engine.

With docker-compose this can be done in one step, by running docker-compose up -d with the following docker-compose.yml file, which mounts the dump from aquamaps.sql in the same directory:

volumes:
  db:

services:

  db:
    image: mariadb:latest
    ports:
      - "3306:3306"
    environment:
      - MYSQL_ROOT_PASSWORD=your_root_db_password
      - MYSQL_DATABASE=aquamapsdb
      - MYSQL_USER=your_db_user
      - MYSQL_PASSWORD=your_db_password
    volumes:
      - db:/var/lib/mysql
      - ./aquamaps.sql:/docker-entrypoint-initdb.d/aquamaps.sql:ro

After this step, the data is available to access locally through aquamapsdata::am_con().

The “metadata” for table and field names and their descriptions is provided through aquamapsdata::am_meta, which is prepared by the script data-raw/am-meta.R. This metadata is used in the package documentation, and the am_search_exact() function allows taxonomic searches using some of those fields.

Syncing into SQLite3

A set of functions then syncs the data into an SQLite3 database, adds full text search support and creates indexes.

Relevant steps are:

  • Make a chunkwise sync from the source connection to the target db using db_sync().
  • Add full text search functionality using am_create_fts().
  • Add indexes using am_create_indexes().
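
The chunkwise sync and indexing steps can be illustrated with a small sketch. It assumes the DBI and RSQLite packages are installed and uses two in-memory SQLite databases as stand-ins for the MariaDB source and the SQLite3 target. The helper sync_chunkwise() and the table taxa are hypothetical and do not reproduce the package's db_sync() or am_create_indexes().

```r
library(DBI)

# Hypothetical helper (not the package's db_sync()): copy one table from a
# source connection to a target connection in fixed-size chunks, so that the
# whole table never has to fit in memory at once.
sync_chunkwise <- function(src, dest, table, chunk_size = 1000L) {
  res <- dbSendQuery(src, paste("SELECT * FROM", dbQuoteIdentifier(src, table)))
  on.exit(dbClearResult(res), add = TRUE)
  n <- 0L
  while (!dbHasCompleted(res)) {
    chunk <- dbFetch(res, n = chunk_size)
    if (nrow(chunk) == 0L) break
    # Creates the table on the first chunk, appends afterwards
    dbWriteTable(dest, table, chunk, append = TRUE)
    n <- n + nrow(chunk)
  }
  n
}

# Stand-ins for the real source and target databases
src  <- dbConnect(RSQLite::SQLite(), ":memory:")
dest <- dbConnect(RSQLite::SQLite(), ":memory:")

# Illustrative source table with made-up identifiers
dbWriteTable(src, "taxa", data.frame(
  SpeciesID = sprintf("Fis-%05d", 1:2500),
  Genus     = "Example",
  stringsAsFactors = FALSE
))

copied <- sync_chunkwise(src, dest, "taxa", chunk_size = 1000L)

# Index the identifier column in the target to speed up key lookups
dbExecute(dest, "CREATE INDEX idx_taxa_speciesid ON taxa (SpeciesID)")

n_dest <- dbGetQuery(dest, "SELECT COUNT(*) AS n FROM taxa")$n

dbDisconnect(src)
dbDisconnect(dest)
```

Fetching in chunks keeps memory use bounded regardless of table size, which matters for the larger tables in a multi-gigabyte database.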

Exposing the data

The function am_search_exact() takes many parameters, which can be combined to query the taxonomy in a single call.

The function am_search_fuzzy() is quick and accepts FTS5 search syntax (search terms can be quoted and combined with AND, OR and NOT).
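
To show what FTS5 search syntax looks like in practice, here is a minimal sketch run against a throwaway full text index. It assumes DBI and RSQLite are installed. The table fts, its columns, the keys and the helper fts_match() are hypothetical and do not reflect the package's internal schema.

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical full text index; table and column names are illustrative only
dbExecute(con, "CREATE VIRTUAL TABLE fts USING fts5(species_key, terms)")
dbExecute(con, "
  INSERT INTO fts (species_key, terms) VALUES
    ('sp-1', 'bluespotted trevally'),
    ('sp-2', 'giant trevally'),
    ('sp-3', 'whale shark')
")

# Run an FTS5 query string and return the matching keys in sorted order
fts_match <- function(query) {
  dbGetQuery(
    con,
    "SELECT species_key FROM fts WHERE fts MATCH ? ORDER BY species_key",
    params = list(query)
  )$species_key
}

single   <- fts_match("trevally")                # one term
both     <- fts_match("trevally AND giant")      # both terms required
excluded <- fts_match("trevally NOT giant")      # first term without the second
either   <- fts_match('"whale shark" OR giant')  # quoted phrase or a term

dbDisconnect(con)
```

Query strings of this shape are what the FTS5 syntax permits; whether am_search_fuzzy() passes them through unchanged is not verified here.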

These functions return search results containing keys or identifiers that can be used to retrieve map data in raster format through am_raster(). From such a raster, a leaflet map can be created with am_map_leaflet().