The Species Sensitivity Distribution (SSD) approach is a central tool in environmental risk assessment for defining safe levels of a contaminant for a community of species. It assumes that the sensitivity of species to a given contaminant can be described by a probability distribution, estimated from a set of toxicity values for several species, previously obtained, for example, with MOSAICsurv, MOSAICrepro or MOSAICgrowth. The MOSAICSSD module enables any user to perform a simple yet statistically sound SSD analysis, including censored toxicity values (unbounded or known only as an interval), without worrying about the conceptually difficult underlying statistical questions [1]. MOSAICSSD provides the so-called hazardous concentration for p% of the species (HCp). All calculations are based on the companion R package fitdistrplus [2].
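The statistical core of the approach can be written compactly. In the standard maximum-likelihood formulation for censored data, a species i with an exact toxicity value x_i contributes the density f, while a censored species contributes the probability of its interval [l_i, r_i], with F(l_i) = 0 when the lower bound is missing and F(r_i) = 1 when the upper bound is missing; θ̂ is the maximum-likelihood estimate. The closed forms for HCp assume that log-concentrations follow a normal law (mean μ, standard deviation σ) or a logistic law (location μ, scale s). This parametrization is an assumption made for illustration and may differ from the one reported by MOSAICSSD.

```latex
\mathcal{L}(\theta) = \prod_{i \in \text{exact}} f(x_i;\theta) \prod_{i \in \text{censored}} \bigl[ F(r_i;\theta) - F(l_i;\theta) \bigr],
\qquad \mathrm{HC}_p = F^{-1}\!\left( p/100 ;\, \hat\theta \right)

\text{log-normal: } \mathrm{HC}_p = \exp\!\bigl( \hat\mu + \hat\sigma \, \Phi^{-1}(p/100) \bigr)

\text{log-logistic: } \mathrm{HC}_p = \exp\!\Bigl( \hat\mu + \hat s \, \ln \tfrac{p}{100 - p} \Bigr)
```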
Within MOSAICSSD, the user can choose between the log-normal and log-logistic distributions to fit to their data set. The value of the likelihood function for each distribution is provided on the result page and can be used as an additional decision criterion: the higher the likelihood, the better the distribution fits the data. The log-logistic distribution has heavier tails than the log-normal and therefore generally gives a lower, more conservative estimate of the 5% hazardous concentration (HC5).
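As an illustration of this comparison, the following base-R sketch fits both distributions by maximum likelihood to the uncensored values of the example data set used in the R script on this page (species whose upper bound is NA are simply dropped), then reports each log-likelihood and HC5. It is not MOSAICSSD's own code, which relies on fitdistrplus; the log-logistic is parametrized here, by assumption, as a logistic law on log-concentrations, and `fit_llogis` is a hypothetical helper.

```r
# Uncensored toxicity values (ug/L) from the example data set on this page;
# the three right-censored species are left out for simplicity.
x  <- c(3.8, 33.6, 87, 640, 113, 129, 586, 1.6, 4.8, 82, 155)
lx <- log(x)

# Log-normal: the MLE has a closed form on the log scale (variance divides by n).
mu    <- mean(lx)
sigma <- sqrt(mean((lx - mu)^2))
ll_lnorm <- sum(dlnorm(x, meanlog = mu, sdlog = sigma, log = TRUE))

# Log-logistic (hypothetical helper): log(x) is logistic with location `loc`
# and scale `s`; the "- lx" term is the Jacobian of the log transform, so the
# log-likelihood is on the concentration scale, like ll_lnorm.
fit_llogis <- function(x) {
  lx  <- log(x)
  nll <- function(par) -sum(dlogis(lx, par[1], exp(par[2]), log = TRUE) - lx)
  start <- c(median(lx), log(sd(lx) * sqrt(3) / pi))  # moment-based start
  opt   <- optim(start, nll)                           # Nelder-Mead by default
  list(loc = opt$par[1], s = exp(opt$par[2]), loglik = -opt$value,
       convergence = opt$convergence, start_loglik = -nll(start))
}
ll <- fit_llogis(x)

loglik <- c(lnorm = ll_lnorm, llogis = ll$loglik)
hc5    <- c(lnorm  = qlnorm(0.05, mu, sigma),
            llogis = exp(qlogis(0.05, ll$loc, ll$s)))
print(loglik)  # higher value = better fit
print(hc5)
```

Working on the log scale keeps both fits comparable: the two log-likelihoods include the same Jacobian term, so their difference reflects only the shape of the fitted law.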
After clicking Run, the 95% and 90% bootstrap confidence intervals are computed automatically. They give confidence intervals on the parameters of the chosen distribution and on several computed HCp. Computing confidence intervals with a bootstrap method has the advantage of providing a unified framework for every distribution. Because the bootstrap procedure does not necessarily converge, depending on the size of the data set, an automatic check of bootstrap convergence is implemented [1].
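The idea behind these intervals can be sketched with a plain parametric bootstrap on a log-normal fit: simulate new data sets from the fitted distribution, refit, recompute HC5, and take percentiles of the bootstrap HC5 values. This is a simplified sketch on uncensored values only, with an arbitrary number of iterations; it does not reproduce MOSAICSSD's bootstrap settings (delegated to fitdistrplus) nor its convergence check, and `fit_lnorm` and `hcp` are hypothetical helpers.

```r
set.seed(123)  # reproducible resampling
x <- c(3.8, 33.6, 87, 640, 113, 129, 586, 1.6, 4.8, 82, 155)

# Log-normal MLE on the log scale.
fit_lnorm <- function(x) {
  lx <- log(x)
  c(meanlog = mean(lx), sdlog = sqrt(mean((lx - mean(lx))^2)))
}
# Hazardous concentration for a fraction p of species.
hcp <- function(par, p = 0.05) qlnorm(p, par[["meanlog"]], par[["sdlog"]])

est <- fit_lnorm(x)
B   <- 1000  # number of bootstrap samples (arbitrary choice)

# Parametric bootstrap: simulate from the fitted law, refit, recompute HC5.
boot_hc5 <- replicate(B, {
  xb <- rlnorm(length(x), est[["meanlog"]], est[["sdlog"]])
  hcp(fit_lnorm(xb))
})

hc5  <- hcp(est)
ci95 <- quantile(boot_hc5, c(0.025, 0.975))  # 95% percentile interval
ci90 <- quantile(boot_hc5, c(0.05, 0.95))    # 90% percentile interval
print(c(HC5 = hc5, ci95, ci90))
```

The same bootstrap sample yields both intervals, which is why the 90% interval is always nested inside the 95% one.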
When using your own data, you can either upload it from a file (click on Load from file) or type/paste the values directly into the displayed empty table. For a file upload, the supported format is the following.
The expected data for an SSD analysis is a set of toxicity values of a given contaminant estimated for several species. You must upload your data as a tab-separated text file (.txt), with one line per species. The exact syntax of the lines depends on the type of data: pointwise (toxicity value known without uncertainty) or censored (toxicity values bounded on the left and/or on the right). Only positive values are accepted.
To each line (species), you can associate a name label (for example species name) as well as a grouping label that will affect the color of the data points in the results.
Pointwise data: the file must contain one positive value per line, optionally followed by one or two text labels, all separated by a TAB character:
2.3 Bidyanus bidyanus Australian Fish
7.3 Cyprinus carpio Non-Australian Fish
2.7 Gambusia holbrooki Australian Fish
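For readers preparing such a file outside MOSAIC, the following R sketch reads a pointwise file in the format above and checks the positivity rule. The file content is inlined with `text =` so the example is self-contained, and the conversion to `left`/`right` columns simply mirrors the structure of the `data` object in the R script on this page; it is an illustration, not the module's own parser.

```r
# Inlined copy of a pointwise file: value <TAB> name <TAB> group.
txt <- paste("2.3\tBidyanus bidyanus\tAustralian Fish",
             "7.3\tCyprinus carpio\tNon-Australian Fish",
             "2.7\tGambusia holbrooki\tAustralian Fish", sep = "\n")

# read.delim() splits on TAB; its default fill = TRUE tolerates missing labels.
d <- read.delim(text = txt, header = FALSE, stringsAsFactors = FALSE,
                col.names = c("value", "name", "group"))
stopifnot(is.numeric(d$value), all(d$value > 0))  # only positive values allowed

# A pointwise value is an interval whose lower and upper bounds coincide.
ssd_data <- data.frame(left = d$value, right = d$value,
                       name = d$name, group = d$group)
print(ssd_data)
```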
Censored data: each line must contain two numerical values (a lower bound and an upper bound), separated by a TAB character. Missing bounds must be denoted with NA. The lower and upper bounds cannot both be NA on the same line. If a toxicity value is known as a pointwise value, enter it twice, using the same value as both the lower and the upper bound. One or two text labels can be added after the numerical values:
1.45 1.85 Oligochaeta
2.31 NA Tricladida
NA 0.99 Tricladida
1.11 1.11 Mollusca
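The rules for censored lines can also be checked programmatically. The R sketch below reads the example above, flags lines that break the stated rules (both bounds NA, or a non-positive value), and classifies each line. It additionally flags a lower bound greater than the upper bound, which is an extra assumption not stated above; `check_censored` and `censoring_type` are hypothetical helpers.

```r
# Inlined copy of a censored file: lower <TAB> upper <TAB> label.
txt <- paste("1.45\t1.85\tOligochaeta",
             "2.31\tNA\tTricladida",
             "NA\t0.99\tTricladida",
             "1.11\t1.11\tMollusca", sep = "\n")
d <- read.delim(text = txt, header = FALSE, na.strings = "NA",
                stringsAsFactors = FALSE,
                col.names = c("left", "right", "label1", "label2"))

# Hypothetical helper: indices of lines violating the format rules.
check_censored <- function(d) {
  both_na  <- is.na(d$left) & is.na(d$right)
  non_pos  <- (!is.na(d$left) & d$left <= 0) | (!is.na(d$right) & d$right <= 0)
  # Assumption: a lower bound above the upper bound is also an error.
  reversed <- !is.na(d$left) & !is.na(d$right) & d$left > d$right
  which(both_na | non_pos | reversed)
}

# Hypothetical helper: kind of information carried by each line.
censoring_type <- function(d) {
  ifelse(is.na(d$left), "left-censored",       # only an upper bound is known
    ifelse(is.na(d$right), "right-censored",   # only a lower bound is known
      ifelse(d$left == d$right, "pointwise", "interval")))
}

stopifnot(length(check_censored(d)) == 0)
print(data.frame(d[, 1:2], type = censoring_type(d)))
```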
Manually typing in the data: when typing in the displayed table, the first two columns can only contain positive numerical values. If your data is pointwise (that is, not censored), you can leave one of the two columns empty. Labels for names and/or groups go in the last two columns; if you have no labels, simply leave these columns empty. Values can be typed in manually, but you can also copy and paste multiple cells from your spreadsheet software of choice.
To try MOSAICSSD with an example data set, click on Try with an example. Several data sets are available, with or without censored toxicity data. When you select an example, the concentration unit, as well as the toggle switches indicating whether the data is censored and which labels to use, are automatically set to the appropriate values.
Other inputs must be provided before running the analysis. The three toggle switches and the associated drop-down selectors next to the data table specify whether the data is censored and how to use the labels provided, if any (even if the data contain labels, displaying them on the graph is not compulsory). Note that the two label columns cannot be used for the same purpose (groups or names); the drop-down selectors update automatically to enforce this.
The toggle switch for censored data is dynamic: its value updates automatically according to the number of concentration columns filled in, two filled columns implying censored data. You can still change its value manually if you wish; if you set it to non-censored, only the first column of numerical values is used, as pointwise data.
A concentration unit can be selected from the drop-down menu. It is displayed on the resulting plot but has no effect on the computation. Finally, select the distribution(s) to fit using the check boxes.
The Reset button clears all inputs and outputs of the module.
After choosing one (or two) probability distribution(s) and clicking on Run, you immediately get the estimated distribution(s): the solid orange and dotted green curves correspond to the fitted log-normal and log-logistic distributions, respectively. The black points and lines correspond to the empirical cumulative distribution of the input data. For censored toxicity data, the Wang estimate of this distribution function (see the Note section below for more information) is displayed by default. At this stage, only point estimates of the distribution parameters and HCp are displayed.
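For pointwise data, the black points can be read as the plain empirical cumulative distribution, i/n at the i-th smallest value, set against the fitted cumulative distribution function. The R sketch below computes both on the uncensored example values, assuming this plain ECDF (MOSAICSSD's exact plotting positions may differ) and a log-normal fit; the drawing step only runs in an interactive session.

```r
x <- sort(c(3.8, 33.6, 87, 640, 113, 129, 586, 1.6, 4.8, 82, 155))
n <- length(x)

emp    <- seq_len(n) / n                # empirical cumulative proportions (no ties here)
mu     <- mean(log(x))
sigma  <- sqrt(mean((log(x) - mu)^2))   # log-normal MLE
fitted <- plnorm(x, mu, sigma)          # fitted CDF at the observed values

print(round(data.frame(x, emp, fitted), 3))
if (interactive()) {
  plot(x, emp, log = "x", pch = 16,
       xlab = "Concentration", ylab = "Cumulative proportion")
  curve(plnorm(x, mu, sigma), add = TRUE, col = "orange")
}
```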
Note: for censored data, when name labels or group colors are added, the raw data are shown instead of the Wang estimate, so that each observation can be distinguished.
After a while, the bootstrap process finishes and the confidence intervals around the fitted distribution curve(s) are displayed, together with the confidence intervals on the distribution parameters and on the HCp. By default these are 95% intervals, but they can all be switched to 90% intervals using the radio buttons in the left panel.
The x-axis can be switched between a linear scale and a log scale for better visualization.
Using the corresponding toggle switch, the raw data can be colored by group if group labels were associated with the data. If names were associated with the data, they can be displayed next to each observation.
As in the other modules, MOSAICSSD lets you download a report summarizing the analysis, as well as the R script that reproduces the analysis directly in the R software [3].
# This script was generated by MOSAIC, a web application dedicated to
# ecotoxicology. It is available at http://pbil.univ-lyon1.fr/software/mosaic/
# To use this script, it is recommended to consult the reference manual of the
# fitdistrplus package http://cran.r-project.org/web/packages/fitdistrplus/fitdistrplus.pdf
# For any further question, please contact us at mosaic@univ-lyon1.fr
library(ssd4mosaic)
data <- structure(list(left = c(3.8, 33.6, 87, 1700, 640, 1155, 113,
129, 586, 1856, 1.6, 4.8, 82, 155), right = c(3.8, 33.6, 87,
NA, 640, NA, 113, 129, 586, NA, 1.6, 4.8, 82, 155), name = c("Tubifex sp.",
"Dugesia sp.", "Polycelis nigra/tenuis", "Sphaerium sp.", "Physa fontinalis",
"Erpobdella sp. (juv.)", "Asellus aquaticus", "Gammarus pulex",
"Proasellus coxalis", "Cloeon dipterum", "Brachionus calyciflorus",
"Acanthocyclops venustus", "Daphnia group galeata", "Daphnia magna"
), group = c("Oligochaeta", "Tricladida", "Tricladida", "Mollusca",
"Mollusca", "Hirudinea", "Macrocrustacea", "Macrocrustacea",
"Macrocrustacea", "Insecta", "Rotifera", "Copepoda", "Cladocera",
"Cladocera")), row.names = c(NA, 14L), class = "data.frame")
distributions <- list("lnorm")
logscale <- TRUE
unit <- 'μg/L'
CI.level <- 0.95
## model fitting
fits <- get_fits(data, distributions, TRUE)
lapply(fits, summary)
## bootstrapping
bts <- get_bootstrap(fits)[[1]]
## HCx values
lapply(bts, quantile, probs = c(0.05, 0.1, 0.2, 0.5), CI.level = CI.level)
## CDF plot with confidence intervals
p <- base_cdf(fits, unit = unit, logscale = logscale)
add_CI_plot(p, bts, logscale, CI.level = CI.level)
## CDF plot with species names
options_plot(fits, unit, logscale, data, use_names = TRUE)
## CDF plot colored by group
options_plot(fits, unit, logscale, data, use_groups = TRUE)
In MOSAICSSD, we use the Wang algorithm to represent the empirical cumulative distribution function of censored data. All the details on this algorithm and its alternatives can be found in the FAQ page of the companion package fitdistrplus.
In short, this method is summarized by a graph in the same FAQ.
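The Wang algorithm computes the nonparametric maximum likelihood estimate (NPMLE) of the distribution function of censored data. As a rough, self-contained illustration of what this estimate is, the R sketch below computes the same NPMLE with Turnbull's classical EM algorithm, swapped in here because it is short to write; it is not the algorithm used by MOSAICSSD. The NPMLE puts probability mass only on so-called innermost intervals and is undetermined inside them. Missing lower bounds are set to 0 (values are positive) and missing upper bounds to +Inf; `turnbull_npmle` is a hypothetical helper, applied to the example data set of the R script on this page.

```r
turnbull_npmle <- function(left, right, tol = 1e-10, max_iter = 10000) {
  lo <- ifelse(is.na(left), 0, left)      # unknown lower bound -> 0
  hi <- ifelse(is.na(right), Inf, right)  # unknown upper bound -> +Inf
  L  <- sort(unique(lo))
  R  <- sort(unique(hi))

  # Innermost intervals [q, p]: p is the smallest right endpoint >= q,
  # and no other left endpoint lies in (q, p].
  q <- numeric(0); p <- numeric(0)
  for (a in L) {
    b <- min(R[R >= a])
    if (!any(L > a & L <= b)) { q <- c(q, a); p <- c(p, b) }
  }

  # A[i, j] = 1 when innermost interval j lies inside observation i.
  A <- (outer(lo, q, "<=") & outer(hi, p, ">=")) * 1
  n <- length(lo)
  s <- rep(1 / length(q), length(q))      # start from equal masses
  for (iter in seq_len(max_iter)) {
    denom <- as.vector(A %*% s)                     # probability of each observation
    s_new <- colSums(A * outer(1 / denom, s)) / n   # EM update of the masses
    done  <- max(abs(s_new - s)) < tol
    s <- s_new
    if (done) break
  }
  data.frame(lower = q, upper = p, mass = s, cdf = cumsum(s))
}

ex <- turnbull_npmle(
  left  = c(3.8, 33.6, 87, 1700, 640, 1155, 113, 129, 586, 1856, 1.6, 4.8, 82, 155),
  right = c(3.8, 33.6, 87, NA,   640, NA,   113, 129, 586, NA,   1.6, 4.8, 82, 155))
print(ex)
```

In this example, the three right-censored species all exceed the largest exact value, so their combined mass ends up on the single innermost interval starting at the largest lower bound.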
[1] Kon Kam King G, Veber P, Charles S, Delignette-Muller ML. 2014. MOSAIC_SSD: a new web tool for species sensitivity distribution to include censored data by maximum likelihood. Environ. Toxicol. Chem. 33:2133–9.
[2] Delignette-Muller ML, Dutang C. 2015. fitdistrplus: An R Package for Fitting Distributions. J. Stat. Softw. 64:1–34.
[3] R Core Team. 2023. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.