The Species Sensitivity Distribution (SSD) approach is a central tool in environmental risk assessment to define safe levels for contaminants within a set of multiple species. It is based on the assumption that species sensitivity to a given contaminant can be described by a probability distribution estimated from a set of several toxicity values, previously obtained from MOSAICsurv, MOSAICrepro or MOSAICgrowth. The MOSAICSSD module enables any user to perform a simple yet statistically sound SSD analysis including censored (unbounded or part of an interval) toxicity values, without worrying about the conceptually difficult underlying statistical questions [1]. MOSAICSSD provides the so-called hazardous concentration for p% of the species (HCp). All calculations are based on the companion R package fitdistrplus [2].

Basics

Within MOSAICSSD, the user can choose between the log-normal and log-logistic distributions to fit to their data set. The value of the likelihood function for each distribution is provided on the result page and can be used as a further decision criterion (the higher the likelihood value, the more appropriate the distribution). The log-logistic distribution has heavier tails than the log-normal and is therefore generally more conservative in the determination of the 5% hazardous concentration (HC5).
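As an illustration of this comparison, the sketch below fits both distributions by maximum likelihood in base R and prints their log-likelihoods and HC5 values. It is not the code run by MOSAICSSD (which relies on fitdistrplus): the values are the uncensored entries of the example data set shown in the script at the end of this page, and all variable names are chosen for this example only.

```r
# Uncensored toxicity values taken from the example data set (assumption:
# censored rows are simply left out to keep the sketch short)
tox <- c(3.8, 33.6, 87, 640, 113, 129, 586, 1.6, 4.8, 82, 155)
logx <- log(tox)

# Log-normal: the maximum likelihood estimates have a closed form
meanlog <- mean(logx)
sdlog <- sqrt(mean((logx - meanlog)^2))
ll_lnorm <- sum(dlnorm(tox, meanlog, sdlog, log = TRUE))

# Log-logistic: log(x) follows a logistic law, so the density of x is
# dlogis(log(x)) / x; the parameters are found by numerical optimisation
nll_llogis <- function(par) {
  -sum(dlogis(logx, location = par[1], scale = exp(par[2]), log = TRUE) - logx)
}
opt <- optim(c(meanlog, log(sdlog)), nll_llogis)
ll_llogis <- -opt$value

# HC5 is the 5% quantile of each fitted distribution
hc5_lnorm <- qlnorm(0.05, meanlog, sdlog)
hc5_llogis <- exp(qlogis(0.05, location = opt$par[1], scale = exp(opt$par[2])))

cat("log-likelihood:", ll_lnorm, "(log-normal)", ll_llogis, "(log-logistic)\n")
cat("HC5:", hc5_lnorm, "(log-normal)", hc5_llogis, "(log-logistic)\n")
```

Both distributions have two parameters, so their log-likelihoods can be compared directly without a penalty for model complexity.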

After clicking Run, the 95% and 90% bootstrap confidence intervals are computed automatically, both for the parameters of the chosen distribution and for several HCp values. Computing the confidence intervals with a bootstrap method has the advantage of providing a unified framework for every distribution. Because the bootstrap procedure may fail to converge, depending on the size of the data set, an automatic check of bootstrap convergence is implemented [1].
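To make the idea concrete, here is a minimal sketch of a parametric bootstrap for the HC5 of a log-normal fit, written in base R. It only illustrates the principle on uncensored values: the bootstrap scheme, the number of iterations and the helper `fit_lnorm` are assumptions made for this example, not a description of what MOSAICSSD runs internally.

```r
set.seed(1)  # reproducible resampling

# Uncensored toxicity values from the example data set
tox <- c(3.8, 33.6, 87, 640, 113, 129, 586, 1.6, 4.8, 82, 155)

# Hypothetical helper: closed-form maximum likelihood fit of a log-normal
fit_lnorm <- function(x) {
  lx <- log(x)
  c(meanlog = mean(lx), sdlog = sqrt(mean((lx - mean(lx))^2)))
}

est <- fit_lnorm(tox)

# Simulate data sets from the fitted law, refit, and recompute HC5 each time
hc5_boot <- replicate(1000, {
  resample <- rlnorm(length(tox), est[["meanlog"]], est[["sdlog"]])
  b <- fit_lnorm(resample)
  qlnorm(0.05, b[["meanlog"]], b[["sdlog"]])
})

# Percentile confidence intervals at the two levels offered by the interface
ci95 <- quantile(hc5_boot, c(0.025, 0.975))
ci90 <- quantile(hc5_boot, c(0.05, 0.95))
print(ci95)
print(ci90)
```

The 90% interval is always nested within the 95% interval, since both are read from the same set of bootstrap replicates.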

Step 1: Data uploading

When using MOSAICSSD, the first step is to upload input data.

Input tab panel

User data

When using your own data, you can either upload it from a file (click on Load from file) or directly type or paste the values into the empty table displayed. For a file upload, the supported format is described below.

The expected data for an SSD analysis is a set of toxicity values of a given contaminant estimated for several species. You must upload your data as a tab-separated text file (.txt), with one line per species. The exact syntax of the lines depends on the type of data: pointwise (toxicity value known without uncertainty) or censored (toxicity value known only through a lower and/or an upper bound). Only positive values are accepted.

To each line (species), you can associate a name label (for example species name) as well as a grouping label that will affect the color of the data points in the results.

Pointwise data: the file must contain one positive value per line and optionally one or two text labels separated by a TAB character:

                                           
 2.3 Bidyanus bidyanus  Australian Fish    
 7.3 Cyprinus carpio    Non-Australian Fish
 2.7 Gambusia holbrooki Australian Fish    

Censored data: each line must contain two numerical values (a lower bound and an upper bound), separated by a TAB character. A missing bound must be denoted with NA: a missing upper bound means the toxicity value is only known to be above the lower bound, and a missing lower bound means it is only known to be below the upper bound. The lower and upper bounds cannot both be NA on the same line. If a toxicity value is known as a pointwise value, enter it twice, as both the lower and the upper bound. One or two text labels can be added after the numerical values:

                      
 1.45 1.85 Oligochaeta
 2.31   NA Tricladida 
   NA 0.99 Tricladida 
 1.11 1.11 Mollusca   
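Before uploading, a file in this format can be checked locally. The sketch below reads the four example lines above from a string (standing in for a real .txt file) and verifies the rules stated in this section; the column names `left`, `right` and `label` and the `type` column are choices made for this example.

```r
# Content of a hypothetical tab-separated file in the censored format
file_content <- paste0(
  "1.45\t1.85\tOligochaeta\n",
  "2.31\tNA\tTricladida\n",
  "NA\t0.99\tTricladida\n",
  "1.11\t1.11\tMollusca\n"
)

# For a real file, replace `text = file_content` with the file path
censored <- read.delim(
  text = file_content, header = FALSE, sep = "\t",
  col.names = c("left", "right", "label"),
  na.strings = "NA", strip.white = TRUE
)

# Rule checks: no line with both bounds missing, only positive values,
# and a lower bound never above the upper bound
both_missing <- is.na(censored$left) & is.na(censored$right)
bounds <- c(censored$left, censored$right)
ordered <- is.na(censored$left) | is.na(censored$right) |
  censored$left <= censored$right
stopifnot(!any(both_missing), all(bounds > 0, na.rm = TRUE), all(ordered))

# Describe how each line will be interpreted
censored$type <- ifelse(
  is.na(censored$left), "left-censored",
  ifelse(
    is.na(censored$right), "right-censored",
    ifelse(censored$left == censored$right, "pointwise", "interval")
  )
)
print(censored)
```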

Manually typing in the data: when typing in the displayed table, the first two columns can only contain positive numerical values. If your data is pointwise (that is, not censored), you can leave one of the two columns empty. The labels for names and/or groups go in the last two columns. If you do not have labels, leave the last two columns empty. The values can be typed in manually, but you can also copy and paste multiple cells from your spreadsheet software of choice.

Example data

Example modal

To try MOSAICSSD with an example data set, click on Try with an example. A few data sets are available, some with censored toxicity data and some without. When you select an example, the concentration unit, the toggle switch that indicates whether the data is censored, and the switches that control which labels to use are automatically set to the appropriate values.

Other inputs

There are other inputs that need to be provided before running the analysis. The three toggle switches and associated drop down selectors next to the data table can be used to specify whether the data is censored, as well as how to use the labels provided, if any (even if labels are in the data, it is not compulsory to display them on the graph). Note that you cannot use the two label columns for the same purpose (groups or names), and the drop down selectors will automatically update to reflect that.

The toggle switch concerning censored data is dynamic and will automatically update its value depending on the number of concentration columns filled in, two implying censored data. You can still manually change the value of this toggle switch if you so desire. In that case, only the first column of numerical values is used as pointwise data.
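The behaviour described in this paragraph can be summarized by the following sketch. The two helper functions are hypothetical and written only to illustrate the rule; they are not taken from the MOSAICSSD source code.

```r
# Hypothetical helper: the data is treated as censored when both
# concentration columns contain at least one value
detect_censored <- function(tab) {
  any(!is.na(tab$left)) && any(!is.na(tab$right))
}

# Hypothetical helper: when the switch is manually set to "not censored",
# only the first numerical column is kept as pointwise data
as_pointwise <- function(tab) {
  tab$left[!is.na(tab$left)]
}

two_columns <- data.frame(left = c(1.45, 2.31, 1.11), right = c(1.85, NA, 1.11))
one_column <- data.frame(left = c(2.3, 7.3, 2.7), right = c(NA, NA, NA))

detect_censored(two_columns)  # both columns filled in
detect_censored(one_column)   # only the first column filled in
as_pointwise(two_columns)     # manual override: first column only
```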

A concentration unit for the data can be selected from the drop down menu. It will be displayed on the resulting plot but won’t have any effect on the computation. Finally, the desired distribution(s) to fit should be selected using the check boxes.

The Reset button can be used to clear all inputs and outputs of the module.

Step 2: Results and interpretation

After choosing one (or two) probability distribution(s) and clicking on Run, you immediately get the estimated distribution(s): solid orange and dotted green curves correspond to the log-normal and log-logistic fitted distributions, respectively. The black points and lines correspond to the empirical cumulative distribution of the data provided as input. In the case of censored toxicity data, the Wang estimate of this distribution function (see the Note section below for more information) is displayed by default. At this stage, only pointwise estimates of the distribution parameters and HCp are displayed.

Note: In case of censored data, when adding name labels or group coloring, the raw data without the Wang estimate is represented in order to differentiate each observation.

Results without CI

After a while, the bootstrap process finishes and the confidence intervals around the fitted distribution curve(s) are displayed. The confidence intervals on the distribution parameters and on the HCp values are also provided. By default, these are 95% intervals, but they can all be replaced by 90% intervals using the radio buttons in the left panel.

Results with CI

The x-axis can be switched between a linear scale and a log scale for a better visualization.

Using the corresponding toggle switch, the raw data can be colored by group if group labels were associated with the data. If names were associated, they can be displayed alongside each observation.

Plot with labels

As with the other modules, MOSAICSSD lets you download a report summarizing the analysis, as well as an R script that reproduces the analysis directly in the R software [3].

# This script was generated by MOSAIC, a web application dedicated to
# ecotoxicology. It is available at http://pbil.univ-lyon1.fr/software/mosaic/

# To use this script, it is recommended to consult the reference manual of the
# fitdistrplus package http://cran.r-project.org/web/packages/fitdistrplus/fitdistrplus.pdf

# For any further question, please contact us at mosaic@univ-lyon1.fr
library(ssd4mosaic)

data <- structure(list(left = c(3.8, 33.6, 87, 1700, 640, 1155, 113, 
129, 586, 1856, 1.6, 4.8, 82, 155), right = c(3.8, 33.6, 87, 
NA, 640, NA, 113, 129, 586, NA, 1.6, 4.8, 82, 155), name = c("Tubifex sp.", 
"Dugesia sp.", "Polycelis nigra/tenuis", "Sphaerium sp.", "Physa fontinalis", 
"Erpobdella  sp. (juv.)", "Asellus aquaticus", "Gammarus pulex", 
"Proasellus coxalis", "Cloeon dipterum", "Brachionus calyciflorus", 
"Acanthocyclops venustus", "Daphnia group galeata", "Daphnia magna"
), group = c("Oligochaeta", "Tricladida", "Tricladida", "Mollusca", 
"Mollusca", "Hirudinea", "Macrocrustacea", "Macrocrustacea", 
"Macrocrustacea", "Insecta", "Rotifera", "Copepoda", "Cladocera", 
"Cladocera")), row.names = c(NA, 14L), class = "data.frame")

distributions <- list("lnorm")
logscale <- TRUE
unit <- 'μg/L'
CI.level <- 0.95

## model fitting
fits <- get_fits(data, distributions, TRUE)
lapply(fits, summary)

## bootstrapping
bts <- get_bootstrap(fits)[[1]]

## HCx values
lapply(bts, quantile, probs = c(0.05, 0.1, 0.2, 0.5), CI.level = CI.level)

## CDF plot with confidence intervals
p <- base_cdf(fits, unit = unit, logscale = logscale)
add_CI_plot(p, bts, logscale, CI.level = CI.level)

## CDF plot with species names
options_plot(fits, unit, logscale, data, use_names = TRUE)

## CDF plot colored by group
options_plot(fits, unit, logscale, data, use_groups = TRUE)
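For readers who prefer to work with fitdistrplus directly, the following sketch fits the same censored data set with the functions of that package. It is an approximate counterpart of the script above, not a guaranteed reproduction of it: the number of bootstrap iterations (`niter = 200`) and the seed are arbitrary choices made to keep the example fast.

```r
library(fitdistrplus)

# Same bounds as in the script above; NA in `right` marks a right-censored value
cens <- data.frame(
  left  = c(3.8, 33.6, 87, 1700, 640, 1155, 113, 129, 586, 1856,
            1.6, 4.8, 82, 155),
  right = c(3.8, 33.6, 87, NA, 640, NA, 113, 129, 586, NA,
            1.6, 4.8, 82, 155)
)

# Maximum likelihood fit of a log-normal distribution to censored data
fit <- fitdistcens(cens, "lnorm")
summary(fit)

# Bootstrap of the fit, then HCp values with their confidence intervals
set.seed(1)
boot <- bootdistcens(fit, niter = 200)
quantile(boot, probs = c(0.05, 0.1, 0.2, 0.5), CI.level = 0.95)

# Fitted CDF over the Wang representation of the empirical distribution
cdfcompcens(fit, xlogscale = TRUE, NPMLE.method = "Wang")
```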

Note: censored data representation

In MOSAICSSD, we use the Wang algorithm to represent the empirical cumulative distribution function of censored data. All the details on this algorithm and its alternatives can be found in the FAQ of the companion R package fitdistrplus [2].

In short, this method can be summarized using this graph, also found in the FAQ:

Wang estimation method

References

[1] Kon Kam King G, Veber P, Charles S, Delignette-Muller ML. 2014. MOSAIC_SSD: a new web tool for species sensitivity distribution to include censored data by maximum likelihood. Environ. Toxicol. Chem. 33:2133–9.

[2] Delignette-Muller ML, Dutang C. 2015. fitdistrplus: An R Package for Fitting Distributions. J. Stat. Softw. 64:1–34.

[3] R Core Team. 2023. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/