Specleaner R Package

[!NOTE] 📹 Video Reference 6:20 Review the Specleaner R Package

The technical foundation of the Pan-European workflow is the Specleaner R package. It provides a homogeneous and robust approach for identifying outliers in species occurrence records.

Automated Flagging & Ensemble Methods

Rather than relying on a single method, Specleaner combines 20 different outlier detection methods into one approach. Because a record is judged by how many independent methods agree on it, this “ensemble” logic is intended to make the flags more reliable for data used in species distribution modelling.

1. Outlier Identification Methods

Univariate Methods

These evaluate one environmental predictor at a time (e.g., temperature).

Method                       Description
Z-score                      Flags records whose value lies many standard deviations from the mean.
Interquartile Range (IQR)    Flags records outside the statistical “whiskers” (conventionally 1.5 × IQR beyond the first and third quartiles).
Ecological Ranges            Flags records that fall outside the known suitable ecological range for the species.
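
The three univariate checks above can be sketched in a few lines. This is a minimal illustration in Python using only the standard library; it is not Specleaner's own code or API. The function names, the z-score threshold of 2.5, the 1.5 × IQR rule, the temperature values, and the 4 to 20 °C tolerance range are all assumptions made up for the example.

```python
from statistics import mean, stdev, quantiles

def zscore_flags(values, threshold=3.0):
    """Flag values more than `threshold` sample standard deviations from the mean."""
    mu, sigma = mean(values), stdev(values)
    return [abs(v - mu) / sigma > threshold for v in values]

def iqr_flags(values, k=1.5):
    """Flag values beyond k * IQR below the first or above the third quartile."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    lower, upper = q1 - k * spread, q3 + k * spread
    return [v < lower or v > upper for v in values]

def range_flags(values, lower, upper):
    """Flag values outside a known tolerance range for the species."""
    return [v < lower or v > upper for v in values]

# Hypothetical mean annual temperatures (deg C) at 12 occurrence records.
temperatures = [11.2, 12.0, 11.8, 12.5, 11.5, 12.1,
                11.9, 12.3, 11.7, 12.2, 11.6, 29.0]

# A lower threshold is used because a very small sample caps the largest
# z-score that any single record can reach.
z_out = zscore_flags(temperatures, threshold=2.5)
iqr_out = iqr_flags(temperatures)
# Assumed tolerance range of 4 to 20 deg C, for illustration only.
range_out = range_flags(temperatures, lower=4.0, upper=20.0)

for name, flags in [("z-score", z_out), ("IQR", iqr_out), ("range", range_out)]:
    flagged = [t for t, f in zip(temperatures, flags) if f]
    print(f"{name}: flagged {flagged}")
```

Each function returns one boolean per record, which is a convenient shape for combining several methods later.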

Multivariate Methods

These check for outliers in multi-dimensional space, considering multiple predictors simultaneously.

Method                                     Description
Isolation Forest                           Isolates anomalies by randomly partitioning data.
One-Class Support Vector Machines (SVM)    Learns the boundary of “normal” points and flags those outside.
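
To make the "random partitioning" idea behind Isolation Forest concrete, here is a simplified sketch in Python (standard library only). It is not Specleaner's implementation. It also departs from the standard algorithm in two ways that are assumptions of this example: it follows a fresh random partition for every record instead of building shared trees, and it uses the full data set rather than subsamples. The records are invented (temperature, precipitation) pairs.

```python
import math
import random

def _c(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n

def _path_length(point, data, rng, depth=0, max_depth=8):
    """Number of random axis-aligned splits needed to isolate `point`."""
    if len(data) <= 1 or depth >= max_depth:
        return depth + _c(len(data))
    feature = rng.randrange(len(point))
    lo = min(row[feature] for row in data)
    hi = max(row[feature] for row in data)
    if lo == hi:
        return depth + _c(len(data))
    split = rng.uniform(lo, hi)
    # Keep only the records that fall on the same side of the split as `point`.
    same_side = [row for row in data
                 if (row[feature] < split) == (point[feature] < split)]
    return _path_length(point, same_side, rng, depth + 1, max_depth)

def isolation_scores(data, n_trees=200, seed=0):
    """Anomaly score in (0, 1]; values near 1 mean the record is isolated quickly."""
    rng = random.Random(seed)
    norm = _c(len(data))
    scores = []
    for point in data:
        mean_depth = sum(_path_length(point, data, rng)
                         for _ in range(n_trees)) / n_trees
        scores.append(2.0 ** (-mean_depth / norm))
    return scores

# Hypothetical (temperature in deg C, precipitation in mm) per record.
records = [(11.2, 820), (12.0, 850), (11.8, 810), (12.5, 870), (11.5, 830),
           (12.1, 860), (11.9, 840), (12.3, 880), (11.7, 825), (12.2, 845),
           (11.6, 835), (12.4, 855), (29.0, 150)]

scores = isolation_scores(records)
most_anomalous = scores.index(max(scores))
print("Most anomalous record:", records[most_anomalous])
```

The last record sits far from the cluster in both predictors, so it is usually cut off after one or two splits and receives the highest score. Where to place the cut-off between "normal" and "outlier" scores is a separate choice that this sketch leaves open.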

2. Weighting & Voting System

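Specleaner does not return a simple binary “Outlier” or “Not Outlier” verdict. It uses the function m_detect to compile the results of the selected methods and weight each record by how many of them flagged it. For example, with 10 selected methods: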

  • Poor/Fair Outlier: Flagged by only 1 out of 10 selected methods.
  • Perfect Outlier: Flagged by ALL 10 methods.
  • Not an Outlier: Flagged by none.

Researchers can then use thresholds (e.g., Poor, Fair, Moderate, Strong, Perfect) to filter data or apply expert knowledge to decide which records have an “ecological consequence” and should be removed.
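
The voting idea can be sketched as follows. This is an illustrative Python example, not the m_detect function itself: the function names, the four method names, and in particular the numeric cut-offs that map a vote share to Poor, Fair, Moderate, Strong, or Perfect are assumptions chosen for the example, since the notes above only fix the two ends of the scale.

```python
def vote_proportions(flags_by_method):
    """Share of methods that flagged each record.

    flags_by_method maps a method name to a list of booleans, one per record.
    """
    n_methods = len(flags_by_method)
    return [sum(votes) / n_methods for votes in zip(*flags_by_method.values())]

# Hypothetical cut-offs on the share of methods that agree (assumption).
CUTOFFS = [(1.0, "Perfect"), (0.8, "Strong"), (0.6, "Moderate"),
           (0.4, "Fair"), (0.0, "Poor")]

def label(proportion):
    """Translate a vote share into a threshold label."""
    if proportion == 0:
        return "Not an outlier"
    for cutoff, name in CUTOFFS:
        if proportion >= cutoff:
            return name

def keep_records(records, proportions, remove_at=0.6):
    """Drop records flagged by at least `remove_at` of the methods."""
    return [r for r, p in zip(records, proportions) if p < remove_at]

# Hypothetical flags from four methods for four records A to D.
flags = {
    "zscore":  [False, True,  True,  True],
    "iqr":     [False, False, True,  True],
    "ranges":  [False, False, False, True],
    "iforest": [False, False, False, True],
}

proportions = vote_proportions(flags)
labels = [label(p) for p in proportions]
cleaned = keep_records(["A", "B", "C", "D"], proportions)

print(list(zip("ABCD", proportions, labels)))
print("Kept:", cleaned)
```

Keeping the vote share alongside the label lets a reviewer choose a stricter or looser removal threshold later, or override individual records using expert knowledge, without rerunning the detection methods.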