Specleaner R Package
[!NOTE] 📹 Video Reference 6:20 Review the Specleaner R Package
The technical foundation of the Pan-European workflow is the Specleaner R package. It provides a homogeneous and robust approach for identifying outliers in species occurrence records.
Automated Flagging & Ensemble Methods
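Rather than relying on a single method, Specleaner combines 20 different outlier detection methods into one approach. Because a record has to be flagged by several independent methods before it counts as a strong outlier, this “ensemble” logic is less sensitive to the bias of any one method, which makes the cleaned data more reliable for species distribution modelling.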
1. Outlier Identification Methods
Univariate Methods
These operate on a single environmental predictor at a time (e.g., temperature).
| Method | Description |
|---|---|
| Z-score | Flags records that lie more than a chosen number of standard deviations from the mean. |
| Interquartile Range (IQR) | Flags records outside the box-plot “whiskers,” conventionally 1.5 × IQR below the first quartile or above the third. |
| Ecological Ranges | Checks if records exceed known suitable ecological ranges for the species. |
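The three univariate checks in the table can be sketched in a few lines. This is a minimal illustration in Python using only the standard library, not Specleaner's own R functions; the function names, the temperature values, the z-score threshold of 3, and the 5 to 25 °C tolerance range are all assumptions made for the example.

```python
import statistics

def zscore_flags(values, threshold=3.0):
    """Flag values more than `threshold` sample standard deviations from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [abs(v - mean) / sd > threshold for v in values]

def iqr_flags(values, k=1.5):
    """Flag values more than k * IQR below the first or above the third quartile."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    lower, upper = q1 - k * spread, q3 + k * spread
    return [v < lower or v > upper for v in values]

def range_flags(values, lower, upper):
    """Flag values outside a known tolerance range for the species."""
    return [v < lower or v > upper for v in values]

# Hypothetical annual mean temperatures (degrees C) at 15 occurrence records.
temperatures = [11.2, 12.1, 11.8, 12.5, 11.9, 12.3, 12.0, 11.6,
                12.4, 11.7, 12.2, 11.5, 12.6, 11.4, 30.0]

flags = {
    "zscore": zscore_flags(temperatures),
    "iqr": iqr_flags(temperatures),
    # Assumed tolerance limits, for illustration only.
    "range": range_flags(temperatures, lower=5.0, upper=25.0),
}

for name, result in flags.items():
    print(name, [t for t, flagged in zip(temperatures, result) if flagged])
```

Each function returns one boolean per record, so the outputs of different methods line up record by record and can be compared or counted later.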
Multivariate Methods
These check for outliers in multi-dimensional space, considering multiple predictors simultaneously.
| Method | Description |
|---|---|
| Isolation Forest | Isolates anomalies by randomly partitioning the data; records that are separated from the rest in fewer splits are more likely to be outliers. |
| One-Class Support Vector Machines (SVM) | Learns the boundary of “normal” points and flags those outside. |
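To make the isolation idea concrete, the sketch below measures how many random axis-aligned splits it takes to separate each record from the others. It is a deliberately simplified, pure-Python illustration of the principle behind Isolation Forest, not Specleaner's implementation and not a full Isolation Forest (there is no subsampling and no score normalisation). The function names, the two predictors, and the simulated records are assumptions for the example.

```python
import random

def path_length(point, data, rng, depth=0, max_depth=10):
    """Number of random splits needed to isolate `point` within `data`."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    dim = rng.randrange(len(point))          # pick a predictor at random
    lo = min(row[dim] for row in data)
    hi = max(row[dim] for row in data)
    if lo == hi:                             # nothing to split on in this predictor
        return depth
    split = rng.uniform(lo, hi)              # pick a random cut point
    # Keep only the rows on the same side of the cut as `point`.
    if point[dim] < split:
        side = [row for row in data if row[dim] < split]
    else:
        side = [row for row in data if row[dim] >= split]
    return path_length(point, side, rng, depth + 1, max_depth)

def mean_path_lengths(data, n_trees=200, seed=0):
    """Average isolation depth per record over many random partitionings."""
    rng = random.Random(seed)
    return [
        sum(path_length(point, data, rng) for _ in range(n_trees)) / n_trees
        for point in data
    ]

# Hypothetical records: (mean temperature in degrees C, annual precipitation in mm).
data_rng = random.Random(42)
records = [(data_rng.gauss(12.0, 1.0), data_rng.gauss(800.0, 50.0)) for _ in range(30)]
records.append((30.0, 100.0))                # one record far from the main cluster

lengths = mean_path_lengths(records)
most_isolated = min(range(len(records)), key=lambda i: lengths[i])
print(records[most_isolated], round(lengths[most_isolated], 2))
```

The record with the shortest average path is the easiest to isolate, which is why it is the strongest outlier candidate when both predictors are considered together.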
2. Weighting & Voting System
Specleaner doesn’t just say “Outlier” or “Not Outlier.” It uses the function m_detect to compile the results of the selected methods and weight each record by how many of them flagged it:
- Poor/Fair Outlier: Flagged by only 1 out of 10 selected methods.
- Perfect Outlier: Flagged by ALL 10 methods.
- Not an Outlier: Flagged by none.
Researchers can then use thresholds (e.g., Poor, Fair, Moderate, Strong, Perfect) to filter data or apply expert knowledge to decide which records have an “ecological consequence” and should be removed.
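The voting step can be sketched as counting, for each record, the share of methods that flagged it and then mapping that share to a label. This is an illustrative Python sketch and not the m_detect function itself. The source only fixes the two end points (flagged by none, flagged by all), so the intermediate cutoffs below (0.25, 0.5, 0.75) and the function names are assumptions.

```python
def outlier_weights(flags_by_method):
    """Share of methods (0.0 to 1.0) that flagged each record."""
    all_flags = list(flags_by_method.values())
    n_methods = len(all_flags)
    n_records = len(all_flags[0])
    return [
        sum(flags[i] for flags in all_flags) / n_methods
        for i in range(n_records)
    ]

def outlier_label(weight):
    """Map a weight to a label; the intermediate cutoffs are assumed."""
    if weight == 0:
        return "not an outlier"
    if weight == 1:
        return "perfect"
    for cutoff, name in ((0.25, "poor"), (0.5, "fair"), (0.75, "moderate")):
        if weight <= cutoff:
            return name
    return "strong"

# Ten hypothetical methods, three records:
# record 0 is flagged by no method, record 1 by one method, record 2 by all ten.
flags_by_method = {
    f"method_{k}": [False, k == 0, True] for k in range(10)
}

weights = outlier_weights(flags_by_method)
labels = [outlier_label(w) for w in weights]

# Keep only records below a chosen threshold, here anything weaker than "strong".
kept = [i for i, name in enumerate(labels) if name not in ("strong", "perfect")]
print(list(zip(weights, labels)), kept)
```

Keeping the weight as a number rather than a yes/no flag is what lets a researcher choose a stricter or looser threshold later, or review borderline records by hand.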