
Machine Learning Prospectivity Mapping: How AI Is Reshaping Greenfields Discovery

March 20, 2026 · 9 min read

Prospectivity mapping — predicting where mineralization is most likely to occur across a study area using available geoscientific data — has been a research topic since the 1990s. For most of that history it lived in academic journals and government surveys, with the methods rotating through weights-of-evidence, fuzzy logic, neural networks, and eventually ensemble methods like random forests. The industry watched, occasionally adopted, and mostly kept doing target generation the way it had always done it: a geologist with a stack of maps and a synthesis of what worked at the camp next door.

Two things changed in the last several years. First, the methods converged. The exploration ML community settled on a small number of approaches that work well (primarily random forests, gradient-boosted trees, and a few specific neural architectures) and stopped chasing every novel algorithm published in computer science. Second, the work moved from research papers into operational practice, with firms like ALS GoldSpot (formerly GoldSpot Discoveries) and KoBold Metals running ML prospectivity at industrial scale and the academic community continuing to publish methodological refinements. The output of a competent 2026 ML prospectivity workflow is now meaningfully better than the output of expert geologist-only target generation in the regions where the input data is rich enough.

What ML Prospectivity Actually Is

At its core, a prospectivity model is a supervised classification problem framed in geographic space. You define a study area, divide it into cells (typically 100 m to 1 km on a side, depending on data resolution), and for each cell you compute features from the available data: mapped geology and lithology, geophysics, geochemistry, structural lineaments, alteration, and any other relevant predictor. Proximity to known mineral occurrences is sometimes added as a feature, but it needs care: the positive labels come from those same occurrences, so the feature can leak the answer into the model. You label some cells as positive, based on known mineralization, and some cells as negative, based on the absence of mineralization despite reasonable exploration coverage. You train a classifier on the labeled cells, then apply it to predict probability across the whole grid.
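As a rough illustration of that framing, here is a minimal sketch using scikit-learn on entirely synthetic data. The grid size, the three features, the labeling rule, and every variable name are assumptions made up for the example; a real study would replace them with engineered features and curated labels.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical 40 x 50 grid; each cell carries three synthetic features
# (stand-ins for, say, a magnetic derivative, a geochemical ratio,
# and distance to the nearest fault).
ny, nx, n_features = 40, 50, 3
features = rng.normal(size=(ny * nx, n_features))

# Synthetic "ground truth": cells with a high first feature are mineralized.
truth = (features[:, 0] > 1.5).astype(int)

# Only a subset of cells is labeled: known occurrences (positives) and
# cells treated as barren after reasonable exploration coverage (negatives).
pos_idx = np.flatnonzero(truth == 1)[:40]
neg_idx = rng.choice(np.flatnonzero(truth == 0), size=200, replace=False)
train_idx = np.concatenate([pos_idx, neg_idx])

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(features[train_idx], truth[train_idx])

# Predict for every cell in the grid and reshape the scores into a map.
prob_surface = model.predict_proba(features)[:, 1].reshape(ny, nx)
print(prob_surface.shape, float(prob_surface.min()), float(prob_surface.max()))
```

The reshaped array is the per-cell surface described in the text. On real data the scores are best read as relative rankings unless the model has been calibrated.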

The output is a continuous probability surface across the study area, mapping a per-cell likelihood of mineralization given the input features. The surface can be thresholded to produce target areas, contoured for visualization, or used directly to rank holdings and prioritize ground acquisition. The work is mathematically well defined, and reproducible by any team with the same data, the same labels, and the same methodology.
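Thresholding and ranking can be sketched in a few lines of NumPy. The function name `rank_targets`, the 0.5 threshold, and the tiny 3 x 3 surface are illustrative assumptions, not values from any particular study.

```python
import numpy as np

def rank_targets(prob_surface, threshold=0.5, top_n=5):
    """Return (row, col, probability) for cells at or above the threshold,
    highest probability first, truncated to top_n entries."""
    rows, cols = np.nonzero(prob_surface >= threshold)
    probs = prob_surface[rows, cols]
    order = np.argsort(-probs, kind="stable")[:top_n]
    return [(int(rows[i]), int(cols[i]), float(probs[i])) for i in order]

# Toy probability surface (hypothetical values)
prob_surface = np.array([
    [0.10, 0.20, 0.75],
    [0.05, 0.90, 0.60],
    [0.30, 0.40, 0.55],
])

targets = rank_targets(prob_surface, threshold=0.5, top_n=3)
print(targets)  # [(1, 1, 0.9), (0, 2, 0.75), (1, 2, 0.6)]
```

The choice of threshold is a business decision as much as a statistical one: a lower cutoff yields more ground to evaluate, a higher one yields a shorter, stricter target list.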

The catch — and it's a meaningful one — is that the choice of features, the choice of training labels, and the choice of method all encode assumptions about what mineralization looks like. A model trained on Carlin-type sedimentary gold occurrences in Nevada will identify more Carlin-type ground; it will not identify an orogenic system, an IOCG deposit, or a porphyry. The model expresses a hypothesis encoded by the geologist who curated the training set. ML prospectivity is not a hypothesis-free target generation method; it is a hypothesis-amplification method.

The Methods That Survived

Three classes of method are doing useful work in published industry studies. Random forests remain the workhorse for good reasons: they handle mixed data types well, they cope with missing values in implementations that support them, they require minimal tuning, and they produce feature-importance measures that geologists can sanity-check. For most prospectivity studies on regional or district scales, a random forest with well-engineered features is the right starting point and often the right ending point.
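The feature-importance check mentioned above might look like the following sketch. The feature names are hypothetical and the data is synthetic, with labels deliberately driven by the first feature so that the expected ranking is known in advance.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

# Hypothetical feature names, for illustration only
feature_names = ["mag_derivative", "k_th_ratio", "dist_to_fault_m", "noise"]

X = rng.normal(size=(600, 4))
# Synthetic labels driven almost entirely by the first feature
y = (X[:, 0] + 0.1 * rng.normal(size=600) > 0.5).astype(int)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances; they sum to 1 across features
ranked = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: -pair[1],
)
for name, importance in ranked:
    print(f"{name:>16s}  {importance:.3f}")
```

Impurity-based importances can be biased toward features with many distinct values, so `sklearn.inspection.permutation_importance` on held-out cells is a common cross-check before reading geology into the ranking.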

Gradient-boosted trees — XGBoost, LightGBM, CatBoost — typically outperform random forests by a modest margin when properly tuned, at the cost of being more sensitive to hyperparameters and harder to interpret. For competitive studies where every percent of accuracy matters, gradient boosting is now the standard. For day-to-day exploration work, the marginal accuracy improvement is usually not worth the additional model-management overhead.
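Whether boosting is worth the overhead is easiest to judge on your own data. A minimal sketch of such a comparison follows, using scikit-learn's `GradientBoostingClassifier` as a stand-in for XGBoost, LightGBM, or CatBoost (an assumption made to keep the example dependency-free). The data is synthetic, and the printed scores say nothing about real prospectivity datasets.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
X = rng.normal(size=(500, 5))
# Synthetic labels with a nonlinear dependence on the first two features
signal = X[:, 0] * X[:, 1] + 0.5 * X[:, 2] + 0.3 * rng.normal(size=500)
y = (signal > 0.3).astype(int)

candidates = {
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Mean ROC AUC over 5 folds for each candidate model
scores = {
    name: float(cross_val_score(est, X, y, cv=5, scoring="roc_auc").mean())
    for name, est in candidates.items()
}
for name, auc in scores.items():
    print(f"{name:>18s}  mean ROC AUC = {auc:.3f}")
```

The plain 5-fold split is acceptable here only because the synthetic samples have no spatial structure; on gridded exploration data the folds should respect spatial autocorrelation.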

Convolutional neural networks have a niche role when the input data is genuinely image-like — high-resolution airborne imagery, structural lineament maps, hyperspectral surfaces. CNNs can extract spatial textures and patterns that don't show up in cell-by-cell feature vectors. The cost is much more training data, much more compute, and much less interpretability. For most exploration workflows the tradeoff isn't worth it; for specific large-scale studies with rich imagery and substantial known mineralization, it is.

Where Real Value Shows Up

The clearest value of ML prospectivity is that it makes the geologist's synthesis explicit. A senior exploration geologist mentally synthesizes geology, geophysics, geochemistry, alteration, and structure when targeting, but the synthesis is intuitive, hard to communicate, and inconsistent across geologists. An ML model performs the same synthesis explicitly, reproducibly, and across thousands of cells at once. For a regional study where the question is "which of these 50 license blocks should we keep, drop, or expand," the model produces a defensible ranking that a single geologist's intuition can't easily match.

The second source of value is data integration at scale. A targeting model can ingest dozens of features per cell (derivatives of magnetic data, multiple gravity transforms, ratios of geochemistry, distances to structural elements, lithology proximity, alteration scores from hyperspectral processing) and find weighted combinations that humans don't naturally consider. Geophysicists often interpret a single dataset thoroughly; ML methods can surface anomaly patterns that show up only when multiple datasets are considered jointly.
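To make the per-cell features concrete, here is a small NumPy sketch of three of the feature types listed above: a magnetic derivative, a geochemical ratio, and a distance to a structural element. The function names, the potassium/thorium ratio, the 100 m cell size, and the toy grids are all assumptions chosen for illustration.

```python
import numpy as np

def magnetic_gradient_magnitude(mag_nt, cell_size_m):
    """Horizontal gradient magnitude (nT/m) of a gridded magnetic field."""
    d_dy, d_dx = np.gradient(mag_nt, cell_size_m)
    return np.hypot(d_dx, d_dy)

def safe_ratio(numerator, denominator):
    """Element-wise ratio; cells with a non-positive denominator become NaN."""
    out = np.full(numerator.shape, np.nan, dtype=float)
    np.divide(numerator, denominator, out=out, where=denominator > 0)
    return out

def distance_to_mask(mask, cell_size_m):
    """Distance (m) from each cell to the nearest True cell in mask.
    Brute force: fine for small grids, slow for large ones.
    Assumes the mask contains at least one True cell."""
    rows, cols = np.indices(mask.shape)
    src_r, src_c = np.nonzero(mask)
    dr = rows[..., None] - src_r
    dc = cols[..., None] - src_c
    return np.sqrt(dr**2 + dc**2).min(axis=-1) * cell_size_m

cell = 100.0                                   # hypothetical cell size in metres
mag = 2.0 * np.indices((3, 4))[1]              # field rising 2 nT per cell eastward
k = np.full((3, 4), 2.0)                       # toy potassium grid
th = np.full((3, 4), 8.0)                      # toy thorium grid
th[0, 0] = 0.0                                 # one cell with no valid reading
fault = np.zeros((3, 4), dtype=bool)
fault[:, 0] = True                             # a fault along the western edge

grad = magnetic_gradient_magnitude(mag, cell)
ratio = safe_ratio(k, th)
dist = distance_to_mask(fault, cell)

# One row per cell, one column per feature: the matrix a classifier consumes
features = np.column_stack([grad.ravel(), ratio.ravel(), dist.ravel()])
print(features.shape)  # (12, 3)
```

Each function returns a grid with the same shape as its input, so any number of such layers can be flattened and stacked into one feature matrix.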

The third value is in known-deposit replication. If a junior is acquiring ground in a district where a known deposit has been thoroughly characterized, an ML model trained on the known deposit's geophysical, geochemical, and geological signature will identify similar signatures elsewhere in the district with high reliability. This is the "more of what we know" use case, and it's where ML prospectivity has the strongest track record in industry studies.

Where ML Prospectivity Underperforms

The honest counterweight is that ML prospectivity struggles in three common situations. First, in data-poor regions. The accuracy of an ML model is bounded by the quality and density of input features, and in many frontier exploration jurisdictions the data simply isn't there at useful resolution. A study based on coarse regional magnetics and sparse geochemistry will produce a low-resolution probability surface that's not meaningfully better than expert judgment, and may be worse because it presents false precision.

Second, in novel-deposit-type discovery. Every supervised model is bounded by its training data. If the goal is to find a deposit type that hasn't been previously characterized in the region — a new style of mineralization, a deeper-than-typical expression of a known type, an unconventional setting — the model is structurally limited. This is where geologist intuition still dominates, because intuition can extrapolate from analogous settings worldwide while a regional model can only interpolate within its training distribution.

Third, in known-target validation. Once a target has been generated and the question is "should we drill this one or the next one," ML methods add limited value. The decision at that point is dominated by drilling economics, access, environmental considerations, and per-target risk, domains where ML models contribute nothing useful and shouldn't be expected to.

What a Good Study Looks Like

A defensible ML prospectivity study has a few recognizable characteristics. It describes the feature engineering in enough detail that another team could reproduce it from the same input data. It uses cross-validation that respects spatial autocorrelation — spatial k-fold or leave-one-area-out, not random sample-level k-fold, which leaks information from training to validation through spatial proximity. It reports feature importance and provides geological interpretation of which features the model relied on, so a competent geologist can sanity-check whether the model is responding to meaningful signal or to artifacts. It quantifies uncertainty explicitly, either through probabilistic outputs or through ensemble disagreement.
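Two of those characteristics, spatially blocked cross-validation and ensemble disagreement as an uncertainty measure, can be sketched with scikit-learn. The 40 x 40 grid, the 10-cell block size, the synthetic features, and all variable names are assumptions for illustration. The sketch shows the mechanics only; how far naive validation inflates scores depends on the data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, KFold, cross_val_score

rng = np.random.default_rng(2)
ny, nx, block = 40, 40, 10
rows, cols = np.indices((ny, nx))
n = ny * nx

# Synthetic, spatially smooth signal: neighbouring cells look alike
smooth = np.sin(rows / 2.0) + np.cos(cols / 2.0)
X = np.column_stack([smooth.ravel() + 0.1 * rng.normal(size=n),
                     rng.normal(size=n)])
y = (smooth.ravel() + 0.3 * rng.normal(size=n) > 1.0).astype(int)

# Each cell belongs to one 10 x 10 cell block; blocks act as CV groups,
# so no block contributes cells to both training and validation.
block_id = ((rows // block) * (nx // block) + (cols // block)).ravel()

model = RandomForestClassifier(n_estimators=100, random_state=0)
random_auc = cross_val_score(
    model, X, y, scoring="roc_auc",
    cv=KFold(n_splits=4, shuffle=True, random_state=0),
)
spatial_cv = GroupKFold(n_splits=4)
spatial_auc = cross_val_score(
    model, X, y, scoring="roc_auc", cv=spatial_cv, groups=block_id,
)
print("random k-fold AUC :", random_auc.round(3))
print("spatial k-fold AUC:", spatial_auc.round(3))

# Count blocks shared between train and validation in each spatial fold
shared_blocks = [
    len(set(block_id[train]) & set(block_id[test]))
    for train, test in spatial_cv.split(X, y, groups=block_id)
]

# Ensemble disagreement: spread of per-tree predictions for every cell
model.fit(X, y)
per_tree = np.stack([tree.predict_proba(X)[:, 1] for tree in model.estimators_])
disagreement = per_tree.std(axis=0).reshape(ny, nx)
print("max per-cell std across trees:", float(disagreement.max()))
```

`GroupKFold` is only one way to block the folds; leave-one-area-out corresponds to `LeaveOneGroupOut` with one group per area. The disagreement map can be delivered alongside the probability surface so that high scores with high spread are flagged.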

A bad study, by contrast, often presents a single probability map without describing methodology, uses naive cross-validation that inflates apparent accuracy, doesn't disclose what features drove the predictions, and provides no uncertainty estimate. These studies are common in marketing material and exploration press releases. They are not analytical work product; they are sales documents. The difference is important to recognize.

The good news is that the methodology bar for credible prospectivity work is now well established and openly published. The exploration data science community has converged on best practices, and a competent contractor will follow them. The ratio of credible to non-credible ML prospectivity offerings is improving, but buyers still need to ask the right questions about methodology before signing a statement of work.

Cost and Engagement Realities

A regional or district-scale ML prospectivity study, contracted to a specialist firm with appropriate methodology, currently runs in the $25,000 to $80,000 range depending on area, data availability, and scope. That's the cost of a small follow-up program — well within the budget envelope of any moderately financed junior. The deliverable is typically a probability surface, ranked target list, methodology documentation, and feature-importance interpretation.

A junior with an internal data-literate geologist can replicate much of this in-house using open-source tools such as scikit-learn, GeoPandas, SimPEG for geophysical modeling and inversion, and QGIS for visualization. The time investment is meaningful: roughly four to eight weeks to build a competent first model and another month to refine it with domain feedback. The cost is essentially the geologist's time. The catch is that doing it well requires both geological and data-science judgment in the same head; building it in-house is harder than commissioning it externally, and worth doing only if there's a long-term commitment to building internal capability.

The cheapest entry point — appropriate for a junior wanting to understand the methodology before commissioning anything — is to take publicly available geoscience data for your jurisdiction and run a basic prospectivity baseline using the open-source stack. The result is rarely production-quality, but the exercise teaches the team what the methodology requires, what data gaps exist on the project, and what to ask of any external provider.

What Comes Next

The methodology bar will keep rising. The current research frontier in academic prospectivity work is in better handling of uncertainty, better integration of disparate data types, better handling of the small-positive-set problem (you usually have very few known deposits relative to the size of the study area), and more transparent model explanations. Industry adoption typically trails the academic frontier by three to five years; the methods we'll see in commercial exploration work in 2030 are already in the published literature.
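The small-positive-set problem is easy to see numerically. The sketch below uses class reweighting, one common baseline mitigation and not necessarily what current research favors. It assumes a hypothetical study area of 2,000 cells with 20 known occurrences, on synthetic features.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils.class_weight import compute_class_weight

rng = np.random.default_rng(3)
n_cells = 2000
X = rng.normal(size=(n_cells, 3))
y = np.zeros(n_cells, dtype=int)
y[:20] = 1            # 20 known occurrences against 1,980 other cells
X[:20, 0] += 2.0      # synthetic: positives have a shifted first feature

# "balanced" weights are n_samples / (n_classes * count_per_class)
weights = compute_class_weight("balanced", classes=np.array([0, 1]), y=y)
print(f"weight for barren cells:   {weights[0]:.3f}")
print(f"weight for occurrences:    {weights[1]:.3f}")

# The same weighting applied inside the classifier
model = RandomForestClassifier(
    n_estimators=200, class_weight="balanced", random_state=0
)
model.fit(X, y)
prob = model.predict_proba(X)[:, 1]
```

With a 1:99 class ratio, each positive cell carries roughly a hundred times the weight of a negative one, which is also why a single mislabeled occurrence can move the model noticeably.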

For now, the right strategic posture for a junior is to take ML prospectivity seriously as a tool: neither dismissing it as a fad nor building the whole exploration thesis around it. The juniors that win the next cycle are the ones using it routinely on regional studies, ground evaluation, and ranking of holdings, while continuing to rely on classical geology for hypothesis generation and target validation.

