
Data Sources & Methodology

1. Introduction

Ironbrand's AI engine processes over 1,100 distinct metrics collected from dozens of independent data sources. Every trading signal, risk assessment, and market insight produced by the platform begins with raw data — and the quality of that data is the single most important determinant of AI accuracy.

This page provides a complete, transparent accounting of every data source the engine relies on: what it collects, why it matters, how it is validated, and how it flows through the pipeline from raw feed to actionable intelligence. No black boxes, no hand-waving — just verifiable methodology.

Design Principle: Every data source used by Ironbrand is either publicly accessible or fully documented. We do not rely on proprietary, paywalled, or opaque feeds. This guarantees reproducibility and allows independent verification of our methodology.

The data landscape is organized into five major categories: Market Microstructure, On-Chain Analytics, Sentiment & Social, Geopolitical Intelligence, and Locally Computed Technical Indicators. Each category feeds into the signal engine through a well-defined pipeline with caching, validation, and fallback layers.

2. Market Microstructure Data

Market microstructure data forms the backbone of every trading decision. It captures what is happening right now across exchanges: prices, volumes, order flow, funding dynamics, and positioning.

2.1 Price Feeds — CoinGecko API

Ironbrand aggregates real-time price data from 40+ cryptocurrency exchanges through the CoinGecko API. This covers over 18,000 listed assets, providing comprehensive market coverage far beyond the top-cap assets alone.

Why CoinGecko? Unlike single-exchange feeds, CoinGecko aggregates prices across multiple venues, reducing the impact of exchange-specific anomalies (wicks, flash crashes, wash trading). The free tier is sufficient for our polling cadence and imposes no cost barrier to operation.

2.2 OHLCV Candlestick Data — Kraken Futures Charts API

The engine's core price structure analysis depends on high-quality candlestick (OHLCV) data sourced from the Kraken Futures Charts API, a public endpoint that requires no API key.

Candle data is stored in an in-memory cache with per-resolution TTL values calibrated to expected data freshness; the full TTL schedule appears in the cache strategy table in Section 8.

This tiered caching ensures that fast-moving timeframes are always near-real-time while reducing unnecessary API calls for slower-moving data.
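The tiered scheme can be sketched as a per-resolution TTL lookup placed in front of the fetch call. This is a minimal illustration, not the engine's actual code: the 5m and 1h TTLs match the cache strategy table in Section 8, while the 4h and 1d values (and all function names) are assumptions for the example.

```python
import time

# Per-resolution TTLs in seconds. 5m/1h match the Section 8 table;
# 4h/1d are illustrative assumptions.
CANDLE_TTL = {"5m": 30, "1h": 300, "4h": 900, "1d": 3600}

_cache: dict = {}

def get_candles(symbol: str, resolution: str, fetch) -> list:
    """Return cached candles if still fresh, otherwise re-fetch and cache."""
    key = (symbol, resolution)
    entry = _cache.get(key)
    ttl = CANDLE_TTL.get(resolution, 300)
    if entry and time.time() - entry["ts"] < ttl:
        return entry["data"]              # cache hit: still within TTL
    data = fetch(symbol, resolution)      # cache miss or expired: refresh
    _cache[key] = {"ts": time.time(), "data": data}
    return data
```

Fast resolutions expire quickly and stay near-real-time; slow resolutions are served from cache for minutes or hours, which is what keeps API call volume low.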

2.3 Funding Rates — Kraken Futures

Perpetual swap funding rates are a critical indicator of market positioning. When funding is deeply positive, longs are paying shorts — signaling crowded bullish positioning and potential mean reversion. The reverse applies for deeply negative funding.

Signal Value: Extreme funding rates (>0.05% or <-0.03% per 8h) have historically preceded significant mean-reversion moves. The engine uses funding as a contrarian filter — very high funding dampens long conviction, very negative funding dampens short conviction.
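The contrarian filter can be sketched as a conviction multiplier. Only the threshold values (+0.05% / -0.03% per 8h) come from the text above; the 0.5x dampening factor and the function name are illustrative assumptions.

```python
def funding_conviction_modifier(funding_8h_pct: float, direction: str) -> float:
    """Dampen conviction when funding is crowded in the trade's direction.

    Thresholds are from the documented signal rules; the 0.5x factor
    is an illustrative assumption.
    """
    if direction == "long" and funding_8h_pct > 0.05:
        return 0.5   # crowded longs paying shorts: halve long conviction
    if direction == "short" and funding_8h_pct < -0.03:
        return 0.5   # crowded shorts paying longs: halve short conviction
    return 1.0       # funding not extreme: no adjustment
```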

2.4 Open Interest — CoinGlass

Open Interest (OI) measures the total value of outstanding derivative contracts. Rising OI with rising price confirms trend strength; rising OI with falling price suggests aggressive short building.

2.5 Liquidation Data — CoinGlass

Liquidation cascades are among the most powerful short-term market forces. A burst of long liquidations accelerates downside moves; short liquidations accelerate upside squeezes.

3. On-Chain Data

On-chain data provides a view into the fundamental health of the Bitcoin network — independent of exchange activity. Mempool congestion, fee pressure, and hash rate trends all carry signal that traditional market data cannot capture.

3.1 Bitcoin Mempool — mempool.space API

The Bitcoin mempool is the staging area for unconfirmed transactions. Its size and composition reveal network demand in real time.

3.2 Fee Estimates — mempool.space

Fee estimates across multiple confirmation targets provide a granular view of network congestion and urgency.

3.3 Network Hash Rate — blockchain.info

Hash rate is the computational power securing the Bitcoin network. A sustained decline can signal miner capitulation; sustained growth signals long-term confidence in the network's economic viability.

3.4 BTC Dominance — CoinGecko Global Endpoint

Bitcoin dominance (BTC market cap as a percentage of total crypto market cap) is a key regime indicator.

4. Sentiment & Social Data

Sentiment data captures the emotional temperature of the market. While price tells you what happened, sentiment tells you how participants feel about what happened — and extreme sentiment is one of the most reliable contrarian indicators available.

4.1 Fear & Greed Index — Alternative.me

The Crypto Fear & Greed Index is a composite indicator that distills multiple market signals into a single 0–100 score.

Contrarian Signal: Extreme Fear readings (<20) have historically marked accumulation zones. Extreme Greed readings (>80) have preceded corrections. The engine uses Fear & Greed as a position-sizing modifier — reducing exposure during greed and increasing opportunity scanning during fear.
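The position-sizing modifier can be sketched as a simple mapping from the index reading to a size multiplier. The <20 and >80 thresholds are from the text; the multiplier values themselves are illustrative assumptions.

```python
def fear_greed_size_modifier(fg: int) -> float:
    """Map a Fear & Greed reading (0-100) to a position-size multiplier.

    Thresholds come from the documented contrarian rules; the 0.5x and
    1.25x multipliers are illustrative assumptions.
    """
    if fg > 80:
        return 0.5    # extreme greed: reduce exposure
    if fg < 20:
        return 1.25   # extreme fear: widen opportunity scanning
    return 1.0        # neutral zone: no adjustment
```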

4.2 Social Sentiment — CoinGecko Community Data

CoinGecko aggregates community sentiment through bullish/bearish voting mechanisms on individual assets.

4.3 News Aggregation — CryptoPanic

CryptoPanic aggregates cryptocurrency news from hundreds of sources and provides community-driven sentiment labels on each article.

5. Geopolitical & News Intelligence

Cryptocurrency markets do not exist in a vacuum. Macro events — rate decisions, geopolitical conflicts, regulatory actions, tariff announcements — can override technical and on-chain signals entirely. Ironbrand's engine explicitly models this external risk layer.

5.1 GDELT Project

The Global Database of Events, Language, and Tone (GDELT) monitors broadcast, print, and web news from nearly every country on Earth, in over 100 languages, and identifies events, themes, and emotional tone in real time.

How GDELT is Used: The engine tracks the rate of change in global news tone, not just the absolute level. A rapid deterioration in tone (e.g., from +3 to −5 within 24 hours) triggers an elevated risk flag regardless of what technical indicators show. This has proven effective during sudden sanctions announcements, military escalations, and surprise regulatory actions.
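The rate-of-change check can be sketched as a comparison of current tone against the tone 24 hours earlier. The +3 to −5 example (a drop of 8 points) comes from the text; the 5-point trigger threshold is an illustrative assumption.

```python
def gdelt_risk_flag(tone_now: float, tone_24h_ago: float,
                    drop_threshold: float = 5.0) -> bool:
    """Flag elevated risk on a rapid deterioration in global news tone.

    Fires on the drop, not the absolute level; the 5-point threshold
    is an illustrative assumption.
    """
    return (tone_24h_ago - tone_now) >= drop_threshold
```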

5.2 Polymarket Prediction Markets

Prediction markets aggregate the collective intelligence (and financial conviction) of participants betting real money on future outcomes. They provide probability estimates that are often more accurate than expert forecasts.

The engine uses Polymarket data as a forward-looking risk indicator. For example, if a "US Recession in 2026" contract rises from 20% to 55% probability with high volume, this feeds into the macro risk score and adjusts position sizing and signal confidence accordingly.

6. Technical Indicators (Computed Locally)

Unlike the data sources above, technical indicators are computed locally by the engine from raw OHLCV data. No third-party indicator service is used — every calculation is deterministic and auditable.

6.1 Relative Strength Index (RSI)
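Since all indicators are computed locally and deterministically from raw OHLCV data, the calculation can be shown in full. This is a sketch of the standard Wilder RSI (14-period default); it is not claimed to be the engine's exact implementation.

```python
def rsi(closes: list[float], period: int = 14) -> float:
    """Wilder's RSI from a list of closes (requires period+1 values)."""
    if len(closes) < period + 1:
        raise ValueError("not enough closes for RSI")
    gains, losses = [], []
    for prev, cur in zip(closes, closes[1:]):
        change = cur - prev
        gains.append(max(change, 0.0))
        losses.append(max(-change, 0.0))
    # Seed with simple averages, then apply Wilder smoothing.
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    for g, l in zip(gains[period:], losses[period:]):
        avg_gain = (avg_gain * (period - 1) + g) / period
        avg_loss = (avg_loss * (period - 1) + l) / period
    if avg_loss == 0:
        return 100.0   # no down moves in the window
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)
```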

6.2 Exponential Moving Averages (EMA)

6.3 Average True Range (ATR)
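As with RSI, the ATR calculation is standard and fully auditable. The sketch below uses the textbook true-range definition with Wilder smoothing; it illustrates the method, not the engine's exact code.

```python
def atr(highs, lows, closes, period: int = 14) -> float:
    """Wilder's Average True Range from parallel high/low/close lists."""
    trs = []
    for i in range(1, len(closes)):
        # True range: widest of bar range and gaps from the prior close.
        tr = max(highs[i] - lows[i],
                 abs(highs[i] - closes[i - 1]),
                 abs(lows[i] - closes[i - 1]))
        trs.append(tr)
    if len(trs) < period:
        raise ValueError("not enough bars for ATR")
    value = sum(trs[:period]) / period                 # seed: simple average
    for tr in trs[period:]:
        value = (value * (period - 1) + tr) / period   # Wilder smoothing
    return value
```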

6.4 Pivot Points & Break of Structure (BoS)

6.5 Volume Analysis

7. Data Pipeline Architecture

Data flows through the Ironbrand engine in a four-level pipeline, where each level adds structure, context, and intelligence to raw inputs.

Level 1: Data Collection → Validation → Caching

At the base layer, dedicated data collectors poll each source according to its optimal cadence, and incoming data passes through validation checks before anything is written to the cache.

Valid data is written to the in-memory cache with source-specific TTL values. Invalid data is logged and discarded, with an alert if rejection rates exceed thresholds.

Level 2: Signal Engine

The signal engine consumes cached data and computes technical indicators (RSI, EMA, ATR, Pivots) from raw OHLCV candles. It combines these with microstructure data (funding, OI, liquidations) to generate raw signals — directional hypotheses with initial confidence scores.

Level 3: Context Enrichment

Raw signals are enriched with two context layers drawn from the categories above: on-chain network health (Section 3) and sentiment and geopolitical intelligence (Sections 4 and 5).

Level 4: LLM Analyst (Multi-Provider AI)

The final filter is a large language model (LLM) analyst that receives the enriched signal package and produces a structured assessment of each signal.

Why an LLM? Traditional rule-based engines struggle with ambiguity. When funding is high but momentum is strong, when news is bearish but price refuses to drop, when multiple indicators conflict — the LLM analyst can weigh competing narratives and produce a nuanced judgment that rigid rules cannot. It acts as the final quality gate, not the primary signal source.

8. Cache Strategy & Data Freshness

Every data source operates on a calibrated cache schedule. The goal is to maintain the freshest possible view while respecting rate limits and minimizing unnecessary API calls.

| Data Source | TTL | TTL (seconds) | Fallback Behavior |
| --- | --- | --- | --- |
| 5M Candles (OHLCV) | 30 s | 30 | Serve stale + retry on next cycle |
| 1H Candles (OHLCV) | 5 min | 300 | Serve stale + retry on next cycle |
| Funding Rate | 5 min | 300 | Use last known value with decay flag |
| Open Interest | 10 min | 600 | Use last known value with staleness warning |
| Liquidations | 5 min | 300 | Zero-fill (assume no liquidations) |
| Fear & Greed Index | 30 min | 1,800 | Use last known value (updates daily) |
| Social Sentiment | 30 min | 1,800 | Use last known value |
| CryptoPanic News | 15 min | 900 | Switch to RSS fallback feed |
| GDELT Tone | 30 min | 1,800 | Use last known value with decay weighting |
| Polymarket | 10 min | 600 | Use last known value |
| BTC Dominance | 1 hour | 3,600 | Use last known value (slow-moving metric) |

Stale Data Policy: When a cache entry expires and the refresh fails, the engine does not discard the data. Instead, it serves the stale value with a staleness_flag attached. Downstream consumers (signal engine, LLM analyst) can see that the data is older than expected and adjust their confidence accordingly. This ensures graceful degradation rather than total blindness.
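The stale-serving behavior can be sketched as a cache read that annotates rather than discards expired entries. The entry layout and function name are assumptions for illustration; only the staleness_flag concept comes from the text.

```python
import time

def read_with_staleness(cache: dict, key: str, ttl: float):
    """Return a cache entry annotated with a staleness flag instead of
    discarding expired data (graceful degradation, not total blindness)."""
    entry = cache.get(key)
    if entry is None:
        return None                       # truly no data: caller decides
    age = time.time() - entry["ts"]
    return {
        "value": entry["value"],
        "staleness_flag": age > ttl,      # downstream lowers confidence
        "age_seconds": age,
    }
```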

9. Data Quality Controls

Raw data from external sources is inherently noisy, occasionally incorrect, and sometimes missing entirely. The engine implements multiple layers of quality control to ensure that only clean, reliable data reaches the signal generation stage.

9.1 Retry Logic

All API calls are wrapped in a retry mechanism with exponential backoff.

After all retries are exhausted, the engine falls back to cached data (if available) or the source-specific fallback behavior documented in the cache strategy table above.
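The retry wrapper can be sketched as follows. The retry count and base delay are illustrative assumptions; the engine's actual values are not documented here.

```python
import time

def fetch_with_retry(fetch, retries: int = 3, base_delay: float = 1.0):
    """Call fetch(), retrying on failure with exponential backoff.

    Retry count and base delay are illustrative assumptions.
    """
    for attempt in range(retries + 1):
        try:
            return fetch()
        except Exception:
            if attempt == retries:
                raise                                 # exhausted: caller falls back to cache
            time.sleep(base_delay * (2 ** attempt))   # 1s, 2s, 4s, ...
```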

9.2 Gap Detection

For time-series data (especially OHLCV candles), the engine checks for missing candles by verifying timestamp continuity. If a gap is detected, the affected range is refreshed before indicators are computed over it.
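The continuity check can be sketched as a scan over consecutive timestamps at a fixed candle interval. The function name and return shape are assumptions for illustration.

```python
def find_gaps(timestamps: list[int], interval: int) -> list[tuple[int, int]]:
    """Return (expected_start, next_actual) pairs where candles are missing,
    assuming sorted epoch-second timestamps at a fixed interval."""
    gaps = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur - prev > interval:
            gaps.append((prev + interval, cur))   # continuity broken here
    return gaps
```

For example, a 5-minute series with timestamps 0, 300, 900, 1200 is missing the candle at 600.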

9.3 Rate Limit Respect

Each data source has a configured rate limit that the engine strictly respects.

Rate limit counters are maintained per-source. If a source approaches its limit, requests are queued and delayed rather than dropped, ensuring no data is missed.
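The queue-and-delay policy can be sketched as a per-source sliding-window limiter that blocks rather than drops. The class shape and limits are illustrative assumptions.

```python
import time
from collections import deque

class RateLimiter:
    """Per-source limiter: delay (never drop) requests that would exceed
    max_calls per window_seconds."""

    def __init__(self, max_calls: int, window_seconds: float):
        self.max_calls = max_calls
        self.window = window_seconds
        self.calls: deque = deque()

    def acquire(self) -> float:
        """Block until a call is allowed; return the delay applied."""
        now = time.monotonic()
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()               # forget calls outside window
        delay = 0.0
        if len(self.calls) >= self.max_calls:
            delay = self.window - (now - self.calls[0])
            time.sleep(delay)                  # queue: wait, do not drop
        self.calls.append(time.monotonic())
        return delay
```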

9.4 Multi-Source Cross-Validation

Where the same metric is available from multiple sources, the engine cross-validates, flagging readings that diverge materially between sources.

9.5 Anomaly Detection

The engine applies statistical anomaly detection to incoming data points, rejecting values that deviate implausibly from recent history.
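One common form of such a check is a z-score test against a recent history window; the sketch below uses that approach, with the 4-sigma threshold as an illustrative assumption.

```python
import statistics

def is_anomalous(value: float, history: list, z_max: float = 4.0) -> bool:
    """Flag a data point whose z-score against recent history exceeds z_max.

    The 4-sigma threshold is an illustrative assumption.
    """
    if len(history) < 2:
        return False                      # not enough history to judge
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return value != mean              # flat history: any change is odd
    return abs(value - mean) / stdev > z_max
```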

10. API Reliability & Fallbacks

Ironbrand's engine is designed with no single point of failure. Every critical data path has at least one fallback, and the system degrades gracefully when sources become unavailable.

10.1 All Sources Are Free / Public

A deliberate architectural choice: all primary data sources use free, publicly accessible APIs. This eliminates vendor lock-in, subscription dependencies, and the risk of sudden API key revocation disrupting operations.

| Source | Cost | Auth Required | Fallback |
| --- | --- | --- | --- |
| CoinGecko | Free | No | Cached data + stale flag |
| Kraken Futures | Free | No | Cached data + stale flag |
| CoinGlass | Free | No | Cached data + stale flag |
| mempool.space | Free | No | blockchain.info alternative |
| blockchain.info | Free | No | Cached data + stale flag |
| Alternative.me | Free | No | Last known value (daily update) |
| CryptoPanic | Free | API key (free) | RSS feed fallback |
| GDELT | Free | No | Cached data + decay weighting |
| Polymarket | Free | No | Cached data + stale flag |

10.2 Fallback Chains

For mission-critical data paths, the engine implements ordered fallback chains, trying each source in priority order until one succeeds.
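A fallback chain can be sketched as an ordered list of (name, fetcher) pairs; the mempool.space → blockchain.info pairing used in the usage note below reflects the fallback documented in the sources table, while the function shape is an assumption.

```python
def fetch_with_fallbacks(sources):
    """Try (name, fetch_fn) pairs in priority order; return the first
    success as (source_name, data). Raise only if every source fails."""
    last_error = None
    for name, fetch in sources:
        try:
            return name, fetch()
        except Exception as err:
            last_error = err              # record failure, try next source
    raise RuntimeError("all sources in chain failed") from last_error
```

For example, an on-chain fee read might be wired as `fetch_with_fallbacks([("mempool.space", fetch_mempool), ("blockchain.info", fetch_blockchain_info)])`, where the fetcher names are hypothetical.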

10.3 Monitoring & Alerting

The engine continuously tracks the health of every data source, alerting when error rates or staleness exceed configured thresholds.

Resilience by Design: The engine has been tested under conditions where up to 40% of data sources are simultaneously unavailable. Even in this degraded state, the signal pipeline continues to operate — with reduced confidence scores that accurately reflect the diminished data quality. No source outage causes a complete system halt.
