Data Sources & Methodology

Deep Dive 15 min read Updated March 2026

1. Introduction

Ironbrand's AI engine processes over 1,100 distinct metrics collected from dozens of independent data sources. Every trading signal, risk assessment, and market insight produced by the platform begins with raw data — and the quality of that data is the single most important determinant of AI accuracy.

This page provides a complete, transparent accounting of every data source the engine relies on: what it collects, why it matters, how it is validated, and how it flows through the pipeline from raw feed to actionable intelligence. No black boxes, no hand-waving — just verifiable methodology.

Design Principle: Every data source used by Ironbrand is either publicly accessible or fully documented. We do not rely on proprietary, paywalled, or opaque feeds. This guarantees reproducibility and allows independent verification of our methodology.

The data landscape is organized into five major categories: Market Microstructure, On-Chain Analytics, Sentiment & Social, Geopolitical Intelligence, and Locally Computed Technical Indicators. Each category feeds into the signal engine through a well-defined pipeline with caching, validation, and fallback layers.

2. Market Microstructure Data

Market microstructure data forms the backbone of every trading decision. It captures what is happening right now across exchanges: prices, volumes, order flow, funding dynamics, and positioning.

2.1 Price Feeds — CoinGecko API

Ironbrand aggregates real-time price data from 40+ cryptocurrency exchanges through the CoinGecko API. This covers over 18,000 listed assets, providing comprehensive market coverage far beyond the top-cap assets alone.

Endpoint: /api/v3/simple/price and /api/v3/coins/markets
Rate Limit: Free tier — 30 calls per minute
Data Points: Current price, 24h volume, market cap, 24h change, ATH, ATL
Authentication: None required (public API)

Why CoinGecko? Unlike single-exchange feeds, CoinGecko aggregates prices across multiple venues, reducing the impact of exchange-specific anomalies (wicks, flash crashes, wash trading). The free tier is sufficient for our polling cadence and imposes no cost barrier to operation.

2.2 OHLCV Candlestick Data — Kraken Futures Charts API

The engine's core price structure analysis depends on high-quality candlestick (OHLCV) data sourced from the Kraken Futures Charts API, a public endpoint that requires no API key.

Available Resolutions: 1m, 5m, 15m, 30m, 1h, 4h, 12h, 1d, 1w
Depth: Up to 5,000 candles per request
Data Points per Candle: Open, High, Low, Close, Volume, Timestamp
Authentication: None (public endpoint)

Candle data is stored in an in-memory cache with per-resolution TTL values calibrated to match expected data freshness:

5-minute candles: 30-second TTL
1-hour candles: 300-second TTL (5 minutes)
Daily / weekly candles: 3,600-second TTL (1 hour)

This tiered caching ensures that fast-moving timeframes are always near-real-time while reducing unnecessary API calls for slower-moving data.

2.3 Funding Rates — Kraken Futures

Perpetual swap funding rates are a critical indicator of market positioning. When funding is deeply positive, longs are paying shorts — signaling crowded bullish positioning and potential mean reversion. The reverse applies for deeply negative funding.

Symbols Tracked: PF_XBTUSD, PI_XBTUSD
Rate Conversion: Annualized rate ÷ (365 × 3) = per-8-hour rate
Cache TTL: 5 minutes (300 seconds)
Authentication: Public endpoint, no key required

Signal Value: Extreme funding rates (>0.05% or <-0.03% per 8h) have historically preceded significant mean-reversion moves. The engine uses funding as a contrarian filter — very high funding dampens long conviction, very negative funding dampens short conviction.

2.4 Open Interest — CoinGlass

Open Interest (OI) measures the total value of outstanding derivative contracts. Rising OI with rising price confirms trend strength; rising OI with falling price suggests aggressive short building.

Source: CoinGlass aggregated OI (free, no API key)
Data Points: Total OI, OI change (1h, 4h, 24h), exchange breakdown
Cache TTL: 10 minutes (600 seconds)

2.5 Liquidation Data — CoinGlass

Liquidation cascades are among the most powerful short-term market forces. A burst of long liquidations accelerates downside moves; short liquidations accelerate upside squeezes.

Source: CoinGlass liquidation feed (free, no API key)
Data Points: Long liquidations (USD), Short liquidations (USD), Long/Short ratio
Analysis: Ratio analysis to detect imbalanced liquidation pressure
Cache TTL: 5 minutes (300 seconds)

3. On-Chain Data

On-chain data provides a view into the fundamental health of the Bitcoin network — independent of exchange activity. Mempool congestion, fee pressure, and hash rate trends all carry signal that traditional market data cannot capture.

3.1 Bitcoin Mempool — mempool.space API

The Bitcoin mempool is the staging area for unconfirmed transactions. Its size and composition reveal network demand in real time.

Endpoint: mempool.space/api/mempool
Data Points: Unconfirmed transaction count, total mempool size (vBytes), fee distribution
Signal Value: A rapidly filling mempool with rising fee pressure often correlates with periods of high on-chain activity — large transfers, exchange withdrawals, or panic-driven movement
Authentication: None required

3.2 Fee Estimates — mempool.space

Fee estimates across multiple confirmation targets provide a granular view of network congestion and urgency.

Endpoint: mempool.space/api/v1/fees/recommended
Tiers Tracked:
- Fastest: Next-block confirmation
- 30 minutes: ~3 blocks
- 1 hour: ~6 blocks
- Economy: Low-priority, cheapest tier
Signal Value: Fee spikes (fastest > 100 sat/vB) suggest urgency, often driven by large players moving assets off exchanges

3.3 Network Hash Rate — blockchain.info

Hash rate is the computational power securing the Bitcoin network. A sustained decline can signal miner capitulation; sustained growth signals long-term confidence in the network's economic viability.

Source: blockchain.info hash rate API
Timeframe: 30-day rolling trend
Signal Value: Hash rate divergence from price (price falling, hash rate rising) has historically preceded recoveries, as miners — who have deep operational insight — continue to invest despite bearish price action

3.4 BTC Dominance — CoinGecko Global Endpoint

Bitcoin dominance (BTC market cap as a percentage of total crypto market cap) is a key regime indicator.

Endpoint: /api/v3/global
Data Points: BTC dominance percentage, total crypto market cap (USD)
Cache TTL: 1 hour (3,600 seconds)
Signal Value: Rising BTC dominance in a falling market = risk-off rotation. Falling BTC dominance in a rising market = altcoin season / risk-on rotation.

4. Sentiment & Social Data

Sentiment data captures the emotional temperature of the market. While price tells you what happened, sentiment tells you how participants feel about what happened — and extreme sentiment is one of the most reliable contrarian indicators available.

4.1 Fear & Greed Index — Alternative.me

The Crypto Fear & Greed Index is a composite indicator that distills multiple market signals into a single 0–100 score.

Source: Alternative.me API (completely free, no key required)
Scale: 0 = Extreme Fear, 100 = Extreme Greed
Cache TTL: 30 minutes (1,800 seconds)
Component Breakdown:
- Volatility (25%): Current volatility vs 30-day / 90-day averages
- Market Momentum (25%): Volume and momentum vs recent averages
- Social Media (15%): Sentiment analysis from Twitter/X and Reddit
- Surveys (15%): Community polling data
- Dominance (10%): Bitcoin dominance trend
- Trends (10%): Google Trends search volume for crypto terms

Contrarian Signal: Extreme Fear readings (<20) have historically marked accumulation zones. Extreme Greed readings (>80) have preceded corrections. The engine uses Fear & Greed as a position-sizing modifier — reducing exposure during greed and increasing opportunity scanning during fear.

4.2 Social Sentiment — CoinGecko Community Data

CoinGecko aggregates community sentiment through bullish/bearish voting mechanisms on individual assets.

Data Points: Bullish percentage, Bearish percentage, total votes
Cache TTL: 30 minutes (1,800 seconds)
Usage: Cross-referenced with Fear & Greed to validate broad sentiment readings. Disagreement between the two signals raises a caution flag in the pipeline.

4.3 News Aggregation — CryptoPanic

CryptoPanic aggregates cryptocurrency news from hundreds of sources and provides community-driven sentiment labels on each article.

Data Points: Article titles, source, timestamp, community sentiment vote
Sentiment Scoring: Normalized to a −1 to +1 scale (negative = bearish, positive = bullish)
Cache TTL: 15 minutes (900 seconds)
Fallback: RSS feed parsing when the primary API is unavailable, ensuring continuous news flow even during API outages
Volume Analysis: Sudden spikes in article volume on a specific topic are weighted as potential catalysts

5. Geopolitical & News Intelligence

Cryptocurrency markets do not exist in a vacuum. Macro events — rate decisions, geopolitical conflicts, regulatory actions, tariff announcements — can override technical and on-chain signals entirely. Ironbrand's engine explicitly models this external risk layer.

5.1 GDELT Project

The Global Database of Events, Language, and Tone (GDELT) monitors broadcast, print, and web news from nearly every country on Earth, in over 100 languages, and identifies events, themes, and emotional tone in real time.

Access: Free, rate-limited (1 request per 5 seconds respected)
Tone Scale: −15 (extremely negative) to +15 (extremely positive)
Analysis Window: 72-hour rolling window
Keywords Tracked: crypto, tariff, fed, geopolitical, bitcoin, regulation, sanctions, war
Cache TTL: 30 minutes (1,800 seconds)

How GDELT is Used: The engine tracks the rate of change in global news tone, not just the absolute level. A rapid deterioration in tone (e.g., from +3 to −5 within 24 hours) triggers an elevated risk flag regardless of what technical indicators show. This has proven effective during sudden sanctions announcements, military escalations, and surprise regulatory actions.

5.2 Polymarket Prediction Markets

Prediction markets aggregate the collective intelligence (and financial conviction) of participants betting real money on future outcomes. They provide probability estimates that are often more accurate than expert forecasts.

Source: Polymarket API (free, no key required)
Keywords Monitored: bitcoin, crypto, recession, tariff, fed, war, sanctions
Risk Scoring: 0–10 scale based on probability × volume weighting
Cache TTL: 10 minutes (600 seconds)

The engine uses Polymarket data as a forward-looking risk indicator. For example, if a "US Recession in 2026" contract rises from 20% to 55% probability with high volume, this feeds into the macro risk score and adjusts position sizing and signal confidence accordingly.

6. Technical Indicators (Computed Locally)

Unlike the data sources above, technical indicators are computed locally by the engine from raw OHLCV data. No third-party indicator service is used — every calculation is deterministic and auditable.

6.1 Relative Strength Index (RSI)

Period: 14
Method: Wilder's smoothing (exponential moving average of gains/losses)
Overbought Threshold: >70
Oversold Threshold: <30
Usage: Momentum filter; signals in overbought/oversold territory receive adjusted confidence scores

6.2 Exponential Moving Averages (EMA)

EMA Fast: 20-period on 5-minute candles
EMA Slow: 50-period on 5-minute candles
EMA Trend: 20-period on 1-hour candles for macro trend confirmation
Crossover Detection: Golden cross (fast > slow) = bullish; death cross (fast < slow) = bearish
Multi-timeframe: Alignment between 5m and 1h EMAs strengthens signal conviction

6.3 Average True Range (ATR)

Period: 14
Usage: Volatility measurement for dynamic stop-loss calculation
Stop-Loss Formula: Entry price ± (ATR × multiplier), ensuring stops adapt to current volatility rather than using fixed percentages

6.4 Pivot Points & Break of Structure (BoS)

Timeframe: 15-minute candles
Method: Swing high / swing low detection with BoS confirmation
Usage: Structural trend change detection — a confirmed BoS on the 15m timeframe is one of the strongest entry/exit triggers in the engine

6.5 Volume Analysis

Volume SMA: 20-period
Minimum Filter: 0.5x average volume required for signal validation
Usage: Signals generated on below-average volume receive a confidence penalty; high-volume signals receive a boost. This filters out low-conviction noise.

7. Data Pipeline Architecture

Data flows through the Ironbrand engine in a four-level pipeline, where each level adds structure, context, and intelligence to raw inputs.

Level 1: Data Collection → Validation → Caching

At the base layer, dedicated data collectors poll each source according to its optimal cadence. Incoming data passes through validation checks:

Schema validation — expected fields present and correctly typed
Range validation — values fall within plausible bounds
Timestamp validation — data is not stale beyond its expected freshness
Deduplication — identical data points are not double-counted

Valid data is written to the in-memory cache with source-specific TTL values. Invalid data is logged and discarded, with an alert if rejection rates exceed thresholds.

Level 2: Signal Engine

The signal engine consumes cached data and computes technical indicators (RSI, EMA, ATR, Pivots) from raw OHLCV candles. It combines these with microstructure data (funding, OI, liquidations) to generate raw signals — directional hypotheses with initial confidence scores.

Each raw signal includes: direction (long/short), entry price, stop-loss, take-profit, timeframe, confidence score, supporting indicators
Volume filter applied: signals with volume < 0.5x SMA(20) are penalized
Multi-timeframe alignment check: 5m signal + 1h trend agreement = confidence boost

Level 3: Context Enrichment

Raw signals are enriched with two context layers:

Market Context: Fear & Greed, BTC dominance, social sentiment, funding extremes, liquidation imbalances. These modify the signal's confidence score and can flip a marginal signal to "skip."
Political Context: GDELT tone, Polymarket risk scores, CryptoPanic news sentiment. A deteriorating geopolitical backdrop suppresses long signals and amplifies short signals (and vice versa).

Level 4: LLM Analyst (Multi-Provider AI)

The final filter is a large language model (LLM) analyst that receives the enriched signal package and produces a structured assessment:

Natural-language reasoning about why the signal makes sense (or doesn't)
Final confidence score (0–100)
Risk assessment and position-sizing recommendation
Specific conditions that would invalidate the thesis

Why an LLM? Traditional rule-based engines struggle with ambiguity. When funding is high but momentum is strong, when news is bearish but price refuses to drop, when multiple indicators conflict — the LLM analyst can weigh competing narratives and produce a nuanced judgment that rigid rules cannot. It acts as the final quality gate, not the primary signal source.

8. Cache Strategy & Data Freshness

Every data source operates on a calibrated cache schedule. The goal is to maintain the freshest possible view while respecting rate limits and minimizing unnecessary API calls.

Data Source	TTL	Seconds	Fallback Behavior
5M Candles (OHLCV)	30s	30	Serve stale + retry on next cycle
1H Candles (OHLCV)	5 min	300	Serve stale + retry on next cycle
Funding Rate	5 min	300	Use last known value with decay flag
Open Interest	10 min	600	Use last known value with staleness warning
Liquidations	5 min	300	Zero-fill (assume no liquidations)
Fear & Greed Index	30 min	1,800	Use last known value (updates daily)
Social Sentiment	30 min	1,800	Use last known value
CryptoPanic News	15 min	900	Switch to RSS fallback feed
GDELT Tone	30 min	1,800	Use last known value with decay weighting
Polymarket	10 min	600	Use last known value
BTC Dominance	1 hour	3,600	Use last known value (slow-moving metric)

Stale Data Policy: When a cache entry expires and the refresh fails, the engine does not discard the data. Instead, it serves the stale value with a staleness_flag attached. Downstream consumers (signal engine, LLM analyst) can see that the data is older than expected and adjust their confidence accordingly. This ensures graceful degradation rather than total blindness.

9. Data Quality Controls

Raw data from external sources is inherently noisy, occasionally incorrect, and sometimes missing entirely. The engine implements multiple layers of quality control to ensure that only clean, reliable data reaches the signal generation stage.

9.1 Retry Logic

All API calls are wrapped in a retry mechanism with exponential backoff:

Attempt 1: Immediate
Attempt 2: Wait 2 seconds
Attempt 3: Wait 4 seconds
Attempt 4 (final): Wait 8 seconds

After all retries are exhausted, the engine falls back to cached data (if available) or the source-specific fallback behavior documented in the cache strategy table above.

9.2 Gap Detection

For time-series data (especially OHLCV candles), the engine checks for missing candles by verifying timestamp continuity. If a gap is detected:

Gaps of 1–2 candles: forward-fill from last known value with a quality flag
Gaps of 3+ candles: mark the period as unreliable, suppress signals that depend on continuous data in that window
All gaps are logged for post-session analysis

9.3 Rate Limit Respect

Each data source has a configured rate limit that the engine strictly respects:

CoinGecko: 30 requests/minute (free tier)
GDELT: 1 request per 5 seconds
mempool.space: Best-effort, no hard limit but polite polling
CoinGlass: Reasonable usage, no published limit for free tier

Rate limit counters are maintained per-source. If a source approaches its limit, requests are queued and delayed rather than dropped, ensuring no data is missed.

9.4 Multi-Source Cross-Validation

Where the same metric is available from multiple sources, the engine cross-validates:

Price: CoinGecko aggregated price vs. Kraken direct feed. Divergence >0.5% triggers a warning and uses the aggregated (multi-exchange) value.
Sentiment: Fear & Greed vs. CoinGecko social sentiment vs. CryptoPanic news sentiment. Agreement strengthens confidence; disagreement triggers a "mixed sentiment" flag.
Volume: CoinGecko reported volume vs. OHLCV candle volume. Significant discrepancies may indicate wash trading on specific exchanges.

9.5 Anomaly Detection

The engine applies statistical anomaly detection to incoming data points:

Z-score filter: Data points more than 4 standard deviations from the rolling mean are flagged for review
Sudden-spike detection: If a metric changes by more than 3x its typical rate of change within a single polling interval, it is held for one additional polling cycle before being accepted
Timestamp anomalies: Data with future timestamps or timestamps more than 24 hours old is rejected

10. API Reliability & Fallbacks

Ironbrand's engine is designed with no single point of failure. Every critical data path has at least one fallback, and the system degrades gracefully when sources become unavailable.

10.1 All Sources Are Free / Public

A deliberate architectural choice: all primary data sources use free, publicly accessible APIs. This eliminates vendor lock-in, subscription dependencies, and the risk of sudden API key revocation disrupting operations.

Source	Cost	Auth Required	Fallback
CoinGecko	Free	No	Cached data + stale flag
Kraken Futures	Free	No	Cached data + stale flag
CoinGlass	Free	No	Cached data + stale flag
mempool.space	Free	No	blockchain.info alternative
blockchain.info	Free	No	Cached data + stale flag
Alternative.me	Free	No	Last known value (daily update)
CryptoPanic	Free	API key (free)	RSS feed fallback
GDELT	Free	No	Cached data + decay weighting
Polymarket	Free	No	Cached data + stale flag

10.2 Fallback Chains

For mission-critical data paths, the engine implements ordered fallback chains:

News Sentiment: CryptoPanic API → CryptoPanic RSS → GDELT tone as proxy → neutral assumption
Price Data: CoinGecko aggregated → Kraken direct → last cached value with staleness warning
On-Chain Data: mempool.space → blockchain.info → cached value with quality flag

10.3 Monitoring & Alerting

The engine continuously tracks the health of every data source:

Success rate: Percentage of successful API calls per source over the last 1 hour
Latency tracking: P50, P95, and P99 response times per source
Staleness monitoring: How many cached entries are currently being served past their TTL
Alert thresholds: Source success rate <80% triggers a warning; <50% triggers an escalation; 0% triggers automatic fallback activation

Resilience by Design: The engine has been tested under conditions where up to 40% of data sources are simultaneously unavailable. Even in this degraded state, the signal pipeline continues to operate — with reduced confidence scores that accurately reflect the diminished data quality. No source outage causes a complete system halt.