Data Sources & Methodology
1. Introduction
Ironbrand's AI engine processes over 1,100 distinct metrics collected from dozens of independent data sources. Every trading signal, risk assessment, and market insight produced by the platform begins with raw data — and the quality of that data is the single most important determinant of AI accuracy.
This page provides a complete, transparent accounting of every data source the engine relies on: what it collects, why it matters, how it is validated, and how it flows through the pipeline from raw feed to actionable intelligence. No black boxes, no hand-waving — just verifiable methodology.
The data landscape is organized into five major categories: Market Microstructure, On-Chain Analytics, Sentiment & Social, Geopolitical Intelligence, and Locally Computed Technical Indicators. Each category feeds into the signal engine through a well-defined pipeline with caching, validation, and fallback layers.
2. Market Microstructure Data
Market microstructure data forms the backbone of every trading decision. It captures what is happening right now across exchanges: prices, volumes, order flow, funding dynamics, and positioning.
2.1 Price Feeds — CoinGecko API
Ironbrand aggregates real-time price data from 40+ cryptocurrency exchanges through the CoinGecko API. This covers over 18,000 listed assets, providing comprehensive market coverage far beyond the top-cap assets alone.
- Endpoint: /api/v3/simple/price and /api/v3/coins/markets
- Rate Limit: Free tier, 30 calls per minute
- Data Points: Current price, 24h volume, market cap, 24h change, ATH, ATL
- Authentication: None required (public API)
2.2 OHLCV Candlestick Data — Kraken Futures Charts API
The engine's core price structure analysis depends on high-quality candlestick (OHLCV) data sourced from the Kraken Futures Charts API, a public endpoint that requires no API key.
- Available Resolutions: 1m, 5m, 15m, 30m, 1h, 4h, 12h, 1d, 1w
- Depth: Up to 5,000 candles per request
- Data Points per Candle: Open, High, Low, Close, Volume, Timestamp
- Authentication: None (public endpoint)
Candle data is stored in an in-memory cache with per-resolution TTL values calibrated to match expected data freshness:
- 5-minute candles: 30-second TTL
- 1-hour candles: 300-second TTL (5 minutes)
- Daily / weekly candles: 3,600-second TTL (1 hour)
This tiered caching ensures that fast-moving timeframes are always near-real-time while reducing unnecessary API calls for slower-moving data.
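A minimal sketch of the tiered cache described above, assuming a simple dictionary keyed by symbol and resolution. The class name `CandleCache`, the `TTL_BY_RESOLUTION` mapping, and the injectable clock are illustrative assumptions, not the engine's actual code.

```python
import time

# TTLs in seconds, taken from the list above (daily and weekly share one tier).
TTL_BY_RESOLUTION = {"5m": 30, "1h": 300, "1d": 3600, "1w": 3600}


class CandleCache:
    """In-memory cache keyed by (symbol, resolution) with a per-resolution TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so tests can control time
        self._store = {}      # (symbol, resolution) -> (stored_at, candles)

    def put(self, symbol, resolution, candles):
        self._store[(symbol, resolution)] = (self._clock(), candles)

    def get(self, symbol, resolution):
        """Return cached candles, or None if missing or older than the TTL."""
        entry = self._store.get((symbol, resolution))
        if entry is None:
            return None
        stored_at, candles = entry
        if self._clock() - stored_at > TTL_BY_RESOLUTION[resolution]:
            return None
        return candles
```

A caller would treat a `None` result as a cue to refetch from the Charts API and call `put` with the fresh candles.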
2.3 Funding Rates — Kraken Futures
Perpetual swap funding rates are a critical indicator of market positioning. When funding is deeply positive, longs are paying shorts — signaling crowded bullish positioning and potential mean reversion. The reverse applies for deeply negative funding.
- Symbols Tracked: PF_XBTUSD, PI_XBTUSD
- Rate Conversion: Annualized rate ÷ (365 × 3) = per-8-hour rate
- Cache TTL: 5 minutes (300 seconds)
- Authentication: Public endpoint, no key required
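The rate conversion above can be sketched as a one-line function. The name `annualized_to_8h` is hypothetical; the divisor follows directly from three 8-hour funding periods per day.

```python
PERIODS_PER_YEAR = 365 * 3  # three 8-hour funding periods per day


def annualized_to_8h(annualized_rate: float) -> float:
    """Convert an annualized funding rate to the equivalent per-8-hour rate."""
    return annualized_rate / PERIODS_PER_YEAR
```

For example, an annualized rate of 10.95% (0.1095) corresponds to roughly 0.01% (0.0001) per 8-hour period.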
2.4 Open Interest — CoinGlass
Open Interest (OI) measures the total value of outstanding derivative contracts. Rising OI with rising price confirms trend strength; rising OI with falling price suggests aggressive short building.
- Source: CoinGlass aggregated OI (free, no API key)
- Data Points: Total OI, OI change (1h, 4h, 24h), exchange breakdown
- Cache TTL: 10 minutes (600 seconds)
2.5 Liquidation Data — CoinGlass
Liquidation cascades are among the most powerful short-term market forces. A burst of long liquidations accelerates downside moves; short liquidations accelerate upside squeezes.
- Source: CoinGlass liquidation feed (free, no API key)
- Data Points: Long liquidations (USD), Short liquidations (USD), Long/Short ratio
- Analysis: Ratio analysis to detect imbalanced liquidation pressure
- Cache TTL: 5 minutes (300 seconds)
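One possible form of the ratio analysis is a normalized imbalance between long and short liquidations. The formula and the function name `liquidation_imbalance` are assumptions for illustration; the page does not specify the exact ratio the engine uses.

```python
def liquidation_imbalance(long_usd: float, short_usd: float) -> float:
    """Return a value in [-1, 1].

    +1 means only longs were liquidated, -1 means only shorts,
    and 0 means balanced pressure (or no liquidations at all).
    """
    total = long_usd + short_usd
    if total == 0:
        return 0.0
    return (long_usd - short_usd) / total
```

A normalized measure avoids dividing by zero when one side has no liquidations, which a raw long/short ratio would not.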
3. On-Chain Data
On-chain data provides a view into the fundamental health of the Bitcoin network — independent of exchange activity. Mempool congestion, fee pressure, and hash rate trends all carry signal that traditional market data cannot capture.
3.1 Bitcoin Mempool — mempool.space API
The Bitcoin mempool is the staging area for unconfirmed transactions. Its size and composition reveal network demand in real time.
- Endpoint: mempool.space/api/mempool
- Data Points: Unconfirmed transaction count, total mempool size (vBytes), fee distribution
- Signal Value: A rapidly filling mempool with rising fee pressure often correlates with periods of high on-chain activity — large transfers, exchange withdrawals, or panic-driven movement
- Authentication: None required
3.2 Fee Estimates — mempool.space
Fee estimates across multiple confirmation targets provide a granular view of network congestion and urgency.
- Endpoint: mempool.space/api/v1/fees/recommended
- Tiers Tracked:
  - Fastest: Next-block confirmation
  - 30 minutes: ~3 blocks
  - 1 hour: ~6 blocks
  - Economy: Low-priority, cheapest tier
- Signal Value: Fee spikes (fastest > 100 sat/vB) suggest urgency, often driven by large players moving assets off exchanges
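The fee tiers and the spike rule above might be mapped from the endpoint's JSON response roughly as follows. The field names (`fastestFee`, `halfHourFee`, `hourFee`, `economyFee`) reflect the mempool.space response as commonly documented and should be checked against the live API; `summarize_fees` and the sample payload are hypothetical.

```python
FEE_SPIKE_SAT_VB = 100  # threshold from the Signal Value bullet above


def summarize_fees(payload: dict) -> dict:
    """Map a fees/recommended response onto the four tracked tiers."""
    tiers = {
        "fastest": payload["fastestFee"],
        "30_minutes": payload["halfHourFee"],
        "1_hour": payload["hourFee"],
        "economy": payload["economyFee"],
    }
    # Flag a spike when the next-block fee exceeds the threshold.
    tiers["fee_spike"] = tiers["fastest"] > FEE_SPIKE_SAT_VB
    return tiers


# Illustrative payload, not real data.
sample = {"fastestFee": 120, "halfHourFee": 80, "hourFee": 40,
          "economyFee": 10, "minimumFee": 1}
summary = summarize_fees(sample)
```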
3.3 Network Hash Rate — blockchain.info
Hash rate is the computational power securing the Bitcoin network. A sustained decline can signal miner capitulation; sustained growth signals long-term confidence in the network's economic viability.
- Source: blockchain.info hash rate API
- Timeframe: 30-day rolling trend
- Signal Value: Hash rate divergence from price (price falling, hash rate rising) has historically preceded recoveries, as miners — who have deep operational insight — continue to invest despite bearish price action
3.4 BTC Dominance — CoinGecko Global Endpoint
Bitcoin dominance (BTC market cap as a percentage of total crypto market cap) is a key regime indicator.
- Endpoint: /api/v3/global
- Data Points: BTC dominance percentage, total crypto market cap (USD)
- Cache TTL: 1 hour (3,600 seconds)
- Signal Value: Rising BTC dominance in a falling market = risk-off rotation. Falling BTC dominance in a rising market = altcoin season / risk-on rotation.
5. Geopolitical & News Intelligence
Cryptocurrency markets do not exist in a vacuum. Macro events — rate decisions, geopolitical conflicts, regulatory actions, tariff announcements — can override technical and on-chain signals entirely. Ironbrand's engine explicitly models this external risk layer.
5.1 GDELT Project
The Global Database of Events, Language, and Tone (GDELT) monitors broadcast, print, and web news from nearly every country on Earth, in over 100 languages, and identifies events, themes, and emotional tone in real time.
- Access: Free, rate-limited (1 request per 5 seconds respected)
- Tone Scale: −15 (extremely negative) to +15 (extremely positive)
- Analysis Window: 72-hour rolling window
- Keywords Tracked: crypto, tariff, fed, geopolitical, bitcoin, regulation, sanctions, war
- Cache TTL: 30 minutes (1,800 seconds)
5.2 Polymarket Prediction Markets
Prediction markets aggregate the collective intelligence (and financial conviction) of participants betting real money on future outcomes. They provide probability estimates that are often more accurate than expert forecasts.
- Source: Polymarket API (free, no key required)
- Keywords Monitored: bitcoin, crypto, recession, tariff, fed, war, sanctions
- Risk Scoring: 0–10 scale based on probability × volume weighting
- Cache TTL: 10 minutes (600 seconds)
The engine uses Polymarket data as a forward-looking risk indicator. For example, if a "US Recession in 2026" contract rises from 20% to 55% probability with high volume, this feeds into the macro risk score and adjusts position sizing and signal confidence accordingly.
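The page gives only the outline of the risk score (a 0–10 scale from probability × volume weighting). One plausible reading is a volume-weighted average probability scaled to 10. The sketch below is an assumption along those lines, with a hypothetical function name and input shape; it is not the engine's documented formula.

```python
def macro_risk_score(markets) -> float:
    """Volume-weighted average probability, scaled to 0-10.

    `markets` is an iterable of (probability, volume_usd) pairs for
    contracts that represent adverse outcomes.
    """
    markets = list(markets)
    total_volume = sum(volume for _, volume in markets)
    if total_volume <= 0:
        return 0.0
    weighted = sum(p * volume for p, volume in markets) / total_volume
    # Clamp to [0, 1] in case of malformed probabilities, then scale.
    return round(10.0 * min(max(weighted, 0.0), 1.0), 2)
```

With two equally traded contracts at 55% and 20%, this reading yields a score of 3.75.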
6. Technical Indicators (Computed Locally)
Unlike the data sources above, technical indicators are computed locally by the engine from raw OHLCV data. No third-party indicator service is used — every calculation is deterministic and auditable.
6.1 Relative Strength Index (RSI)
- Period: 14
- Method: Wilder's smoothing (an exponential moving average of gains and losses with a smoothing factor of 1/period)
- Overbought Threshold: >70
- Oversold Threshold: <30
- Usage: Momentum filter; signals in overbought/oversold territory receive adjusted confidence scores
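A compact sketch of a 14-period RSI with Wilder's smoothing, seeded with a simple average of the first 14 changes (the conventional seeding, assumed here). The function name `rsi_wilder` is illustrative.

```python
def rsi_wilder(closes, period=14):
    """Return the latest RSI value using Wilder's smoothing."""
    if len(closes) < period + 1:
        raise ValueError("need at least period + 1 closes")
    gains, losses = [], []
    for prev, cur in zip(closes, closes[1:]):
        change = cur - prev
        gains.append(max(change, 0.0))
        losses.append(max(-change, 0.0))
    # Seed with a simple average over the first `period` changes.
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    # Wilder's smoothing: new_avg = (old_avg * (period - 1) + value) / period
    for gain, loss in zip(gains[period:], losses[period:]):
        avg_gain = (avg_gain * (period - 1) + gain) / period
        avg_loss = (avg_loss * (period - 1) + loss) / period
    if avg_loss == 0:
        return 100.0  # no losses in the window (a flat series also lands here)
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)
```

A reading above 70 or below 30 would then fall into the overbought or oversold zones listed above.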
6.2 Exponential Moving Averages (EMA)
- EMA Fast: 20-period on 5-minute candles
- EMA Slow: 50-period on 5-minute candles
- EMA Trend: 20-period on 1-hour candles for macro trend confirmation
- Crossover Detection: Golden cross (fast EMA crosses above slow EMA) = bullish; death cross (fast EMA crosses below slow EMA) = bearish
- Multi-timeframe: Alignment between 5m and 1h EMAs strengthens signal conviction
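The EMA and crossover logic can be sketched as below. Seeding the EMA with the first value is an assumption (some implementations seed with a simple average), and the function names are illustrative.

```python
def ema(values, period):
    """Exponential moving average with alpha = 2 / (period + 1)."""
    alpha = 2.0 / (period + 1)
    out = [values[0]]  # assumption: seed with the first observation
    for value in values[1:]:
        out.append(alpha * value + (1 - alpha) * out[-1])
    return out


def crossover(fast, slow):
    """Return 'bullish', 'bearish', or None for the most recent bar."""
    if len(fast) < 2 or len(slow) < 2:
        return None
    prev_diff = fast[-2] - slow[-2]
    cur_diff = fast[-1] - slow[-1]
    if prev_diff <= 0 < cur_diff:
        return "bullish"   # fast moved from at-or-below to above slow
    if prev_diff >= 0 > cur_diff:
        return "bearish"   # fast moved from at-or-above to below slow
    return None
```

Under the parameters above, `fast` would be `ema(closes_5m, 20)` and `slow` would be `ema(closes_5m, 50)`, where `closes_5m` stands for the 5-minute close series.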
6.3 Average True Range (ATR)
- Period: 14
- Usage: Volatility measurement for dynamic stop-loss calculation
- Stop-Loss Formula: Entry price − (ATR × multiplier) for long positions and entry price + (ATR × multiplier) for short positions, so stops adapt to current volatility rather than using fixed percentages
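A sketch of the ATR calculation and the stop-loss formula, assuming Wilder's smoothing for the ATR, a stop below entry for longs, and a stop above entry for shorts. The multiplier value in any example is hypothetical, since the page does not state it.

```python
def true_ranges(highs, lows, closes):
    """True range per bar; the first bar has no previous close."""
    trs = [highs[0] - lows[0]]
    for i in range(1, len(closes)):
        trs.append(max(
            highs[i] - lows[i],
            abs(highs[i] - closes[i - 1]),
            abs(lows[i] - closes[i - 1]),
        ))
    return trs


def atr(highs, lows, closes, period=14):
    """Latest ATR, seeded with a simple average and then Wilder-smoothed."""
    trs = true_ranges(highs, lows, closes)
    if len(trs) < period:
        raise ValueError("need at least `period` bars")
    value = sum(trs[:period]) / period
    for tr in trs[period:]:
        value = (value * (period - 1) + tr) / period
    return value


def stop_loss(entry, atr_value, multiplier, side):
    """Volatility-scaled stop: below entry for longs, above for shorts."""
    if side == "long":
        return entry - atr_value * multiplier
    if side == "short":
        return entry + atr_value * multiplier
    raise ValueError("side must be 'long' or 'short'")
```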
6.4 Pivot Points & Break of Structure (BoS)
- Timeframe: 15-minute candles
- Method: Swing high / swing low detection with BoS confirmation
- Usage: Structural trend change detection — a confirmed BoS on the 15m timeframe is one of the strongest entry/exit triggers in the engine
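One common way to detect swing highs and a bullish break of structure is a fixed-width pivot window, sketched below. The window sizes (`left=2`, `right=2`) and the rule "close above the most recent confirmed swing high" are assumptions; the page does not specify the engine's exact confirmation criteria. A bearish BoS would mirror this with swing lows.

```python
def swing_highs(highs, left=2, right=2):
    """Indices of bars whose high is the unique maximum of the surrounding window."""
    indices = []
    for i in range(left, len(highs) - right):
        window = highs[i - left:i + right + 1]
        if highs[i] == max(window) and window.count(highs[i]) == 1:
            indices.append(i)
    return indices


def bullish_bos(highs, closes, left=2, right=2):
    """True if the latest close is above the most recent confirmed swing high."""
    # Exclude the current bar so the swing is confirmed by earlier bars only.
    confirmed = swing_highs(highs[:-1], left, right)
    if not confirmed:
        return False
    return closes[-1] > highs[confirmed[-1]]
```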
6.5 Volume Analysis
- Volume SMA: 20-period
- Minimum Filter: 0.5x average volume required for signal validation
- Usage: Signals generated on below-average volume receive a confidence penalty; high-volume signals receive a boost. This filters out low-conviction noise.
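The minimum-volume check can be sketched as a comparison between the latest bar and the average of the preceding 20 bars. Whether the engine's SMA includes the current bar is not stated, so excluding it here is an assumption, as is the function name.

```python
def passes_volume_filter(volumes, period=20, min_ratio=0.5):
    """True if the latest volume is at least `min_ratio` x the prior SMA."""
    if len(volumes) < period + 1:
        raise ValueError("need at least period + 1 volume bars")
    # SMA of the `period` bars before the current one.
    baseline = sum(volumes[-period - 1:-1]) / period
    if baseline == 0:
        return False
    return volumes[-1] / baseline >= min_ratio
```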
7. Data Pipeline Architecture
Data flows through the Ironbrand engine in a four-level pipeline, where each level adds structure, context, and intelligence to raw inputs.
Level 1: Data Collection → Validation → Caching
At the base layer, dedicated data collectors poll each source according to its optimal cadence. Incoming data passes through validation checks:
- Schema validation — expected fields present and correctly typed
- Range validation — values fall within plausible bounds
- Timestamp validation — data is not stale beyond its expected freshness
- Deduplication — identical data points are not double-counted
Valid data is written to the in-memory cache with source-specific TTL values. Invalid data is logged and discarded, with an alert if rejection rates exceed thresholds.
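The four validation checks might look like this for a single OHLCV candle. The field names, the specific range rules, and the convention of returning a rejection reason (or `None` when valid) are illustrative assumptions.

```python
NUMERIC_FIELDS = ("open", "high", "low", "close", "volume", "timestamp")


def validate_candle(candle, now, max_age_seconds, seen_timestamps):
    """Return None if the candle is valid, else a short rejection reason."""
    # Schema: every expected field is present and numeric.
    for field in NUMERIC_FIELDS:
        value = candle.get(field)
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return f"schema: bad or missing {field}"
    # Range: positive prices, non-negative volume, consistent OHLC ordering.
    if candle["low"] <= 0 or candle["volume"] < 0:
        return "range: non-positive price or negative volume"
    body_low = min(candle["open"], candle["close"])
    body_high = max(candle["open"], candle["close"])
    if not (candle["low"] <= body_low and body_high <= candle["high"]):
        return "range: OHLC ordering violated"
    # Timestamp: not in the future and not older than allowed.
    age = now - candle["timestamp"]
    if age < 0 or age > max_age_seconds:
        return "timestamp: future or stale"
    # Deduplication: reject candles already seen.
    if candle["timestamp"] in seen_timestamps:
        return "duplicate"
    seen_timestamps.add(candle["timestamp"])
    return None
```

Returning a reason string rather than a boolean makes it straightforward to log rejections and count them per category for the alerting threshold.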
Level 2: Signal Engine
The signal engine consumes cached data and computes technical indicators (RSI, EMA, ATR, Pivots) from raw OHLCV candles. It combines these with microstructure data (funding, OI, liquidations) to generate raw signals — directional hypotheses with initial confidence scores.
- Each raw signal includes: direction (long/short), entry price, stop-loss, take-profit, timeframe, confidence score, supporting indicators
- Volume filter applied: signals with volume < 0.5x SMA(20) are penalized
- Multi-timeframe alignment check: 5m signal + 1h trend agreement = confidence boost
Level 3: Context Enrichment
Raw signals are enriched with two context layers:
- Market Context: Fear & Greed, BTC dominance, social sentiment, funding extremes, liquidation imbalances. These modify the signal's confidence score and can flip a marginal signal to "skip."
- Political Context: GDELT tone, Polymarket risk scores, CryptoPanic news sentiment. A deteriorating geopolitical backdrop suppresses long signals and amplifies short signals (and vice versa).
Level 4: LLM Analyst (Multi-Provider AI)
The final filter is a large language model (LLM) analyst that receives the enriched signal package and produces a structured assessment:
- Natural-language reasoning about why the signal makes sense (or doesn't)
- Final confidence score (0–100)
- Risk assessment and position-sizing recommendation
- Specific conditions that would invalidate the thesis
8. Cache Strategy & Data Freshness
Every data source operates on a calibrated cache schedule. The goal is to maintain the freshest possible view while respecting rate limits and minimizing unnecessary API calls.
| Data Source | TTL | Seconds | Fallback Behavior |
|---|---|---|---|
| 5M Candles (OHLCV) | 30s | 30 | Serve stale + retry on next cycle |
| 1H Candles (OHLCV) | 5 min | 300 | Serve stale + retry on next cycle |
| Funding Rate | 5 min | 300 | Use last known value with decay flag |
| Open Interest | 10 min | 600 | Use last known value with staleness warning |
| Liquidations | 5 min | 300 | Zero-fill (assume no liquidations) |
| Fear & Greed Index | 30 min | 1,800 | Use last known value (updates daily) |
| Social Sentiment | 30 min | 1,800 | Use last known value |
| CryptoPanic News | 15 min | 900 | Switch to RSS fallback feed |
| GDELT Tone | 30 min | 1,800 | Use last known value with decay weighting |
| Polymarket | 10 min | 600 | Use last known value |
| BTC Dominance | 1 hour | 3,600 | Use last known value (slow-moving metric) |
When a source is unavailable and a cached value is served past its TTL, the value is returned with a staleness_flag attached. Downstream consumers (signal engine, LLM analyst) can see that the data is older than expected and adjust their confidence accordingly. This ensures graceful degradation rather than total blindness.
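A sketch of how a cached value could carry a staleness flag to downstream consumers. The `CachedValue` shape and the `read_cached` helper are hypothetical; only the `staleness_flag` name comes from the text above.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class CachedValue:
    value: Any
    age_seconds: float
    staleness_flag: bool  # True when the value is older than its TTL


def read_cached(value, stored_at, ttl_seconds, now):
    """Serve a cached value regardless of age, marking it stale past the TTL."""
    age = now - stored_at
    return CachedValue(value=value, age_seconds=age, staleness_flag=age > ttl_seconds)
```

A consumer could, for example, scale its confidence down whenever `staleness_flag` is set instead of discarding the input entirely.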
9. Data Quality Controls
Raw data from external sources is inherently noisy, occasionally incorrect, and sometimes missing entirely. The engine implements multiple layers of quality control to ensure that only clean, reliable data reaches the signal generation stage.
9.1 Retry Logic
All API calls are wrapped in a retry mechanism with exponential backoff:
- Attempt 1: Immediate
- Attempt 2: Wait 2 seconds
- Attempt 3: Wait 4 seconds
- Attempt 4 (final): Wait 8 seconds
After all retries are exhausted, the engine falls back to cached data (if available) or the source-specific fallback behavior documented in the cache strategy table above.
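The retry schedule above can be sketched as a small wrapper. The function name `call_with_retry` and the injectable `sleep` parameter are assumptions for illustration; the delays are the ones listed.

```python
import time


def call_with_retry(fetch, delays=(0, 2, 4, 8), sleep=time.sleep):
    """Call `fetch` up to len(delays) times, waiting `delay` seconds before each try.

    Re-raises the last error if every attempt fails, so the caller can
    fall back to cached data.
    """
    last_error = None
    for delay in delays:
        if delay:
            sleep(delay)
        try:
            return fetch()
        except Exception as exc:  # in practice, catch network errors specifically
            last_error = exc
    raise last_error
```

Passing `sleep` as a parameter keeps the function testable without real waiting.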
9.2 Gap Detection
For time-series data (especially OHLCV candles), the engine checks for missing candles by verifying timestamp continuity. If a gap is detected:
- Gaps of 1–2 candles: forward-fill from last known value with a quality flag
- Gaps of 3+ candles: mark the period as unreliable, suppress signals that depend on continuous data in that window
- All gaps are logged for post-session analysis
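Gap detection by timestamp continuity can be sketched as follows, assuming sorted candle timestamps in seconds and a fixed interval per resolution. The function names and return shapes are illustrative.

```python
def find_gaps(timestamps, interval_seconds):
    """Return (prev_ts, next_ts, missing_count) for each gap in a sorted series."""
    gaps = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        missing = (cur - prev) // interval_seconds - 1
        if missing > 0:
            gaps.append((prev, cur, missing))
    return gaps


def classify_gap(missing_count):
    """Apply the policy above: forward-fill short gaps, distrust longer ones."""
    return "forward_fill" if missing_count <= 2 else "unreliable"
```

For 5-minute candles (`interval_seconds=300`), a jump from timestamp 600 to 1500 means two candles are missing and would be forward-filled.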
9.3 Rate Limit Respect
Each data source has a configured rate limit that the engine strictly respects:
- CoinGecko: 30 requests/minute (free tier)
- GDELT: 1 request per 5 seconds
- mempool.space: Best-effort, no hard limit but polite polling
- CoinGlass: Reasonable usage, no published limit for free tier
Rate limit counters are maintained per-source. If a source approaches its limit, requests are queued and delayed rather than dropped, ensuring no data is missed.
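A per-source limiter that delays requests instead of dropping them could be built on a sliding window, as in this sketch. The class name and the injectable clock and sleep functions are assumptions; the limits in the usage note come from the list above.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` per `window_seconds`; block rather than drop."""

    def __init__(self, max_calls, window_seconds,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.window = window_seconds
        self._clock = clock
        self._sleep = sleep
        self._calls = deque()  # timestamps of recent calls, oldest first

    def _evict(self, now):
        # Forget calls that have left the window.
        while self._calls and now - self._calls[0] >= self.window:
            self._calls.popleft()

    def acquire(self):
        """Wait until a call is permitted, then record it."""
        now = self._clock()
        self._evict(now)
        while len(self._calls) >= self.max_calls:
            # Sleep until the oldest recorded call leaves the window.
            self._sleep(self.window - (now - self._calls[0]))
            now = self._clock()
            self._evict(now)
        self._calls.append(now)
```

Under the limits above, CoinGecko would use `SlidingWindowLimiter(30, 60)` and GDELT `SlidingWindowLimiter(1, 5)`, with `acquire()` called before each request.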
9.4 Multi-Source Cross-Validation
Where the same metric is available from multiple sources, the engine cross-validates:
- Price: CoinGecko aggregated price vs. Kraken direct feed. A divergence above 0.5% triggers a warning, and the engine uses the aggregated (multi-exchange) value.
- Sentiment: Fear & Greed vs. CoinGecko social sentiment vs. CryptoPanic news sentiment. Agreement strengthens confidence; disagreement triggers a "mixed sentiment" flag.
- Volume: CoinGecko reported volume vs. OHLCV candle volume. Significant discrepancies may indicate wash trading on specific exchanges.
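The price cross-check can be sketched as below. Measuring divergence relative to the aggregated price is an assumption (the page does not say which value is the denominator), and the function name is illustrative.

```python
def cross_validate_price(aggregated, direct, threshold=0.005):
    """Return (price_to_use, warning) for two quotes of the same asset.

    The aggregated price is always used; `warning` is True when the two
    sources diverge by more than `threshold` (0.5% by default).
    """
    divergence = abs(aggregated - direct) / aggregated
    return aggregated, divergence > threshold
```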
9.5 Anomaly Detection
The engine applies statistical anomaly detection to incoming data points:
- Z-score filter: Data points more than 4 standard deviations from the rolling mean are flagged for review
- Sudden-spike detection: If a metric changes by more than 3x its typical rate of change within a single polling interval, it is held for one additional polling cycle before being accepted
- Timestamp anomalies: Data with future timestamps or timestamps more than 24 hours old is rejected
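The z-score filter and the timestamp rule can be sketched with the standard library. Using the sample standard deviation over a caller-supplied rolling history is an assumption, as are the function names; the thresholds (4 standard deviations, 24 hours) come from the list above.

```python
import statistics


def is_outlier(history, value, z_threshold=4.0):
    """True if `value` is more than `z_threshold` standard deviations from the mean."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return value != mean
    return abs(value - mean) / stdev > z_threshold


def timestamp_ok(timestamp, now, max_age_seconds=24 * 3600):
    """Reject future timestamps and anything older than 24 hours."""
    return 0 <= now - timestamp <= max_age_seconds
```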
10. API Reliability & Fallbacks
Ironbrand's engine is designed with no single point of failure. Every critical data path has at least one fallback, and the system degrades gracefully when sources become unavailable.
10.1 All Sources Are Free / Public
A deliberate architectural choice: all primary data sources use free, publicly accessible APIs. This eliminates vendor lock-in, subscription dependencies, and the risk of sudden API key revocation disrupting operations.
| Source | Cost | Auth Required | Fallback |
|---|---|---|---|
| CoinGecko | Free | No | Cached data + stale flag |
| Kraken Futures | Free | No | Cached data + stale flag |
| CoinGlass | Free | No | Cached data + stale flag |
| mempool.space | Free | No | blockchain.info alternative |
| blockchain.info | Free | No | Cached data + stale flag |
| Alternative.me | Free | No | Last known value (daily update) |
| CryptoPanic | Free | API key (free) | RSS feed fallback |
| GDELT | Free | No | Cached data + decay weighting |
| Polymarket | Free | No | Cached data + stale flag |
10.2 Fallback Chains
For mission-critical data paths, the engine implements ordered fallback chains:
- News Sentiment: CryptoPanic API → CryptoPanic RSS → GDELT tone as proxy → neutral assumption
- Price Data: CoinGecko aggregated → Kraken direct → last cached value with staleness warning
- On-Chain Data: mempool.space → blockchain.info → cached value with quality flag
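An ordered fallback chain can be expressed as a list of named providers tried in sequence. The helper `first_available` and the provider names in the usage note are hypothetical.

```python
def first_available(providers):
    """Try (name, fetch) pairs in order; return (name, value) from the first success.

    A provider counts as failed if it raises or returns None.
    Returns (None, None) if every provider fails.
    """
    for name, fetch in providers:
        try:
            value = fetch()
        except Exception:
            continue  # in practice, log the failure before moving on
        if value is not None:
            return name, value
    return None, None
```

For the news sentiment chain, the list would hold the CryptoPanic API, the CryptoPanic RSS feed, GDELT tone, and finally a provider that returns a neutral constant, in that order. Returning the provider name lets downstream code know which tier supplied the value.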
10.3 Monitoring & Alerting
The engine continuously tracks the health of every data source:
- Success rate: Percentage of successful API calls per source over the last hour
- Latency tracking: P50, P95, and P99 response times per source
- Staleness monitoring: How many cached entries are currently being served past their TTL
- Alert thresholds: Source success rate <80% triggers a warning; <50% triggers an escalation; 0% triggers automatic fallback activation
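The alert thresholds map directly onto a small classification function. The status labels and the function name are illustrative; the cut-offs are the ones stated above, with the success rate expressed as a fraction between 0 and 1.

```python
def health_status(success_rate: float) -> str:
    """Classify a source by its success rate over the monitoring window."""
    if success_rate == 0:
        return "fallback"     # activate the fallback chain automatically
    if success_rate < 0.5:
        return "escalation"
    if success_rate < 0.8:
        return "warning"
    return "ok"
```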
Further Reading
- How the AI Engine Works — Architecture deep dive into the multi-provider LLM analyst
- Market Signals Explained — How raw data becomes actionable trading signals
- Technical Indicators — Detailed documentation of every indicator computed by the engine
- Sentiment Analysis — How sentiment data is scored and integrated
- Geopolitical Analysis — GDELT, Polymarket, and macro risk modeling
- On-Chain Signals — Bitcoin network fundamentals as trading indicators