
Where is the Binance order book data?

Binance gives away years of historical trade and kline data for free. Aggregate trades going back to 2017, klines at every interval imaginable, individual fills for every listed pair — all sitting in tidy CSV files on data.binance.vision. Go ahead, download BTCUSDT trades from three years ago. It’ll take you five minutes.

| Data Type | Available | History Depth | Format |
| --- | --- | --- | --- |
| Aggregate Trades | Yes | Since 2017 | CSV (daily/monthly) |
| Individual Trades | Yes | Since 2017 | CSV (daily/monthly) |
| Klines (Candlesticks) | Yes | Since 2017 | CSV (1s to 1M intervals) |
| Order Book Snapshots | No | | |
| Incremental Depth | No | | |
| Book Ticker | No | | |

Now try to find order book data. You’ll click around, check the directory listings, search the docs. Nothing. No snapshots, no incremental updates, no partial depth, nothing. The limit order book — the single data structure where price discovery actually happens — is completely absent from Binance’s public data archive.

This article explains what’s available, what’s missing, why it’s missing, and what your options are.

What data.binance.vision actually provides

Binance maintains a public data repository at data.binance.vision with free historical downloads for spot and futures markets. The coverage is good for what it includes:

  • Aggregate trades — every fill, going back to 2017, in daily or monthly CSV files
  • Individual trades — the raw trade stream, also going back to 2017
  • Klines (candlesticks) — 1-second through 1-month intervals, for every listed pair

For a lot of use cases — backtesting strategies that rely on price and volume, computing VWAP, analyzing trade arrival rates — this data is enough.

What’s missing: the interesting part

Trades tell you what happened. The order book tells you what was about to happen.

Trades are the result of order book dynamics, but the book itself — the bid/ask queue, the depth at each level, how liquidity shifts before and after large trades — contains information that trade data alone can’t reconstruct.

If you’re doing any of the following, you need depth data:

  • Market microstructure research — studying bid-ask spread behavior, queue position dynamics, or information asymmetry between passive and aggressive order flow
  • Limit order book modeling — training sequence models (LSTM, Transformer, DeepLOB) on book state evolution
  • Execution quality analysis — estimating slippage for large orders by looking at depth ahead of your price level
  • Market making strategy development — modeling adverse selection risk by watching how the book reshapes around your resting orders
  • Spoofing and manipulation detection — identifying patterns in order placement and cancellation that trade data can’t reveal

None of this is possible with trades and klines alone. You’re studying the shadows on the wall when the actual thing is happening in the order book.

Why Binance doesn’t include order book data

It’s not an oversight. And it’s not laziness — Binance clearly invested effort in their public data archive. There are engineering reasons order book data is absent.

It’s massive. A single instrument’s depthUpdate stream on Binance generates hundreds of thousands of messages per day. Across over 1,500 spot trading pairs, the aggregate volume is enormous. Distributing this at scale as static file downloads would be expensive and unwieldy.

It’s complex to store meaningfully. Trade data is self-contained — each row is a complete event. Order book data is stateful. An incremental depthUpdate message only makes sense relative to a prior snapshot. Providing useful order book downloads would mean distributing synchronized snapshots and update streams together, with clear documentation about how to reconstruct book state. That’s a much harder distribution problem than dropping CSV files into an S3 bucket.

The format evolves. Binance’s WebSocket API for depth data has changed over time — field names, update frequencies, and snapshot endpoints have all been modified. Maintaining a multi-year archive across format changes adds another layer of complexity that nobody at Binance is eager to take on.

So the free data repository covers what’s straightforward to distribute (trades and klines) and skips what isn’t (order books). It’s a reasonable engineering decision. But it means that the most information-rich data Binance produces — the full limit order book — is effectively ephemeral. If you weren’t recording it when it happened, it’s gone.

What Binance order book streams look like

Before getting into options for recording this data, it helps to know what we’re talking about. Binance exposes several depth-related WebSocket streams:

depthUpdate (incremental updates) — streams every change to the order book as it happens. Each message contains arrays of bid and ask price levels that changed:

{
  "e": "depthUpdate",
  "E": 1708012800123,
  "s": "BTCUSDT",
  "U": 45828374661,
  "u": 45828374665,
  "b": [
    ["51234.10", "1.234"],
    ["51233.90", "0.500"],
    ["51230.00", "0.000"]
  ],
  "a": [
    ["51234.20", "0.800"],
    ["51235.00", "2.100"]
  ]
}

A quantity of "0.000" means that price level was removed from the book. The U and u fields are update IDs that let you verify you haven’t missed any messages.
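Given a synchronized starting snapshot, maintaining a local book from these messages is mechanical. The sketch below is a minimal illustration, not Binance's official client code: `apply_depth_update` is a hypothetical helper, prices are kept as string keys, and the gap check uses the spot-market continuity rule (each event's `U` should equal the previous event's `u` + 1). Production code would also handle the initial REST snapshot synchronization described in Binance's docs.

```python
def apply_depth_update(book, msg, last_update_id=None):
    """Apply one Binance depthUpdate message to a local book.

    book: {"bids": {price: qty}, "asks": {price: qty}} with string keys.
    Returns the event's final update id (msg["u"]).
    Raises RuntimeError on a detected gap, which means you must refetch
    a REST snapshot and resynchronize before continuing.
    """
    if last_update_id is not None and msg["U"] != last_update_id + 1:
        raise RuntimeError(
            f"gap detected: expected U={last_update_id + 1}, got U={msg['U']}"
        )
    for side, levels in (("bids", msg["b"]), ("asks", msg["a"])):
        for price, qty in levels:
            if float(qty) == 0.0:
                book[side].pop(price, None)  # "0.000" removes the level
            else:
                book[side][price] = qty       # insert or overwrite the level
    return msg["u"]

# Example: apply the depthUpdate message shown above to a seeded book
book = {"bids": {"51230.00": "0.700"}, "asks": {}}
update = {
    "U": 45828374661, "u": 45828374665,
    "b": [["51234.10", "1.234"], ["51233.90", "0.500"], ["51230.00", "0.000"]],
    "a": [["51234.20", "0.800"], ["51235.00", "2.100"]],
}
last_u = apply_depth_update(book, update)  # no prior id, so no gap check
```

After this update, the `51230.00` bid is gone and the other four levels reflect the message's quantities.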

depth5, depth10, depth20 (partial book snapshots) — push the top N levels of the book at regular intervals (every 100ms or 1000ms). Useful if you don’t need full book depth but want a periodic view of the best bids and asks.
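For comparison, a partial-book snapshot message carries absolute state rather than deltas. It looks roughly like this (field names from Binance's docs; the values are illustrative):

```json
{
  "lastUpdateId": 45828374665,
  "bids": [
    ["51234.10", "1.234"],
    ["51233.90", "0.500"]
  ],
  "asks": [
    ["51234.20", "0.800"],
    ["51235.00", "2.100"]
  ]
}
```

Because each message is self-contained, these streams need no snapshot synchronization — at the cost of seeing only the top N levels.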

These streams are freely available through the Binance WebSocket API. The data itself costs nothing. The challenge is capturing it continuously.

Your options for getting order book data

Option 1: record it yourself (DIY)

Connect to the Binance WebSocket API, subscribe to the depth streams you need, and write the messages to disk or a database. The data is free, and Binance’s API documentation is decent.

The catch is that “connect and write to disk” is the easy part — about a day of work. The hard part is keeping it running:

  • 24/7 uptime — markets don’t close, and gaps in your data are gaps in your research
  • Connection management — Binance WebSocket connections drop periodically (maintenance windows, rate limits, network blips), and you need automatic reconnection with proper snapshot re-synchronization
  • Storage — depth data for 25 instruments accumulates fast; plan for tens of gigabytes per month depending on the streams and instruments you choose
  • Gap monitoring — you need to know when you’ve lost messages, not discover it weeks later when your backtest produces weird results
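To make the scope concrete, here is roughly what the easy part looks like: a minimal capture loop, sketched under the assumption that the third-party `websockets` package is installed and using the public spot stream URL from Binance's docs. Everything in the list above — gap monitoring, snapshot re-synchronization after reconnects, file rotation, alerting — is deliberately left out.

```python
import asyncio
import gzip
import time

def record_line(raw_msg: str, recv_ns: int) -> str:
    """One record: nanosecond receipt timestamp, then the raw message."""
    return f"{recv_ns},{raw_msg}\n"

async def record(symbol: str, path: str) -> None:
    """Append every raw depth message for `symbol` to a gzipped file."""
    import websockets  # third-party dependency: pip install websockets
    url = f"wss://stream.binance.com:9443/ws/{symbol.lower()}@depth"
    while True:  # Binance drops connections periodically; reconnect forever
        try:
            async with websockets.connect(url) as ws:
                with gzip.open(path, "at") as out:
                    async for raw in ws:
                        out.write(record_line(raw, time.time_ns()))
        except Exception:
            await asyncio.sleep(1)  # back off briefly, then reconnect

if __name__ == "__main__":
    asyncio.run(record("BTCUSDT", "btcusdt_depth.csv.gz"))
```

Note what this sketch silently gets wrong: every reconnect loses messages, and nothing here notices. That gap between "it runs" and "I can trust the data" is where the weeks of work go.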

For 25 concurrent streams on AWS, the infrastructure alone runs $50–100/month — compute, storage, bandwidth, and monitoring. That’s before you write a single line of recording code. For 100 streams, expect $160–360/month.

If you enjoy infrastructure work and want full control over every aspect of the pipeline, DIY is a legitimate choice. Plenty of quant teams run their own recorders well. The cost is your time and a monthly cloud bill.

Option 2: Cryptofeed (open source)

Cryptofeed is a well-known open-source Python library that connects to nearly 40 exchange WebSocket APIs and normalizes the data into a common format. It supports multiple storage backends (Postgres, Redis, InfluxDB, Arctic, and others) and handles connection management.

It’s a great starting point. But getting from “Cryptofeed is running on my laptop” to “I have a production-grade recording system with gap detection and reliable export” takes real effort. In our experience, teams typically spend 2-4 months building out monitoring, alerting, storage management, and data validation on top of the core library. Then there’s ongoing maintenance — exchange API changes, dependency updates, and the occasional 3 AM reconnection failure that needs investigation.

If you’re comfortable with that investment and want multi-exchange support, it’s a strong option.

Option 3: Tardis.dev (historical archive)

Tardis.dev is one of the most established providers in crypto tick data. They maintain a historical archive covering 40+ exchanges with data going back to 2019, including full order book depth. Their tooling is mature — Python and Node.js clients, a replay API that lets you simulate real-time data feeds, and normalized data formats alongside raw exchange messages.

Subscriptions start at $700/month for solo users (perpetual swaps only) and scale up from there depending on exchange coverage and tier. If you need years of multi-exchange order book data — say, BTCUSDT depth from 2020 across Binance, Coinbase, and Kraken — Tardis is purpose-built for that.

The trade-off is cost. For a researcher who needs ongoing Binance-only recording and doesn’t need years of history, $700/month is a steep entry point.

Option 4: Crypto Lake (historical archive)

Crypto Lake delivers historical crypto data in Parquet format through their Python library lakeapi. They cover 10 exchanges and 100+ pairs, with data types including order book snapshots, incremental depth updates, trades, and funding rates.

Their order book data comes in several flavors: 20-level snapshots at 100ms frequency (book), full-depth incremental updates with 1,000+ levels (book_delta_v2), and 1-minute deep snapshots (deep_book_1m). Individual pricing starts at $80/month (currently 20% off for new subscribers).

The trade-off: Binance order book history only goes back to late 2022, and at $80/month for an individual plan, it’s more expensive than Ticksupply for Binance-only recording.

Option 5: Ticksupply (managed recording)

Ticksupply takes a different approach. Instead of selling a pre-existing archive, it records Binance WebSocket streams 24/7 from the moment you subscribe and exports date ranges as gzipped CSV.

The export format is two columns: a nanosecond receipt timestamp and the raw exchange JSON message, exactly as Binance sent it. No normalization, no truncation, no reshaping. You get the full depthUpdate, depth5, depth10, depth20, or any other Binance stream type, in its original format.
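Reading such an export back is simple with the standard library. The sketch below assumes the nanosecond timestamp is the first comma-separated field and everything after the first comma is the raw message (the JSON itself contains commas, so split only once); `parse_export` is a hypothetical helper, not part of any official client.

```python
import gzip
import json

def parse_export(path):
    """Yield (recv_ns, message) pairs from a two-column gzipped export.

    Assumes each line is `<nanosecond timestamp>,<raw exchange JSON>`,
    splitting on the first comma only since the JSON contains commas.
    """
    with gzip.open(path, "rt") as f:
        for line in f:
            ts, raw = line.rstrip("\n").split(",", 1)
            yield int(ts), json.loads(raw)
```

From there, each message can be fed straight into whatever book-reconstruction or analysis code you already have, since it is byte-for-byte what Binance sent.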

Plans start at $25/month for 25 concurrent streams and 100 GiB of exports. The Professional plan is $100/month for 100 concurrent streams and 500 GiB. Currently covers Binance and Bybit (spot and linear perpetuals).

The trade-off is real and worth stating upfront: Ticksupply has no pre-existing history. Your data starts from the day you subscribe. If you need BTCUSDT depth data from last year, this isn’t the right tool. If you want to start building a Binance order book dataset going forward without managing infrastructure, that’s what it does.

| | DIY | Cryptofeed | Tardis.dev | Crypto Lake | Ticksupply |
| --- | --- | --- | --- | --- | --- |
| Monthly cost | $50–100/mo | $50–100/mo infra | $700/mo+ | $80/mo | $25/mo |
| Setup time | 2–8 weeks | 3–6 weeks | Minutes | Minutes | Minutes |
| History depth | From day one | From day one | Since 2019 | Since 2022 | From subscribe date |
| Data format | Your choice | Configurable | Normalized + raw | Parquet via API | Raw exchange JSON |
| Exchanges | Any | ~40 exchanges | 40+ exchanges | 10 exchanges | Binance, Bybit |

How to decide

The right choice depends on what you actually need:

You need years of multi-exchange history. Tardis.dev. Their archive is deep and their tooling is mature. Subscriptions start at $700/month, but you’re accessing tick data going back to 2019 across 40+ exchanges.

You want Parquet data with DataFrame integration. Crypto Lake. Their lakeapi library is well-suited to Python-heavy research workflows, and they offer both snapshots and full-depth incremental updates.

You want full control and enjoy infrastructure work. DIY with Cryptofeed. You’ll spend time on it, but you’ll own every piece of the pipeline and can record from any exchange Cryptofeed supports.

You need raw, ongoing Binance order book recording at low cost. Ticksupply. $25/month is less than the cloud infrastructure cost of running your own recorder, and you get nanosecond-timestamped raw exchange messages without maintaining anything.

You only need trades and klines. data.binance.vision. It’s free, it goes back years, and there’s no reason to pay for what Binance already gives away.


Start recording Binance order book data today

$25/month for 25 streams. 7-day free trial. Set up in minutes, no infrastructure to manage.

Try Ticksupply free