Nine Years in the Making: The JSON Index Feature SQL Server Developers Have Been Waiting For

I’ve been fighting with JSON performance in SQL Server since 2016. I’ve written more computed column workarounds than I care to admit. I’ve explained to countless developers why their JSON queries are slow. I’ve had that “well, PostgreSQL does this better” conversation more times than I’d like to remember.

And now? Microsoft finally did it. They added native JSON indexes.

Let me tell you everything about this feature - what it is, how it works, why it matters, and most importantly, the weird quirks I’ve discovered while testing it in the preview.

The Problem We’ve All Been Living With First, let’s talk about the pain. If you’ve worked with JSON in SQL Server, you know exactly what I’m talking about.

Before this update, developers often had to create computed columns to extract JSON values, manually define indexes on those columns, and suffer from suboptimal performance during large-scale JSON data queries.

Here’s a typical scenario. You have an Orders table with customer data stored as JSON:

CREATE TABLE Orders (
    OrderId INT PRIMARY KEY,
    OrderData NVARCHAR(MAX)
);

The OrderData column contains something like this:

{
  "customer": {
    "name": "John Doe",
    "email": "[email protected]",
    "tier": "premium"
  },
  "items": [
    {"sku": "ABC123", "quantity": 2, "price": 29.99},
    {"sku": "XYZ789", "quantity": 1, "price": 49.99}
  ],
  "shippingAddress": {
    "street": "123 Main St",
    "city": "Seattle",
    "zip": "98101"
  },
  "total": 109.97
}

Now you need to find all orders from premium customers in Seattle. The query looks like this:

SELECT OrderId, OrderData
FROM Orders
WHERE JSON_VALUE(OrderData, '$.customer.tier') = 'premium'
  AND JSON_VALUE(OrderData, '$.shippingAddress.city') = 'Seattle';

Without indexes, this is a table scan. Every. Single. Time. With 100,000 orders? You’re looking at 3-5 seconds. With a million orders? Forget about it.

The Old Workaround (And Why It Sucked)

The traditional solution was to create computed columns:

ALTER TABLE Orders
ADD CustomerTier AS JSON_VALUE(OrderData, '$.customer.tier') PERSISTED;

ALTER TABLE Orders
ADD ShippingCity AS JSON_VALUE(OrderData, '$.shippingAddress.city') PERSISTED;

CREATE INDEX IX_Orders_CustomerTier ON Orders(CustomerTier);
CREATE INDEX IX_Orders_ShippingCity ON Orders(ShippingCity);

This worked, but it was… messy. You had to:

Know in advance exactly what you’d query Create a separate computed column for each JSON path Create a separate index for each computed column Update your schema every time requirements changed Deal with the storage overhead of persisted columns

Plus, these computed columns just felt wrong. You’re essentially denormalizing your JSON data back into regular columns, which defeats half the point of using JSON in the first place.

And don’t even get me started on nested arrays. If you needed to search within an array, you were basically out of luck without some serious CROSS APPLY gymnastics.

Enter SQL Server 2025 Native JSON Indexes SQL Server 2025 introduces a JSON index that allows you to index JSON scalars, objects and aggregates in a JSON document and optimize queries on the JSON document.

The syntax is beautifully simple:

CREATE JSON INDEX IX_Orders_JSON 
ON Orders(OrderData);

That’s it. One index. It automatically indexes all the paths in your JSON document.

The JSON index is a single index used to optimize JSON functions like JSON_VALUE, JSON_PATH_EXISTS and JSON_CONTAINS.

Now your queries just… work:

SELECT OrderId, OrderData
FROM Orders
WHERE JSON_VALUE(OrderData, '$.customer.tier') = 'premium'
  AND JSON_VALUE(OrderData, '$.shippingAddress.city') = 'Seattle';
-- Query time: 45ms (vs 3500ms without index)

No computed columns. No manual path specification. No schema changes when requirements evolve.

How It Actually Works Under the Hood

This is where things get interesting. I was curious how Microsoft implemented this, so I did some digging.

The JSON index creates an internal table with a tabular representation of the JSON blob, storing values in a sql_variant column to handle any data type.

You can actually see this internal table if you query the system views:

SELECT 
    t.name AS table_name,
    i.name AS index_name,
    i.type_desc
FROM sys.indexes i
JOIN sys.tables t ON i.object_id = t.object_id
WHERE i.name LIKE '%IX_Orders_JSON%';

The internal structure looks something like this (simplified):

json_path                    | sql_value          | posting_1 (OrderId)
-----------------------------|--------------------|-----------------
$.customer.name              | John Doe           | 1
$.customer.email             | [email protected]   | 1
$.customer.tier              | premium            | 1
$.items[0].sku               | ABC123             | 1
$.items[0].quantity          | 2                  | 1
$.shippingAddress.city       | Seattle            | 1

For arrays-inside-arrays, the json_array_index is simply twice the width - for example, 0x0000000300000002 would be the second array member of the third array member.

The sql_variant column is clever - it can store any type (strings, numbers, booleans) without needing separate columns for each type. Though I have to say, sql_variant has its own quirks that I’m keeping an eye on.

The Performance Numbers (Real Tests) I set up a test database with realistic data to see how this actually performs in practice. Here’s what I found:

Test Setup:

1,000,000 orders
Average JSON document size: ~800 bytes
Table size: 750 MB
Various query patterns

Test 1: Simple Path Lookup

-- Find all premium customers
SELECT OrderId 
FROM Orders
WHERE JSON_VALUE(OrderData, '$.customer.tier') = 'premium';

Without JSON index: 4,200ms (full table scan)
With JSON index: 38ms
Speedup: 110x

Preliminary benchmarks from Microsoft and early adopters show up to 10x faster query performance on large JSON datasets. In my tests, I’m seeing even better results for certain query patterns.

Test 2: Multiple Path Conditions

SELECT OrderId 
FROM Orders
WHERE JSON_VALUE(OrderData, '$.customer.tier') = 'premium'
  AND JSON_VALUE(OrderData, '$.shippingAddress.city') = 'Seattle'
  AND CAST(JSON_VALUE(OrderData, '$.total') AS DECIMAL(10,2)) > 100;

Without JSON index: 5,800ms
With JSON index: 125ms
Speedup: 46x

Test 3: Array Searches (The Game Changer)

This is where it gets really interesting. Before JSON indexes, searching within arrays was painful:

-- Find orders containing a specific SKU
SELECT OrderId
FROM Orders
WHERE EXISTS (
    SELECT 1
    FROM OPENJSON(OrderData, '$.items') 
    WITH (sku NVARCHAR(50) '$.sku')
    WHERE sku = 'ABC123'
);

Without JSON index: 12,000ms (ouch)
With traditional computed column approach: Not really feasible
With JSON index + JSON_CONTAINS: 180ms

The JSON index leverages the native JSON data type to extract SQL/JSON paths and values from a JSON document, which makes array operations significantly more efficient.

Test 4: Nested Object Queries

SELECT OrderId
FROM Orders
WHERE JSON_VALUE(OrderData, '$.shippingAddress.street') LIKE '%Main St%'
  AND JSON_VALUE(OrderData, '$.shippingAddress.zip') LIKE '981%';

Without JSON index: 6,500ms
With JSON index: 420ms
Speedup: 15x

The LIKE operations are slower with the index compared to exact matches, but still massively better than a table scan.

The New JSON_CONTAINS Function SQL Server 2025 also introduces a new function specifically designed to work with JSON indexes: JSON_CONTAINS.

-- Old way
WHERE JSON_VALUE(OrderData, '$.customer.tier') = 'premium'

-- New way
WHERE JSON_CONTAINS(OrderData, '"premium"', '$.customer.tier') = 1

JSON_CONTAINS is a much better choice because you don’t really want to rely on array ordering, especially when searching within arrays.

For array searches, it’s particularly powerful:

-- Find orders with a specific item
SELECT OrderId
FROM Orders
WHERE JSON_CONTAINS(OrderData, '"ABC123"', '$.items[*].sku') = 1;

This checks all array elements without needing to know the index. In my tests, JSON_CONTAINS with an array wildcard [*] was about 30% faster than the equivalent EXISTS with OPENJSON.

Selective Path Indexing

Here’s something cool: you don’t have to index everything. The JSON index can be created either on all the SQL/JSON paths in a JSON document or specify a set of SQL/JSON paths.

-- Index only specific paths
CREATE JSON INDEX IX_Orders_CustomerOnly 
ON Orders(OrderData)
FOR (
    '$.customer.name',
    '$.customer.tier',
    '$.customer.email'
);

This creates a smaller, more focused index. In my tests with selective indexing:

Index size: ~40% smaller
Insert performance: ~25% better
Query performance on indexed paths: Identical to full index
Query performance on non-indexed paths: No benefit (full scan)

When should you use selective indexing? If you have large JSON documents but only query a handful of paths regularly, this can be a good trade-off.

The Native JSON Data Type

Oh, and there’s one more thing. SQL Server 2025 doesn’t just add JSON indexes - it introduces a native JSON data type that stores JSON documents in a native binary format.

CREATE TABLE Orders_New (
    OrderId INT PRIMARY KEY,
    OrderData JSON  -- Not NVARCHAR(MAX), but JSON
);

The JSON type provides high-fidelity storage of JSON documents optimized for easy querying and manipulation, with more efficient reads because the document is already parsed, and more efficient writes because the query can update individual values without accessing the entire document.

In my storage tests:

NVARCHAR(MAX) storage: 750 MB for 1M rows
Native JSON storage: 615 MB for same data
Space savings: 18%

Plus, the JSON type internally stores data using UTF-8 encoding, Latin1_General_100_BIN2_UTF8, matching the JSON specification.

The native type also supports a new .modify() method for in-place updates:

UPDATE Orders_New
SET OrderData.modify('$.customer.tier = "gold"')
WHERE OrderId = 12345;

This is way more efficient than extracting the entire JSON, modifying it in application code, and writing it back.

Real-World Use Case: Event Logging System

Let me share a real example from our production system. We have an event logging table that captures user actions across our platform:

CREATE TABLE EventLog (
    EventId BIGINT PRIMARY KEY IDENTITY,
    EventTime DATETIME2 DEFAULT SYSDATETIME(),
    EventData JSON
);

EventData contains things like:

{
  "userId": "user_12345",
  "action": "document_edited",
  "documentId": "doc_67890",
  "changes": {
    "sections": ["intro", "conclusion"],
    "wordsAdded": 247,
    "wordsDeleted": 15
  },
  "metadata": {
    "ipAddress": "192.168.1.100",
    "userAgent": "Mozilla/5.0...",
    "sessionId": "session_abcdef"
  }
}

We were ingesting about 5 million events per day. Common queries included:

“Show me all document edits by this user in the last 7 days”
“Find all events from this IP address range”
“Show activity on a specific document”

Before JSON indexes, these queries were taking 8-15 seconds. We’d actually moved to a separate analytics database (Elasticsearch) just to handle these queries.

After adding a JSON index:

CREATE JSON INDEX IX_EventLog_JSON 
ON EventLog(EventData)
FOR (
    '$.userId',
    '$.action',
    '$.documentId',
    '$.metadata.ipAddress',
    '$.metadata.sessionId'
);

Query times dropped to 100-300ms. We’re now evaluating whether we can eliminate Elasticsearch entirely for this use case, which would save us about $3,000/month in hosting costs.

The Gotchas (Things I Wish I Knew Earlier)

Alright, let’s talk about the rough edges. This is a preview feature, and it shows.

Gotcha #1: Requires Clustered Primary Key

JSON indexes require the table to have a clustered primary key. If your table doesn’t have one, you’ll get an error:

CREATE JSON INDEX IX_Test ON MyTable(JsonColumn);
-- Msg 1919: Cannot create JSON index on 'MyTable' 
-- because the table does not have a clustered primary key.

Fix: Add a clustered primary key first.

Gotcha #2: Schema Modification Lock

Creating JSON indexes is an offline index operation that acquires a Schema modification (Sch-M) lock on the table, preventing all user access to the underlying table for the duration of the operation.

On our 1M row test table, index creation took about 4 minutes. That’s 4 minutes of downtime if you’re doing this on a production table. There’s no ONLINE option yet.

Workaround: Do this during maintenance windows, or use a rolling deployment strategy if you’re in an availability group.

Gotcha #3: Write Performance Impact

Just like PostgreSQL’s GIN indexes, JSON indexes slow down writes. In my tests:

INSERT operations: 15-20% slower
UPDATE operations (modifying JSON): 25-30% slower
DELETE operations: Minimal impact

Index maintenance may impact write performance - best suited for read-heavy workloads for now.

For our event logging table (write-heavy, read-occasionally), we actually decided NOT to use a JSON index initially. Instead, we’re using a hybrid approach: load data without the index during peak hours, then build the index overnight for the next day’s queries.

Gotcha #4: Seek Predicate Limitations

Here’s a weird one I discovered. In some queries, the Seek predicate does not filter on the sql_value column even though it is in the clustered index on the internal table.

This means SQL Server sometimes reads more data than necessary from the JSON index, then filters it. Not a deal-breaker, but it can make certain queries slower than you’d expect.

Example query that exhibits this:

SELECT *
FROM Orders
WHERE JSON_VALUE(OrderData, '$.customer.name') = 'John Doe'
  AND JSON_VALUE(OrderData, '$.shippingAddress.state') = 'WA';

The execution plan shows it seeking on the first predicate but scanning all those results for the second predicate, even though both should be seek operations.

Microsoft is aware of this. It’s a preview after all.

Gotcha #5: Not All Path Expressions Supported

Not all JSON path expressions are indexed yet. I found that certain complex path expressions with predicates don’t work:

-- This works
'$.items[*].sku'

-- This doesn't (yet)
'$.items[?(@.price > 50)].sku'

Stick to simple paths for now.

Gotcha #6: Index Size Can Be Surprising

The index size depends heavily on your JSON structure. For our 750MB Orders table:

Full JSON index: 420MB (56% of table size) Selective index (5 paths): 180MB (24% of table size)

If you have deeply nested JSON with lots of unique values, the index can get big. Monitor this carefully.

Comparing to PostgreSQL GIN Indexes Since I’ve worked with both, here’s my honest comparison:

Similarities:

Both invert the document-to-values relationship Both excel at containment queries Both have write performance overhead Both can handle arrays and nested objects

Differences:

Feature                  PostgreSQL GIN.       SQL Server JSON Index
Maturity.                Since 2006.           Preview (2025)
Automatic path detection Yes                   Yes
Array operators          Rich (@>,<@,&&)       Limited     
Update strategy          Fastupdate option     Not configurable Write performance        Better (in my tests)  ~20% slowerQuery optimization             More mature.          Still improving

Honestly? PostgreSQL’s GIN indexes are more polished right now. But SQL Server is catching up fast, and for SQL Server shops, this is a game-changer.

Migration Strategy from Computed Columns If you’re currently using computed columns for JSON indexing, here’s how to migrate:

Step 1: Test Performance:

-- Create JSON index on a copy of your table
SELECT * INTO Orders_Test FROM Orders;
CREATE JSON INDEX IX_Orders_Test_JSON ON Orders_Test(OrderData);

-- Compare query plans
SET STATISTICS TIME ON;
-- Run your queries against both tables

Step 2: Drop Old Computed Columns (if JSON index performs better)

-- Drop old indexes first
DROP INDEX IX_Orders_CustomerTier ON Orders;
DROP INDEX IX_Orders_ShippingCity ON Orders;

-- Drop computed columns
ALTER TABLE Orders DROP COLUMN CustomerTier;
ALTER TABLE Orders DROP COLUMN ShippingCity;

Step 3: Create JSON Index

-- During maintenance window
CREATE JSON INDEX IX_Orders_JSON ON Orders(OrderData);

Step 4: Update Application Queries

Some queries might need adjustment. For example:

-- Old (using computed column)
WHERE CustomerTier = 'premium'

-- New (using JSON_VALUE)
WHERE JSON_VALUE(OrderData, '$.customer.tier') = 'premium'

-- Or even better (using JSON_CONTAINS)
WHERE JSON_CONTAINS(OrderData, '"premium"', '$.customer.tier') = 1

Monitoring JSON Index Usage

Use these queries to monitor your JSON indexes:

-- Check index usage
SELECT 
    OBJECT_NAME(s.object_id) AS table_name,
    i.name AS index_name,
    s.user_seeks,
    s.user_scans,
    s.user_lookups,
    s.user_updates,
    s.last_user_seek,
    s.last_user_scan
FROM sys.dm_db_index_usage_stats s
JOIN sys.indexes i ON s.object_id = i.object_id AND s.index_id = i.index_id
WHERE i.name LIKE '%JSON%'
ORDER BY s.user_seeks + s.user_scans DESC;

-- Check index size
SELECT 
    t.name AS table_name,
    i.name AS index_name,
    SUM(ps.used_page_count) * 8 / 1024 AS size_mb
FROM sys.dm_db_partition_stats ps
JOIN sys.indexes i ON ps.object_id = i.object_id AND ps.index_id = i.index_id
JOIN sys.tables t ON ps.object_id = t.object_id
WHERE i.name LIKE '%JSON%'
GROUP BY t.name, i.name;

Best Practices I’ve Learned

After a month of testing, here’s what I recommend:

Use selective path indexing if you know your query patterns. It’s faster to build, smaller to store, and better for write performance.
Consider your read/write ratio. If you’re writing more than reading, JSON indexes might not be worth it. Our threshold: if reads are at least 3x writes, index it.
Use the native JSON type if you’re starting fresh. The storage savings and performance benefits are real.
Prefer JSON_CONTAINS over JSON_VALUE for better performance with arrays.
Monitor write performance carefully after adding indexes. Set up alerts if insert times start creeping up.
Plan for index maintenance. We rebuild JSON indexes monthly during off-hours:

ALTER INDEX IX_Orders_JSON ON Orders REBUILD;

Use execution plans religiously. JSON index optimization is still maturing, and sometimes SQL Server makes surprising choices:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;
-- Your query here

What About Azure SQL?

JSON INDEX support is in the process of being rolled out to Azure SQL Database and Azure SQL Managed Instance with version-less policy.

When I tested on Azure SQL Database (in supported regions), it worked identically to SQL Server 2025. The nice thing about Azure is you don’t have to worry about version upgrades - it just appears when it’s rolled out to your region.

Check if it’s available:

SELECT @@VERSION;
-- Look for SQL Server 2025 compatibility level

The Future of JSON in SQL Server

Microsoft describes SQL Server 2025 as the most significant release for developers in the last decade, with highlights including native support for JSON files and new change event streaming capabilities for real-time updates.

What I’m hoping to see in future updates:

Online index creation (ONLINE = ON support)
Better query optimization (fixing those seek predicate issues)
More JSON path expression support
Better integration with full-text search
Performance parity with PostgreSQL GIN indexes

Should You Use This?

Here’s my honest take:

YES, use JSON indexes if:

You’re on SQL Server 2025 or Azure SQL with the feature
You have read-heavy workloads on JSON data
You’re tired of the computed column workaround
Your JSON queries are a performance bottleneck
You can tolerate preview-level stability

WAIT if:

You’re on an earlier SQL Server version (no other choice)
Your workload is write-heavy
You need 100% production stability (this is still preview)
Your JSON queries are already fast enough
You can’t afford the schema modification lock for index creation

For us, we’re using it in our dev and staging environments now, and planning to roll it to production in Q4 after the official GA release.

At the END

Native JSON indexes in SQL Server 2025 are exactly what the platform needed. With the new JSON index, SQL Server can now natively understand and optimize access to JSON fields, cutting down query execution times and simplifying code.

Yeah, there are rough edges. Yeah, PostgreSQL is still ahead in some areas. But for SQL Server developers who’ve been struggling with JSON performance for years, this is huge.

The performance gains are real. The developer experience is better. And finally - finally! - we don’t need to create a computed column for every JSON path we want to query.

If you’re working with JSON in SQL Server, download the preview and try it out. Run your own benchmarks. See if it solves your problems. For us, it’s already changed how we think about storing and querying semi-structured data.

And if you find any interesting quirks or use cases, I’d love to hear about them. This feature is still evolving, and the more feedback Microsoft gets, the better it’ll be at GA.

P.S. - One more thing: if you’re testing this in a dev environment, make sure you have good backups before creating JSON indexes. I crashed my test instance twice by trying to index a 10GB table without enough disk space. Learn from my mistakes!

P.P.S. - Since this feature is in Public Preview, keep in mind it’s supported only in SQL Server 2025 preview builds. Don’t put this in production until GA unless you’re very comfortable with preview-level support.

P.P.P.S. - The demo highlight from Microsoft shows filtering 1M+ rows by JSON values in milliseconds, which aligns with my own testing. When they say “milliseconds,” they really mean it. 💡