SQL Server 2025 Regex Now Supports 2MB LOB Data
If you’ve ever tried to sanitize a massive HTML payload or extract IPs from a gigabyte log file directly in T-SQL, you know the pain. You usually end up writing fragile application-tier logic or dealing with clunky SQLCLR assemblies. We've been watching Microsoft's database updates closely, and the latest release addresses this massive headache. With the rollout of SQL Server 2025 CU5 and recent Azure SQL updates, native regular expressions have finally leveled up.
News Summary
Microsoft has officially expanded native regex capabilities in T-SQL to handle Large Object (LOB) inputs. This update ships in SQL Server 2025 CU5 and is already live in Azure SQL Database and Fabric.
Previously, native regex was capped, forcing you to split log files, HTML documents, or large JSON payloads into 8,000-byte chunks just to run a simple pattern match.
Now, all seven regex functions accept varchar(max) and nvarchar(max) data natively. The engine allows up to 2 MB of text per single function call.
This includes standard scalar functions like REGEXP_REPLACE and REGEXP_SUBSTR, as well as the heavy-hitting table-valued functions (TVFs) like REGEXP_MATCHES and REGEXP_SPLIT_TO_TABLE.
To power this securely without crashing your database, T-SQL regex relies on the RE2 engine. RE2 guarantees linear-time execution and prevents catastrophic backtracking-also known as ReDoS attacks.
That protection is absolutely crucial when you are feeding the engine a 1.8 MB log blob rather than a standard 8,000-byte string.
If you run Azure SQL Managed Instance on the Always-up-to-date policy, this feature is rolling out region by region right now. Instances on the SQL Server 2025 update policy receive it as part of the CU5 rollout.
Keep in mind, your regex pattern itself is still capped at 8,000 bytes, which is far larger than any maintainable regular expression should ever need.
If you want to use the powerful TVFs, ensure your database compatibility level is set to 170.
What This Means for Developers
This update drastically simplifies your data pipeline architecture. If you're building a SaaS that ingests massive webhook payloads, server logs, or raw HTML, you can now parse, sanitize, and extract structured data without leaving the database engine.
Need to redact PII like credit cards or Social Security numbers across a multi-megabyte payload? It’s now a single REGEXP_REPLACE expression. No more custom chunked-replace routines. No more pulling raw text over the wire to your Node.js or Python backend just to run a regex pass.
By processing this directly in T-SQL, you eliminate significant network latency and reduce the memory footprint of your application tier. You can shred a massive text blob into a staging table purely using set-based operations, projecting the entire log payload as one row per line natively.
Our Analysis
This is an absolute win for the data engineering and backend dev community. For over a decade, T-SQL's string manipulation was notoriously weak compared to PostgreSQL, which has boasted robust native regex for years.
Before CU5, LOB inputs were limited to just REGEXP_LIKE, REGEXP_COUNT, and REGEXP_INSTR. Deferring LOB support on replacement and splitting functions meant the feature felt half-baked at launch, heavily restricting what developers could actually build.
Now that CU5 completes the picture, Azure SQL and SQL Server 2025 are finally on par with modern developer expectations.
We predict this will lead to a massive deprecation of legacy SQLCLR assemblies across enterprise codebases. SQLCLR was always a security and maintenance headache, and native, RE2-backed regex completely removes the historical need for it.
Furthermore, we expect developers to push this 2 MB ceiling quickly. While 2 MB per function call is generous for standard web logs and HTML, AI agents and massive document processing pipelines easily exceed this. Microsoft is already asking for feedback on payloads exceeding 2 MB, signaling that they know larger chunking is the next frontier.
Comparing this to app-tier processing, the performance gains are undeniable. Extracting JSON capture groups directly via REGEXP_MATCHES means your API can simply query structured columns instead of downloading a blob to parse in memory.
Feature Comparison
| Capability | Before CU5 | SQL Server 2025 CU5 & Azure SQL |
| LOB inputs for TVFs | Not Supported | Supported (<code>varchar(max)</code>, <code>nvarchar(max)</code>) |
| Input limit per function call | 8,000 bytes (for 4 out of 7 functions) | 2 MB across all 7 functions |
| Regex Pattern Size Limit | 8,000 bytes | 8,000 bytes |
| Compatibility level required for TVFs | 170 | 170 |
Code Snippet: PII Redaction in T-SQL
If you want to sanitize massive logs directly in the engine, here is how you redact U.S. SSNs, credit cards, and emails from a LOB column using the new capabilities.
SQL
-- Redact sensitive data from a varchar(max) payload directly in T-SQL
SELECT LogId,
REGEXP_REPLACE(
REGEXP_REPLACE(
REGEXP_REPLACE(
RawPayload,
'\b[0-9]{4}[-]?[0-9]{4}[-]?[0-9]{4}[-]?[0-9]{4}\b', -- Match CC
'****_****_****_****'
),
'\b[0-9]{3}-[0-9]{2}-[0-9]{4}\b', -- Match SSN
'***-**-****'
),
'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', -- Match Email
'[redacted-email]'
) AS RedactedPayload
FROM dbo.LogEntries;
FAQs
Q: Does the 2 MB limit apply to the entire column?
A: No, the 2 MB limit only applies to the input passed to a single regex function call, not the column itself. A single value in a varchar(max) column can still store up to 2 GB.
Q: Will this cause ReDoS (Regular Expression Denial of Service) timeouts?
A: No. T-SQL regex is powered by the RE2 engine, which uses a linear-time, non-backtracking implementation specifically designed to prevent catastrophic backtracking.
Q: Do I need to update my database compatibility level to use this?
A: You only need database compatibility level 170 if you are using the table-valued functions (REGEXP_MATCHES and REGEXP_SPLIT_TO_TABLE). The scalar regex functions are available at all compatibility levels.
Q: Can I use this in Azure SQL Database right now?
A: Yes, this capability is already available in Azure SQL Database and SQL Database in Fabric.
Our Take
Microsoft bringing full 2MB LOB support to all seven regex functions is exactly the kind of foundational database update developers need. It immediately kills the need for awkward 8,000-byte string chunking and keeps heavy text-processing close to the metal where it belongs. We expect teams to aggressively refactor their legacy SQLCLR jobs out of existence in favor of this native approach. At Devignitor, we're watching closely to see how quickly developers push that 2MB ceiling as database-level AI and vector operations become the new standard