In a previous post I presented some general-purpose wrappers for adding regular expressions functionality to VBA and SQL in Microsoft Access.
These wrappers work well, but we can make them much more efficient for situations where the same pattern is used many times, such as in a SQL query where the regular expression will be executed on each row of the input table or query. This article describes how to use a simple factory pattern to cache and reuse RegExp objects, dramatically increasing performance when the same pattern is used more than once during the lifetime of a program.
Caching Compiled Regular Expressions
What we’ll use is the fact that, behind the scenes, every regular expression pattern gets compiled into a tiny little program that executes in order to actually apply the regular expression. The first time you call any of the methods on a RegExp object there will be a short pause while the expression is compiled. Once the regular expression has been compiled, though, it can execute right away.
The pause for compilation is usually on the order of a few tens of milliseconds. That’s small enough that you might never notice it even when stepping through code. But it can really add up when the expression is going to be applied to a few hundred thousand rows.
If you’re using a RegExp object in a procedure while looping through a recordset programmatically you can just keep using the same RegExp object. But what about when the same pattern is used in independent calls to the same function — or even to different functions entirely? The solution is to cache all of your regular expressions objects, and serve them up using a simple factory pattern. In this article I’ll show how to do just that.
Creating the Factory
We will store all the RegExp objects in a dictionary, where the keys will be a concatenation of the pattern and some symbols to represent the three options properties (Global, IngoreCase, and Multiline).
You will need the following references for this to code work:
- Microsoft VBScript Regular Expressions 5.5
- Microsoft Scripting Runtime
' Module-level declaration. The object will persist between calls Private pCachedRegexes As Dictionary Public Function GetRegex( _ ByVal Pattern As String, _ Optional ByVal IgnoreCase As Boolean = True, _ Optional ByVal MultiLine As Boolean = True, _ Optional ByVal MatchGlobal As Boolean = True) As RegExp ' Ensure the dictionary has been initialized If pCachedRegexes Is Nothing Then Set pCachedRegexes = New Dictionary ' Build the unique key for the regex: a combination ' of the boolean properties and the pattern itself Dim rxKey As String rxKey = IIf(IgnoreCase, "1", "0") & _ IIf(MultiLine, "1", "0") & _ IIf(MatchGlobal, "1", "0") & _ Pattern ' If the RegExp object doesn't already exist, create it If Not pCachedRegexes.Exists(rxKey) Then Dim oRegExp As New RegExp With oRegExp .Pattern = Pattern .IgnoreCase = IgnoreCase .MultiLine = MultiLine .Global = MatchGlobal End With Set pCachedRegexes(rxKey) = oRegExp End If ' Fetch and return the pre-compiled RegExp object Set GetRegex = pCachedRegexes(rxKey) End Function
Now anywhere in your code that you need a RegExp object, you call GetRegex instead of declaring a New RegExp
. This also makes it really fast to create new RegExp objects in a single line.
Updating the Regular Expressions Wrappers
See Regular Expressions in VBA for an in-depth discussion of these methods. Here we’ll just update them to take advantage of the new RegExp factory we just built:
RxTest
Public Function RxTest( _ ByVal SourceString As String, _ ByVal Pattern As String, _ Optional ByVal IgnoreCase As Boolean = True, _ Optional ByVal MultiLine As Boolean = True) As Boolean ' Wow, that was easy: RxTest = GetRegex(Pattern, IgnoreCase, MultiLine, False).Test(SourceString) End Function
RxMatch
Public Function RxMatch( _ ByVal SourceString As String, _ ByVal Pattern As String, _ Optional ByVal IgnoreCase As Boolean = True, _ Optional ByVal MultiLine As Boolean = True) As Variant Dim oMatches As MatchCollection With GetRegex(Pattern, IgnoreCase, MultiLine, False) Set oMatches = .Execute(SourceString) If oMatches.Count > 0 Then RxMatch = oMatches(0).Value Else RxMatch = Null End If End With End Function
RxMatches
Public Function RxMatches( _ ByVal SourceString As String, _ ByVal Pattern As String, _ Optional ByVal IgnoreCase As Boolean = True, _ Optional ByVal MultiLine As Boolean = True, _ Optional ByVal MatchGlobal As Boolean = True) As Variant Dim oMatch As Match Dim arrMatches Dim lngCount As Long arrMatches = Array() With GetRegex(Pattern, IgnoreCase, MultiLine, MatchGlobal) For Each oMatch In .Execute(SourceString) ReDim Preserve arrMatches(lngCount) arrMatches(lngCount) = oMatch.Value lngCount = lngCount + 1 Next End With RxMatches = arrMatches End Function
RxReplace
Public Function RxReplace( _ ByVal SourceString As String, _ ByVal Pattern As String, _ ByVal ReplacePattern As String, _ Optional ByVal IgnoreCase As Boolean = True, _ Optional ByVal MultiLine As Boolean = True, _ Optional ByVal MatchGlobal As Boolean = True) As String ' A single statement! RxReplace = GetRegex( _ Pattern, IgnoreCase, MultiLine, MatchGlobal).Replace( _ SourceString, ReplacePattern) End Function
And that’s that! In a future post I’ll show examples of when and how you might actually use such functions, as well as performance tests delving deeper into the benefits of caching your RegExp objects.
Conclusion
Behind the scenes, RegExp objects generate tiny compiled programs that are executed in order to apply regular expressions. Using a simple caching mechanism you can dramatically increase the performance of using regular expressions in Microsoft Access by saving each unique compiled RegExp object, as identified by the pattern and the three options properties of the object.