Regex Match All Characters Between Two Specified Characters

Regex can be used to select everything between the specified characters. This can be useful for things like extracting contents of parentheses like (abc) or for extracting folder names from a file path (e.g. C:/documents/work/).

A regular expression that matches all characters between two specified characters makes use of look-ahead (?=…) and look-behind (?<=…) statements to isolate the string, and then uses the dot character . to select all contents between the delimiters.

An expression that matches everything between a and b is:

/(?<=a).*(?=b)/g

Let’s discuss how it works:

How it Works

The expression starts with a positive look-behind (?<=…), which ensures that the matched string is preceded by whatever takes the place of the … placeholder. In this case, we want to ensure that the letter a directly precedes the matched string.

/(?<=a)/

Look-aheads and look-behinds are assertions, which means that they are only used to check whether a certain condition is true. Their contents (a in this case) are not included in the match.

After the a character, we want to match any character. This is denoted by the dot symbol . which will match any character except a newline character. On its own, the dot symbol will only match a single character, so we need to include a zero-or-more quantifier * after it to ensure that we match zero or more of any character.

/(?<=a).*/

We want to stop matching when we encounter a b character. This is specified by a positive look-ahead (?=…), which ensures that the matched string is directly followed by whatever takes the place of the … placeholder.

In this case, we use the character b inside the positive look-ahead:

/(?<=a).*(?=b)/

Finally, to return every instance of this match and not just the first, we include the global modifier g at the very end of the expression:

/(?<=a).*(?=b)/g
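As a rough illustration, the finished expression can be tried out in Python, where re.findall plays the role of the global modifier g. The helper name find_between and the sample strings are made up for this sketch, and it assumes single-character delimiters, since Python only supports fixed-width look-behinds.

```python
import re

def find_between(text, start, end):
    """Return every greedy match between `start` and `end` (hypothetical helper)."""
    # re.escape lets delimiters such as ( and ) be used literally.
    pattern = f"(?<={re.escape(start)}).*(?={re.escape(end)})"
    return re.findall(pattern, text)

print(find_between("a123b", "a", "b"))      # ['123']
print(find_between("call(abc)", "(", ")"))  # ['abc']
```

Neither result contains the delimiters, because the look-behind and look-ahead only check for them without consuming them.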

Match All Characters Greedy vs. Lazy

The following expression will match as many characters between a and b as it can. This is because the zero-or-more quantifier * is greedy.

/(?<=a).*(?=b)/g

This will produce the following matches:

another baby bathtub (a single match: "nother baby bathtu")

Notice how it skips over three b characters and only stops the match just before the last b.

However, if we add a lazy modifier ? after the zero-or-more quantifier, it makes the quantifier lazy, causing it to match as few characters as possible.

/(?<=a).*?(?=b)/g

This will produce the following matches:

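another baby bathtub (the first match is now "nother ", which stops at the first b; "thtu" in "bathtub" is matched separately)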
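The difference is easy to check with a short Python sketch. It uses re.search so that only the first match of each pattern is compared, and the variable names are just for illustration.

```python
import re

text = "another baby bathtub"

# Greedy: runs up to the last b in the text.
greedy = re.search(r"(?<=a).*(?=b)", text).group()
# Lazy: stops at the first b it can.
lazy = re.search(r"(?<=a).*?(?=b)", text).group()

print(repr(greedy))  # 'nother baby bathtu'
print(repr(lazy))    # 'nother '

# Lazy matching is usually the better fit for repeated delimiters,
# such as the folder names in a path.
folders = re.findall(r"(?<=/).*?(?=/)", "C:/documents/work/")
print(folders)  # ['documents', 'work']
```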

Regex Match All Including Newline Characters

The expression above will match all characters between the two specified characters, except the newline character. To include the newline character in the match, we have several options.

One option is to include the dotall modifier s (also called the single-line modifier) at the end. It makes the dot match newline characters as well, so the entire input text is treated as a single line.

/(?<=a).*(?=b)/gs

Some flavours of regex allow turning on the dotall modifier inside the expression using (?s):

/(?s)(?<=a).*(?=b)/g

If the dotall modifier is not available in your flavour of regex, you can replace the dot symbol . with the character class [\s\S]. This matches all whitespace characters \s (which include spaces, tabs, newlines, etc.) and all non-whitespace characters \S (which include letters, numbers, punctuation, etc.).

/(?<=a)[\s\S]*(?=b)/g

The square brackets indicate that we can match any one of the enclosed character types at each position, and the zero-or-more quantifier * works just as before.
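The three options can be compared side by side in Python, where the dotall modifier is spelled re.DOTALL. This is only a sketch: the sample text is invented, and other regex flavours may name the flag differently.

```python
import re

text = "a first line\nsecond line b"

# Without dotall, the dot stops at the newline, so nothing matches.
no_dotall = re.search(r"(?<=a).*(?=b)", text)

# Option 1: the dotall flag. Option 2: the inline (?s) modifier.
with_flag = re.search(r"(?<=a).*(?=b)", text, re.DOTALL).group()
inline = re.search(r"(?s)(?<=a).*(?=b)", text).group()

# Option 3: [\s\S] in place of the dot, no flag needed.
char_class = re.search(r"(?<=a)[\s\S]*(?=b)", text).group()

print(no_dotall)                         # None
print(repr(with_flag))                   # ' first line\nsecond line '
print(with_flag == inline == char_class) # True
```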

Match All Between Two Characters Without Lookarounds

Some flavours of regex do not support look-aheads and look-behinds at all. In these cases, we can use the following expression.

/a(.*)b/g

Here we used the dot symbol . together with the zero-or-more quantifier * to match zero or more of any character. These are enclosed in parentheses () to capture the contents and return them for later use.

Finally, this entire expression is sandwiched between the two delimiter characters, a and b in this case.

Note that this expression will return the a and b together with the contents between them. However, the contents without a and b will be contained in the first capture group returned.

All of the modifications above can be used on this expression. For example, newline characters can be included with:

/a([\s\S]*)b/g

Or the zero-or-more quantifier can be made lazy using the lazy indicator ?:

/a(.*?)b/g
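A small Python sketch shows how the full match and the first capture group differ when no lookarounds are used. The sample strings are only for illustration.

```python
import re

m = re.search(r"a(.*)b", "a123b")
full_match = m.group(0)  # includes the delimiters
contents = m.group(1)    # first capture group, delimiters excluded

print(full_match)  # a123b
print(contents)    # 123

# With a single capture group, re.findall returns only the group contents.
lazy_groups = re.findall(r"a(.*?)b", "another baby bathtub")
print(lazy_groups)  # ['nother ', '', 'thtu']
```

The empty string in the last result comes from the "ab" in "baby", where nothing sits between the two delimiters.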

Which Flags to Use

To extract all matches from the piece of text, and not just the first match, be sure to include the global modifier g at the end of the expression:

/(?<=a).*(?=b)/g

Since we are working with text here, you can also include the case-insensitive modifier i so that the delimiters are matched regardless of their case.
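Python has no g or i suffix, so the sketch below assumes the usual equivalents: re.search returns only the first match, re.findall returns all of them, and re.IGNORECASE stands in for the i modifier. The sample text is made up.

```python
import re

text = "A1b a2B"
pattern = r"(?<=a).*?(?=b)"

# First match only (like omitting g), ignoring case.
first_only = re.search(pattern, text, re.IGNORECASE).group()

# All matches (like g), with and without case-insensitivity.
all_matches = re.findall(pattern, text, re.IGNORECASE)
case_sensitive = re.findall(pattern, text)

print(first_only)      # 1
print(all_matches)     # ['1', '2']
print(case_sensitive)  # []
```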

Sources

The regular expressions on this page were adapted from solutions presented on Stack Overflow by Gopi posted on this question, by stema posted on this question, and by cletus posted on this question.

Benjamin

Founder, owner, and sole content creator on RegexLand. Enjoys programming, blogging, and teaching others how to do the same.
