Regex Match Everything After a Specific Character

Sometimes we’d like to extract everything from a body of text that follows a specific character. For instance, you might want to extract the query string from a URL, which follows a question mark.

A regular expression that matches everything after a specific character can be written in more than one way. Some methods search for whitespace and non-whitespace characters following the character, while other methods make use of positive look-behinds.

Let’s discuss the various methods below.

Method 1: Match everything after the first occurrence

The following regular expression will return everything following the first occurrence of the character “a”.

/a([\s\S]*)$/

We start the expression with the character after which we want to extract text (in this case the letter a):

/a/

After this character, we specify that we’d like to match all whitespace characters (\s) and all non-whitespace characters (\S). Whitespace characters include spaces, tabs, and line breaks, while non-whitespace characters include all letters, numbers, and punctuation. So together, \s and \S cover every character. The combination is similar to the dot character (.), which by default matches any character except line breaks.
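The difference between the dot and the \s\S combination shows up as soon as the text contains a line break. Here is a minimal sketch in JavaScript, assuming a made-up two-line sample string (sampleText is hypothetical and not part of the original article):

```javascript
// Hypothetical sample text containing a line break.
const sampleText = "a=1\nb=2";

// The dot stops at the line break, so only the first line is matched.
const dotMatch = sampleText.match(/.*/)[0];

// [\s\S] also matches the line break, so the whole text is matched.
const anyMatch = sampleText.match(/[\s\S]*/)[0];

console.log(JSON.stringify(dotMatch)); // "a=1"
console.log(JSON.stringify(anyMatch)); // "a=1\nb=2"
```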

The \s and \S are included in square brackets which say that we’d like to match any of them in any order:

/a[\s\S]/

To specify how many characters we’d like to match after the character, we need to use a quantifier. The zero-or-more quantifier (*) will match a string of any length, including an empty one.

/a[\s\S]*/

You could instead use the one-or-more quantifier (+) to require at least one character after the “a”, or a fixed-length quantifier such as {5} to match exactly five characters.
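To see how the choice of quantifier changes the result, here is a small JavaScript sketch. The sample strings are assumptions chosen purely for illustration:

```javascript
// Hypothetical sample text.
const phrase = "banana split";

// Zero-or-more: everything from the first "a" to the end of the text.
const zeroOrMore = phrase.match(/a[\s\S]*/)[0];

// Fixed length: the first "a" plus exactly five more characters.
const exactlyFive = phrase.match(/a[\s\S]{5}/)[0];

// The difference between * and + only shows when nothing follows the "a".
const starOnLoneA = "a".match(/a[\s\S]*/); // matches "a"
const plusOnLoneA = "a".match(/a[\s\S]+/); // no match, so null

console.log(JSON.stringify(zeroOrMore));  // "anana split"
console.log(JSON.stringify(exactlyFive)); // "anana "
console.log(starOnLoneA[0], plusOnLoneA); // a null
```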

To ensure we match everything up to the end of the body of text, add the end-of-string anchor ($) at the end:

/a[\s\S]*$/

However, this expression will include the letter “a” in the match. To extract everything after the letter a, we need to introduce a capture group using parentheses:

/a([\s\S]*)$/

The contents of the parentheses are now capture group 1, and can thus be extracted from the array of results returned by the regex match.

So in summary, the regular expression above matches zero-or-more whitespace and non-whitespace characters after the letter “a” up to the end of the string, and returns it as capture group 1.
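As a rough illustration of Method 1, the sketch below applies the same pattern to the query-string case mentioned in the introduction. It assumes JavaScript, a made-up URL, and an escaped question mark (\?) in place of the letter “a”:

```javascript
// Hypothetical URL used only for illustration.
const url = "https://example.com/search?q=regex&page=2";

// Same shape as /a([\s\S]*)$/, with an escaped question mark instead of "a".
const result = url.match(/\?([\s\S]*)$/);

// match() returns null when the text contains no question mark.
const fullMatch = result ? result[0] : null;   // whole match, includes the "?"
const queryString = result ? result[1] : null; // capture group 1 only

console.log(fullMatch);   // ?q=regex&page=2
console.log(queryString); // q=regex&page=2
```

The question mark has to be escaped because it is itself a quantifier in regex syntax; a plain letter such as “a” needs no escaping.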

Method 2: Match everything after the last occurrence

The following regular expression will return everything after the last occurrence of the letter “a”:

/[^a]*$/

The expression starts with the letter a enclosed in square brackets, and with a caret symbol (^) as the first character inside the square brackets. This indicates a negated set which is used to indicate what not to match. Basically this will match everything except the letter a:

/[^a]/

By itself, the expression above will match only a single character that is not the letter a. We need to use the zero-or-more quantifier (*) after the square brackets to indicate that we’d like to match a run of such characters of any length:

/[^a]*/

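With global matching, the expression above will return every run of characters between occurrences of the letter a in the body of text. To force it to return only the part after the last occurrence of the letter a, we need to add the end-of-string anchor ($) at the end:

/[^a]*$/

Note that if the text contains no letter a at all, this expression matches the entire text, and if the text ends with the letter a, it matches an empty string.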
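Method 2 can be sketched in JavaScript as follows. The helper name afterLastA and the sample strings are assumptions made for this example:

```javascript
// Hypothetical helper: returns the text after the last "a" in the input.
function afterLastA(text) {
  // [^a]*$ can always match (possibly an empty string), so match() never returns null here.
  return text.match(/[^a]*$/)[0];
}

console.log(JSON.stringify(afterLastA("banana split"))); // " split"

// Edge cases worth knowing about:
console.log(JSON.stringify(afterLastA("banana"))); // "" (the text ends with "a")
console.log(JSON.stringify(afterLastA("hello")));  // "hello" (no "a" at all)
```

No capture group is needed here, because the negated set can never include the letter “a” itself.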

Method 3: Using positive look-behind

The final method makes use of a positive look-behind, which might not be supported in all regex engines:

/(?<=a)[\s\S]*/

The part in parentheses is known as a positive look-behind, which is written in the form (?<=expression). The part after the parentheses will only be matched if it is immediately preceded by the expression inside the positive look-behind (in this case only the letter a). The look-behind itself is not included in the match.

/(?<=a)/

After the positive look-behind is satisfied, we want to match all whitespace characters (\s) and non-whitespace characters (\S). We enclose these in square brackets to indicate that both types of characters can be matched in any order:

/(?<=a)[\s\S]/

The expression above will only match the first character following the contents of the positive look-behind. To ensure that we match all the characters after it, we need to add the zero-or-more quantifier (*) after the square brackets:

/(?<=a)[\s\S]*/

And with this, our expression is complete.
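Here is a minimal JavaScript sketch of Method 3, assuming an engine that supports look-behinds (recent versions of Node.js and the major browsers do). The sample string is made up, and the pattern is built with the RegExp constructor so that an older engine throws a catchable error instead of failing to parse the whole script:

```javascript
// Hypothetical sample text.
const sample = "banana split";

let afterA = null;
try {
  // Equivalent to the literal /(?<=a)[\s\S]*/.
  const lookBehind = new RegExp("(?<=a)[\\s\\S]*");
  const found = sample.match(lookBehind);
  // The whole match already excludes the "a", so no capture group is needed.
  afterA = found ? found[0] : null;
} catch (err) {
  console.log("Look-behind is not supported by this engine.");
}

console.log(afterA); // nana split (when look-behind is supported)
```

Because the engine reports the leftmost possible match, the result starts right after the first “a” in the text.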

Which Method Should I Use?

If the regex engine in your language of choice does not support look-behinds, you should steer clear of Method 3 and use Method 1 or 2 instead.

If you want to match everything after the first occurrence of a character, use Method 1. Method 3 serves the same purpose and returns the text as the whole match rather than as a capture group, but only if look-behinds are supported.

For everything after the last occurrence of a character, use Method 2.
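To compare the three methods side by side, the sketch below runs each expression on the same made-up string in JavaScript. It assumes look-behind support for Method 3. With leftmost matching, Method 3 returns the text after the first “a”, the same text as Method 1:

```javascript
// Hypothetical sample text.
const text = "banana split";

const method1 = text.match(/a([\s\S]*)$/)[1];   // capture group 1
const method2 = text.match(/[^a]*$/)[0];        // whole match
const method3 = text.match(/(?<=a)[\s\S]*/)[0]; // whole match, needs look-behind support

console.log(JSON.stringify(method1)); // "nana split"
console.log(JSON.stringify(method2)); // " split"
console.log(JSON.stringify(method3)); // "nana split"
```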

Sources

The regular expressions on this page were adapted from solutions posted on Stack Overflow by PleaseStand on this question and by Mark Byers on this question.

Benjamin

Founder, owner, and sole content creator on RegexLand. Enjoys programming, blogging, and teaching others how to do the same.
