Ultimate Regex Tutorial and Cheatsheet

So you’ve heard that regex is a quick and easy way to help you evaluate user input or scrape information from a body of text. This article presents a complete overview of the topic and the syntax to get you started.

So what is regex anyway?

Regular expressions (or simply regex) are small pieces of code that define a pattern of text. They are used to check if a certain string matches this pattern, or alternatively to find all matches of a certain pattern in a body of text. Regex is written using a specific syntax that is fairly constant across various programming languages.

It should be noted that the expressions in this article are JavaScript-related. Other programming languages are largely similar, with only minor differences or additions.

In this article, we’ll cover:

  1. Basic format of a regular expression
  2. Literal matches
  3. Alternation
  4. Character classes
  5. Character sets and wildcards
  6. Quantifiers
  7. Anchors
  8. Special Characters
  9. Capture groups
  10. Look-aheads
  11. Look-behinds
  12. Flags
  13. Substitution
  14. Get Practicing
  15. References

So without further ado, let’s get started!

1. Basic Format of a Regular Expression

In JavaScript, a regular expression literal starts and ends with a forward slash /. The forward slashes mark the start and end of the expression.

/.../

Sometimes, the regular expression can be followed by an optional flag modifier (we’ll discuss these later). Flag modifiers alter the overall behavior of the regular expression. Each flag consists of a single letter; for example, the global modifier is denoted by the letter g.

Flags are added after the final forward slash:

/.../g

If you want to enable more than one flag, simply list them one after the other after the final forward slash. For example, to enable the global g, multiline m, and case insensitive i flags, use the following expression:

/.../gmi
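
As a minimal sketch of how this looks in practice (the pattern, sample text, and variable names here are made up for illustration), a regex literal with flags can be created and applied like so:

```javascript
// A regex literal: the pattern sits between the slashes, flags follow the final slash.
const pattern = /hello/gi;

// The enabled flags are exposed as a string on the regex object.
const flags = pattern.flags;

// With g and i enabled, match() returns every occurrence, ignoring case.
const found = "Hello, hello, HELLO".match(pattern);

console.log(flags); // "gi"
console.log(found); // ["Hello", "hello", "HELLO"]
```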

2. Literal Matches

The most basic regular expression matches a literal string, that is, exactly the characters indicated. For example, to match the word code, use the following regex:

/code/

3. Alternation

To match either of two alternatives, for example either code or program, use the pipe character |. This character represents “OR” in regex, as it does in various programming languages.

/code|program/
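
A quick sketch of literal matching and alternation in JavaScript follows. The sample sentences and variable names are hypothetical; test() and match() are the standard built-in methods.

```javascript
// Literal match: true whenever the exact text "code" appears anywhere in the input.
const hasCode = /code/.test("I like to write code");
const noCode = /code/.test("I like to write prose");

// Alternation: each match is either "code" or "program".
const either = "program the code".match(/code|program/g);

console.log(hasCode); // true
console.log(noCode);  // false
console.log(either);  // ["program", "code"]
```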

4. Character Classes

A character class allows us to match any one of several possible characters. The character class is enclosed in square brackets []. For example, to match any single character from the group a, b, or c, use the following regex:

[abc]

To match any single character except a, b, or c, use the same expression with a caret symbol ^ at the start of the class:

[^abc]

Character classes are great in that you can specify a range of characters in one go using the dash symbol -.

[a-z]

The expression above will match any lowercase letter from a to z. You can also string several ranges together, one after the other, inside the square brackets.

[a-zA-Z0-9]

The expression above will match any lowercase letter from a to z, any uppercase letter from A to Z, and also any digit from 0 to 9.

You don’t need to stick to [a-z]. You can also use subsets of these. For example, the following expression will match any lowercase letter from b to f, any digit from 4 to 6, and also the digit 8.

[b-f4-68]

Note: Don’t read the expression above as matching 4 to 68. We are working in terms of single characters here. The 6 and the 8 operate independently of one another.
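
To see these classes in action, here is a small sketch. The sample string and variable names are assumptions chosen for illustration.

```javascript
const sample = "abc-XYZ-4689";

// [abc] matches a single a, b, or c.
const abc = sample.match(/[abc]/g);

// [^abc] matches any single character that is NOT a, b, or c.
const notAbc = "abcd".match(/[^abc]/g);

// [b-f4-68] matches letters b to f, digits 4 to 6, and the digit 8.
const subset = sample.match(/[b-f4-68]/g);

console.log(abc);    // ["a", "b", "c"]
console.log(notAbc); // ["d"]
console.log(subset); // ["b", "c", "4", "6", "8"]
```

Note that the 9 in the sample is skipped by the last expression, which confirms that 4-68 is not read as a range up to 68.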

5. Character Sets and Wildcards

Character sets and wildcards allow us to really kick things up a notch. They act the same way as combining a whole string of character classes using shorthand notation.

For example, the dot character . matches any character except a newline.

/./

Pro tip: Turning on the single line flag s allows the dot character to match newline characters as well.

Character sets consist of a backslash \ followed by a single letter. For example, the \s character set matches all whitespace, including spaces, tabs, newlines, and carriage returns.

/\s/

Each character set also has a capital-letter twin that does exactly the opposite. For example, the \S character set will match any character that is not whitespace.

/\S/
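
The following sketch compares the dot, \s, and \S on a short string containing a space, a tab, and a newline. The sample text and variable names are hypothetical.

```javascript
// Seven characters: a, space, b, tab, c, newline, d.
const text = "a b\tc\nd";

// \s matches each whitespace character: the space, the tab, and the newline.
const spaces = text.match(/\s/g);

// \S matches every character that is not whitespace.
const nonSpaces = text.match(/\S/g);

// The dot skips the newline by default, but not when the s flag is set.
const dotDefault = text.match(/./g).length;
const dotWithS = text.match(/./gs).length;

console.log(spaces.length); // 3
console.log(nonSpaces);     // ["a", "b", "c", "d"]
console.log(dotDefault);    // 6
console.log(dotWithS);      // 7
```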

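Other character sets include:

\d  Any digit (equivalent to [0-9])
\D  Any character that is not a digit
\w  Any word character (equivalent to [a-zA-Z0-9_])
\W  Any character that is not a word character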

6. Quantifiers

Quantifiers enable us to specify how many of a certain character we want to match. This can get pretty useful as we start building more complex expressions.

Firstly, the question mark ? quantifier means zero or one. It indicates that a certain character is optional (i.e. it will match zero or one instances of this character).

Quantifiers act on the character, range, or group directly preceding them. For example, in this next expression the optional quantifier ? acts on the a character, indicating that we can match the a character zero or one times.

/a?/

The asterisk quantifier * means zero or more. It will thus match zero, one, two, three, or any larger number of consecutive occurrences of a in the next expression:

/a*/

The plus sign quantifier + means one or more. It acts basically the same as the asterisk, except that it won’t match zero instances. It will match one, two, three, or any larger number of consecutive occurrences of a in the next expression.

/a+/

To match an exact quantity of some character, range, or group, we can specify the quantity enclosed in curly brackets, as in {3}. For example, the following expression will match exactly 3 consecutive occurrences of the letter a:

/a{3}/

To match three or more consecutive characters, use the following expression (you can replace the 3 with any quantity you want to specify):

/a{3,}/

To match between two exact quantities of consecutive characters, say between 3 and 5, use the following syntax:

/a{3,5}/

The expression above will match three, four, or five consecutive appearances of the letter a.

Quantifiers and Character Ranges

We can now start combining quantifiers with character ranges. For example, the next expression will match one or more consecutive characters from the range [a-z]:

/[a-z]+/
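
The quantifiers covered so far can be tried out with a sketch like the one below. The sample strings and variable names are made up for illustration.

```javascript
// ? : the u is optional, so both spellings match.
const optional = "color colour".match(/colou?r/g);

// * : zero or more a's after the b, so a lone b matches too.
const star = "b ba baaa".match(/ba*/g);

// + : at least one a is required, so the lone b is skipped.
const plus = "b ba baaa".match(/ba+/g);

// {3} : a run of seven a's is consumed three at a time, leaving one over.
const exact = "aaaaaaa".match(/a{3}/g);

// {3,} : the run of two is too short, the run of four qualifies.
const atLeast = "aa aaaa".match(/a{3,}/g);

// {3,5} : greedy, so it takes five of the seven a's; the remaining two are too few.
const between = "aaaaaaa".match(/a{3,5}/g);

// A quantifier applied to a character range.
const words = "regex 101 rocks".match(/[a-z]+/g);

console.log(optional); // ["color", "colour"]
console.log(star);     // ["b", "ba", "baaa"]
console.log(plus);     // ["ba", "baaa"]
console.log(exact);    // ["aaa", "aaa"]
console.log(atLeast);  // ["aaaa"]
console.log(between);  // ["aaaaa"]
console.log(words);    // ["regex", "rocks"]
```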

Lazy Quantifiers

The quantifiers mentioned above are all greedy by default, meaning that they match as many characters as possible. For example, in the text aaaaa the following expression will match the entire string (meaning the most occurrences of the letter a that it can find):

/a{2,}/

To make the quantifier lazy (i.e. to match as few characters as possible), insert a question mark after it:

/a{2,}?/

In the text aaaaa, the expression above will match only two a’s, aa. If the global flag g is enabled, it will match the first and second a in one match, and the third and fourth a in a second match.
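
The difference between greedy and lazy matching on aaaaa can be checked with a short sketch (variable names are hypothetical):

```javascript
// Greedy: takes all five a's in a single match.
const greedy = "aaaaa".match(/a{2,}/g);

// Lazy: stops as soon as the minimum of two is reached, then starts over.
// The fifth a is left unmatched because one a alone is below the minimum.
const lazy = "aaaaa".match(/a{2,}?/g);

console.log(greedy); // ["aaaaa"]
console.log(lazy);   // ["aa", "aa"]
```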

7. Anchors

Anchors help us assert that a match occurs at a certain location within the string. For example, the caret symbol ^ at the start of an expression ensures that the match starts at the very beginning of the string.

/^a/

This will match the letter a only if it occurs at the very start of the string.

Pro tip: The start-of-string caret ^ is not to be confused with the negation caret at the start of a character range [^abc] as discussed earlier. The former is an anchor symbol to ensure the match is at the beginning of the string, while the latter is a negation symbol that appears at the start of a character range, ensuring none of the symbols in the character range are matched.

The corresponding anchor for the end of the string is the dollar sign $. For example, the following expression will match the letter a only if it occurs at the very end of the string.

/a$/

The word boundary character \b ensures the match is at a word boundary (i.e. at the start or the end of a word). It should be noted that a “word” in regex talk may consist of characters including [a-z], [A-Z], [0-9], and an underscore _.

The \B special character is the negation of \b and matches only at a position that is not a word boundary.
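
Here is a small sketch of the anchors at work. The sample words and variable names are assumptions chosen for illustration.

```javascript
// ^ : the a must be the very first character.
const startsWithA = /^a/.test("apple");
const notAtStart = /^a/.test("banana");

// $ : the a must be the very last character.
const endsWithA = /a$/.test("banana");

// \b on both sides: only the standalone word "cat" qualifies,
// not the "cat" inside "concat" or at the start of "cats".
const wholeWord = "cat concat cats".match(/\bcat\b/g);

// \B : "cat" must NOT start at a word boundary, so only the one inside "concat" matches.
const insideIndex = /\Bcat/.exec("cat concat").index;

console.log(startsWithA); // true
console.log(notAtStart);  // false
console.log(endsWithA);   // true
console.log(wholeWord);   // ["cat"]
console.log(insideIndex); // 7
```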

8. Special Characters

There are a few escape sequences in regex that match specific special characters:

\n  Newline character
\r  Carriage return character
\t  Tab character
\0  Null character

Regex also lets you specify characters by their Unicode code. You can do this using the pattern \xAA where AA is a two-digit hexadecimal representation of the character. For instance, the hexadecimal code for a space is 20 and thus the following expression will match a space:

/\x20/

Similar to this two-digit representation, there is also a four-digit representation in the form \uAAAA where AAAA is the four-digit hexadecimal Unicode representation. The following expression will therefore also match a space:

/\u0020/
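
As a brief sketch (sample strings and variable names are hypothetical), the escapes above behave as follows:

```javascript
// Both escapes refer to the same space character.
const viaHex = /\x20/.test("a b");
const viaUnicode = /\u0020/.test("a b");

// \t matches a tab character.
const hasTab = /\t/.test("a\tb");

// \r?\n handles both Unix (\n) and Windows (\r\n) line endings when splitting.
const lines = "one\ntwo\r\nthree".split(/\r?\n/);

console.log(viaHex);     // true
console.log(viaUnicode); // true
console.log(hasTab);     // true
console.log(lines);      // ["one", "two", "three"]
```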

9. Capture Groups

So now that we’ve covered most of the basic building blocks of regex, we can start combining expressions using groups.

We can group certain characters together by placing a set of parentheses () around them.

/(step[0-9])/

This allows us to use quantifiers on the entire group. For example, the next expression will match 2 consecutive occurrences of step[0-9] where [0-9] can be any digit.

/(step[0-9]){2}/

We can also use the pipe character | (which signifies “OR” in regex speak) inside a group without affecting the rest of the expression outside the group.

/p(a|o)int/

The expression above will match either paint or point. As you can see, the alternation is limited to the a and the o and does not affect the text before or after the group.

A group like the ones above is called a capture group, indicating that it captures the contents of the parentheses and returns it with the full match of the expression. In this case, the result of a regular expression is usually in the form of an array with the entire match as the 0th entry, the first capture group as the 1st entry, the second capture group as the 2nd entry, and so on.

For example, let’s say you’d like to extract the digits before and after a decimal point separately. We can do this by using the following expression:

/(\d+)\.(\d+)/

This will match one or more digits, followed by a period ., followed by one or more digits, and will return the digits before and after the period as capture groups. Note that the + sits inside each group so that the group captures all of its digits, and that the period is escaped as \. so that it matches a literal period rather than any character. For example, if we used 19.45 as the input string, the expression above would return the following array:

  1. 19.45 – entire expression
  2. 19 – capture group 1
  3. 45 – capture group 2
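
A sketch of the decimal example in JavaScript is shown below. It assumes the pattern (\d+)\.(\d+), with the quantifier inside each group and the period escaped, and adds a contrasting variant to show what happens when the + is placed outside the parentheses. Variable names are hypothetical.

```javascript
// Quantifier inside the group: each group captures its full run of digits.
const result = "19.45".match(/(\d+)\.(\d+)/);

// Quantifier outside the group: the group is repeated and only
// its last repetition (a single digit) is kept.
const lastOnly = "19.45".match(/(\d)+\.(\d)+/);

// A quantifier applied to a whole group, and alternation limited to a group.
const twoSteps = /(step[0-9]){2}/.test("step1step2");
const paints = "paint point punt".match(/p(a|o)int/g);

console.log(result[0], result[1], result[2]); // "19.45" "19" "45"
console.log(lastOnly[1], lastOnly[2]);        // "9" "5"
console.log(twoSteps);                        // true
console.log(paints);                          // ["paint", "point"]
```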

If you want to group items in the expression, but don’t want to return them along with the entire match, you can change the group to a non-capture group by adding ?: after the opening parenthesis:

/(?:...)/

You can also name a capture group by adding ?<name> after the opening parenthesis (substitute name with the name you’d like to give the group, e.g. “wholeNumbers” or “decimalNumbers”):

/(?<name>...)/

To match the text captured by a named group again within the same expression, you can type \k<name>. For instance, the following expression will match only if the numbers before the dash are repeated after the dash (e.g. 12-12):

/(?<digits>\d+)-\k<digits>/
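
The sketch below tries out non-capture groups, named groups, and named backreferences. The group names wholeNumbers and decimalNumbers come from the text above; the sample inputs and variable names are assumptions.

```javascript
// Non-capture group: the result holds the full match only, with no group entries.
const nonCapture = "paint".match(/p(?:a|o)int/);

// Named groups are available on the groups property of the match result.
const decimal = "19.45".match(/(?<wholeNumbers>\d+)\.(?<decimalNumbers>\d+)/);

// Named backreference: the digits after the dash must repeat those before it.
const repeated = /(?<digits>\d+)-\k<digits>/;
const same = repeated.test("12-12");
const different = repeated.test("12-34");

console.log(nonCapture.length);             // 1
console.log(decimal.groups.wholeNumbers);   // "19"
console.log(decimal.groups.decimalNumbers); // "45"
console.log(same);                          // true
console.log(different);                     // false
```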

10. Look-aheads

Look-aheads can be used to ensure that a match is/isn’t followed by some pattern, without actually including this pattern in the match.

The syntax for a positive look-ahead looks much like that of a group and is written as (?=...):

/\w+(?=\.com)/

The expression above will match a typical domain name only if followed by .com. However, it will “look ahead” to see if the .com is there without including it in the match. Thus if the input string to the expression above is regexland.com, it will check that .com is present after the domain, but will only return regexland without the .com as the final match. (The dot is escaped as \. so that it matches a literal period.)

The expression above is a positive look-ahead since it asserts that the match is followed by a specified pattern. However, we can also use a negative look-ahead to assert that the match is not followed by a specified pattern. These are written in the form (?!...):

/\w+(?!\.com)/

The expression above will match a run of word characters only where that run is not followed by .com. Be aware that the regex engine may backtrack to satisfy this: in regexland.com it gives up the final d and returns regexlan, since regexlan is followed by d rather than by .com.
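
Both kinds of look-ahead can be tried with the sketch below. It assumes the dot is escaped as \. so that it matches a literal period, and the second domain is a made-up example. One thing to watch for with the negative form is backtracking: the engine will shorten the match until the look-ahead succeeds.

```javascript
// Positive look-ahead: .com must follow, but is not part of the match.
const positive = "regexland.com".match(/\w+(?=\.com)/)[0];

// Negative look-ahead on the same input: the engine backtracks by one
// character, because "regexlan" is followed by "d" rather than ".com".
const backtracked = "regexland.com".match(/\w+(?!\.com)/)[0];

// Negative look-ahead on a domain that does not end in .com.
const other = "regexland.org".match(/\w+(?!\.com)/)[0];

console.log(positive);    // "regexland"
console.log(backtracked); // "regexlan"
console.log(other);       // "regexland"
```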

11. Look-behinds

Similar to look-aheads, we also have look-behinds to test if a match is preceded by some specified pattern. Whereas a look-ahead is written after the match, a look-behind is usually written before the match.

A positive look-behind asserts that the match is preceded by a specified pattern. It is written in the form (?<=…) (which looks exactly like the positive look-ahead but with a < after the question mark):

/(?<=Mr\. )\w+/

The expression above will match a person’s name only if preceded by Mr. followed by a space.

There is also the negative look-behind which asserts the match is not preceded by a specified pattern. It is written in the form (?<!…):

/(?<!Mr\. )\w+/

The expression above will match any run of word characters that is not directly preceded by Mr. followed by a space.
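
A sketch of both look-behinds follows, assuming a JavaScript runtime that supports look-behind (an ES2018 feature). The names are sample data, and the \b in the negative example is an addition of this sketch, not part of the expression above: it forces matches to start at the beginning of a word, so the engine cannot simply start one character later inside the name.

```javascript
// Positive look-behind: only a name directly after "Mr. " is matched.
const afterMr = "Mr. Smith".match(/(?<=Mr\. )\w+/)[0];
const noMr = "Ms. Jones".match(/(?<=Mr\. )\w+/);

// Negative look-behind with a word boundary: whole words not preceded by "Mr. ".
const notAfterMr = "Mr. Smith".match(/(?<!Mr\. )\b\w+/g);
const afterDr = "Dr. Jones".match(/(?<!Mr\. )\b\w+/g);

console.log(afterMr);    // "Smith"
console.log(noMr);       // null
console.log(notAfterMr); // ["Mr"]
console.log(afterDr);    // ["Dr", "Jones"]
```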

12. Flags

Flags alter the overall behavior of an expression. They are listed after the closing forward slash of the expression, one after the other. Here’s a quick rundown of the most common ones.

The global modifier g ensures that all instances of a specific pattern are returned. If it is not present, only the first match will be returned.

/[a-z]/g

The multiline flag m ensures that the start-of-string ^ and end-of-string $ anchors match the start and end of each line, instead of only the start and end of the entire multiline body of text:

/^[a-z]/m

The case insensitive flag i matches text regardless of case. For instance, it will match both upper- and lowercase letters if only lowercase letters are provided in the expression.

/[a-z]/i

The great thing is that we can string these flags one after the other if we need to use more than one of them:

/[a-z]/gmi
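
To see the effect of each flag, here is a sketch run against a three-line string. The sample text and variable names are made up, and the anchored pattern ^[a-z]+$ is an assumption of this sketch, used so that the m flag has something to act on.

```javascript
const lines = "apple\nBanana\ncherry";

// No flags: only the first run of lowercase letters is returned.
const firstOnly = lines.match(/[a-z]+/)[0];

// g: every run of lowercase letters is returned.
const everyRun = lines.match(/[a-z]+/g);

// g + m: ^ and $ apply per line, so only all-lowercase lines match.
const lowercaseLines = lines.match(/^[a-z]+$/gm);

// g + m + i: case is ignored, so the capitalized line matches too.
const allLines = lines.match(/^[a-z]+$/gmi);

console.log(firstOnly);      // "apple"
console.log(everyRun);       // ["apple", "anana", "cherry"]
console.log(lowercaseLines); // ["apple", "cherry"]
console.log(allLines);       // ["apple", "Banana", "cherry"]
```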

13. Substitution

Once we’ve extracted a match using regex, we can use the returned match as well as groups to construct a new string (also called a substitution).

var dashDate = "12-12-2021"
var slashDate = dashDate.replace(/-/g, "/")

The JavaScript above will search for all the dashes - in the string dashDate and replace them with forward slashes /, thus turning the date 12-12-2021 into 12/12/2021.

To insert the first capture group’s match into the substituted string, we can use $1. The second capture group is referenced by $2, the third by $3, and so forth.

For example, you could log and print out the day, month, and year separately using the following expression.

var myDate = "12-01-2021"
var dateRegex = /(\d{2})-(\d{2})-(\d{4})/
var dateParts = myDate.match(dateRegex)
console.log(dateParts[0]) // the whole date (e.g. 12-01-2021)
console.log(dateParts[1]) // day (e.g. 12)
console.log(dateParts[2]) // month (e.g. 01)
console.log(dateParts[3]) // year (e.g. 2021)

Using $` in the substitution will insert everything before the match, and $' will insert everything after the match.

The expression $& will insert the entire contents of the match.
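
The substitution tokens can be tried out with the sketch below. The date format, sample strings, and variable names are assumptions for illustration; replace() is the standard String method.

```javascript
const myDate = "12-01-2021";

// $1, $2, $3 insert the capture groups; here they reorder day-month-year into year/month/day.
const reordered = myDate.replace(/(\d{2})-(\d{2})-(\d{4})/, "$3/$2/$1");

// $& inserts the whole match, here wrapped in square brackets.
const wrapped = "a-b".replace(/-/, "[$&]");

// $` inserts the text before the match, $' the text after it.
const before = "a-b".replace(/-/, "$`");
const after = "a-b".replace(/-/, "$'");

console.log(reordered); // "2021/01/12"
console.log(wrapped);   // "a[-]b"
console.log(before);    // "aab"
console.log(after);     // "abb"
```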

Get Practicing

Now that you know your way around the regex syntax, the best way to move forward is to start practicing. Have a look at our list of the 20 most common regular expressions to get you started.

To my mind, the best way to practice is with a regex tool such as Regex101 or Regexr. These tools help you really visualize what you are doing and make it so much easier to focus on the task at hand.

References

Thanks for the extensive list of regex syntax as shown on Regex101 and Regexr, which proved a great help in compiling this tutorial.

W3Schools and MDN Web Docs have also been of great help, especially in getting all the JavaScript right.

Benjamin

Founder, owner, and sole content creator on RegexLand. Enjoys programming, blogging, and teaching others how to do the same.