Regex Character Classes – A Complete Guide

What is a regex character class?

In regular expressions, a character class defines a custom set of characters that are allowed at one position in a match. It is written by enclosing the allowed characters in square brackets, e.g. [abc]. A negated character class, written [^…], matches any single character except those listed between the square brackets.

Let’s dive a little deeper and see how this all works.

How a Character Class Works

A character class specifies a set (or class) of characters that are allowed at a particular location within a match. The set of characters is enclosed by an opening [ and a closing ] square bracket.

/[abcde]/

The expression above will match a single occurrence of any letter from a to e. It works like a large OR expression, i.e. match either a, b, c, d, or e at this specific location.

The character class can be combined with other regex characters, like this:

/d[oi]g/

The expression above will match either dog or dig.
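As a quick check, the two expressions above can be tried with Python's built-in re module. This is only a minimal sketch: the choice of Python and the sample strings are assumptions made for illustration, not part of the original examples.

```python
import re

# [abcde] matches exactly one character from the set a, b, c, d, e.
single = re.compile(r"[abcde]")
first_hit = single.search("xyzcab")  # sample string is made up for illustration

# d[oi]g matches "dog" or "dig", but not "dug".
dog_or_dig = re.compile(r"d[oi]g")
words = ["dog", "dig", "dug"]
matching_words = [w for w in words if dog_or_dig.fullmatch(w)]

print(first_hit.group())  # c
print(matching_words)     # ['dog', 'dig']
```

Note that the class consumes only one character: in "xyzcab" the first match is the single letter c, not the run "cab".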

Character Ranges

Instead of listing every character, as in [abcde], we can use a shorter notation and specify a character range using a dash:

/[a-e]/

This will match any letter from a to e.

The Unicode code point of the second character must not be lower than that of the first, or the regex engine will report an error. For example, [a-z] is acceptable but [z-a] is not.
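The ordering rule can be observed directly. The sketch below assumes Python's re module, which raises re.error for a reversed range; other engines report the problem in their own way, and the helper name compiles is hypothetical.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if the pattern compiles, False if the engine rejects it."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

ascending = compiles(r"[a-z]")   # valid range
reversed_ = compiles(r"[z-a]")   # rejected: the end comes before the start
same_char = compiles(r"[a-a]")   # valid: a range may start and end on one character

print(ascending, reversed_, same_char)  # True False True
```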

As long as that ordering holds, a range can also span different kinds of characters, such as [A-z], which matches all uppercase and lowercase alphabetic letters. Be careful with such ranges, however: this one also includes the six characters from U+005B to U+0060, namely [, \, ], ^, _, and `.

A better alternative is to specify more than one character range, like this:

/[a-zA-Z0-9]/

This will match any uppercase or lowercase alphabetic character as well as digits 0 to 9.
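To see why the multi-range form is safer, the following sketch (assuming Python's re module; the variable names are illustrative) tests the punctuation characters that sit between Z and a against both classes.

```python
import re

loose = re.compile(r"[A-z]")         # spans U+0041 to U+007A
strict = re.compile(r"[a-zA-Z0-9]")  # letters and digits only

# The characters between "Z" (U+005A) and "a" (U+0061).
in_between = [chr(code) for code in range(0x5B, 0x61)]

matched_by_loose = [ch for ch in in_between if loose.fullmatch(ch)]
matched_by_strict = [ch for ch in in_between if strict.fullmatch(ch)]

print(in_between)         # ['[', '\\', ']', '^', '_', '`']
print(matched_by_loose)   # the same six characters
print(matched_by_strict)  # []
```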

Subsets of these can also be specified, such as the following, which will match any lowercase letter from e to k:

/[e-k]/

Character ranges can also be used in conjunction with other single characters in a character class, like this:

/[agx-z]/

Note that the character range acts only on the characters directly before and after the dash symbol -. The expression above will match either a, g, or any character from x to z.
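A short sketch, again assuming Python's re module and a made-up candidate string, confirms which characters the mixed class accepts.

```python
import re

pattern = re.compile(r"[agx-z]")

# Test each character on its own; b and w fall outside the class.
candidates = "abgwxyz"
hits = [ch for ch in candidates if pattern.fullmatch(ch)]

print(hits)  # ['a', 'g', 'x', 'y', 'z']
```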

Special Characters in Character Classes

For the most part, characters used inside a character class are treated literally. That is, they don’t have to be escaped. For example, the dot character . works as a wildcard outside a character class (matching any character, by default excluding line breaks), but is treated as a literal period inside one.

/10[.,]2/

The expression above will match the decimal number 10.2 or its alternate form 10,2.
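The difference between a dot inside and outside a class shows up as soon as a third character is tried. This sketch assumes Python's re module; the sample "10x2" is invented for the comparison.

```python
import re

inside_class = re.compile(r"10[.,]2")  # dot and comma are literal here
outside_class = re.compile(r"10.2")    # dot is a wildcard here

samples = ["10.2", "10,2", "10x2"]
literal_hits = [s for s in samples if inside_class.fullmatch(s)]
wildcard_hits = [s for s in samples if outside_class.fullmatch(s)]

print(literal_hits)   # ['10.2', '10,2']
print(wildcard_hits)  # ['10.2', '10,2', '10x2']
```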

Other characters such as parentheses ( ) and curly braces { } are also matched literally when used inside a character class.

The only characters that need escaping regardless of position are the closing square bracket ] and the backslash \. Each is escaped by preceding it with a backslash, as follows. (The dash and the caret are special only in certain positions, as covered below.)

/[\]]/
/[\\]/
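The two escaped classes can be exercised as follows. This is a sketch assuming Python's re module; raw strings (the r"..." prefix) are used so that Python itself does not interpret the backslashes before the regex engine sees them.

```python
import re

closing_bracket = re.compile(r"[\]]")  # matches a literal ]
backslash = re.compile(r"[\\]")        # matches a literal backslash

text = r"a]b\c"  # the five characters: a ] b \ c (made up for illustration)
bracket_hits = closing_bracket.findall(text)
backslash_hits = backslash.findall(text)

print(bracket_hits)    # [']']
print(backslash_hits)  # ['\\']  (a single backslash, displayed escaped)
```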

The dash character - is used to indicate a character range and therefore needs to be escaped in some cases. To match it literally, you only need to escape it when it occurs between two other characters inside the character class. For example, the dash will be treated literally in the following cases:

/[-]/
/[a-]/
/[-z]/

However, in the following case, it will create a character range from a to z.

/[a-z]/

If you want the class to match only the three characters a, -, and z instead, escape the dash with a backslash.

/[a\-z]/
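The three behaviours of the dash can be compared side by side. The sketch assumes Python's re module and an invented sample string.

```python
import re

text = "a-m-z"  # sample input, made up for illustration

literal_dash = re.compile(r"[a\-z]")  # a, -, or z
as_range = re.compile(r"[a-z]")       # any letter from a to z
leading_dash = re.compile(r"[-z]")    # - or z (the dash comes first, so it is literal)

escaped_hits = literal_dash.findall(text)
range_hits = as_range.findall(text)
leading_hits = leading_dash.findall(text)

print(escaped_hits)  # ['a', '-', '-', 'z']
print(range_hits)    # ['a', 'm', 'z']
print(leading_hits)  # ['-', '-', 'z']
```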

Negating Character Classes

If the list of allowable characters grows very large, it is sometimes simpler to specify a set of characters that are not allowed instead. Luckily, there’s an easy way to do this: include a caret symbol ^ immediately after the opening square bracket of the character class, like this:

/[^abc]/

The expression above will match any character that is not a, b, or c.

Pro tip: The caret symbol ^ that negates a character class is not to be confused with the start-of-string anchor, which uses the same symbol outside a character class. Inside a class, the caret negates only when it is the first character; anywhere else it is matched literally.
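The roles of the caret can be compared in a small sketch. It assumes Python's re module; the third pattern, [a^], is an extra illustration (not from the examples above) of a caret that is not in first position and is therefore matched literally.

```python
import re

negated = re.compile(r"[^abc]")       # any character except a, b, c
anchored = re.compile(r"^abc")        # "abc" at the start of the string
literal_caret = re.compile(r"[a^]")   # a, or a literal ^

negated_hits = negated.findall("abcd1")
starts_with_abc = bool(anchored.search("abcdef"))
abc_not_at_start = bool(anchored.search("xabc"))
caret_hits = literal_caret.findall("a^b")

print(negated_hits)      # ['d', '1']
print(starts_with_abc)   # True
print(abc_not_at_start)  # False
print(caret_hits)        # ['a', '^']
```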

Using Quantifiers with Character Classes

Up until now, the character classes we wrote matched only one character from the list. However, we can use any of the available regex quantifiers to specify how many of these characters we will allow.

For example, to allow two consecutive characters from a character class, we can add the quantifier {2} like this:

/b[oa]{2}t/

The expression above will match any of boat, boot, baot, or baat.

We can also use a zero-or-more *, one-or-more +, or a zero-or-one ? quantifier. For example, the following expression will match zero or more lowercase alphabetic characters:

/[a-z]*/
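The quantifier examples can be tried as follows. This is a sketch assuming Python's re module, with invented test words. One consequence worth seeing is that * can succeed on an empty match, whereas + requires at least one character.

```python
import re

two_vowels = re.compile(r"b[oa]{2}t")
words = ["boat", "boot", "baot", "baat", "bot", "booot"]
exactly_two = [w for w in words if two_vowels.fullmatch(w)]

# * allows zero repetitions, so it still matches (an empty string) before "42".
zero_or_more = re.compile(r"[a-z]*")
prefix = zero_or_more.match("hello42").group()
empty = zero_or_more.match("42").group()

# + needs at least one character from the class, so there is no match here.
one_or_more = re.compile(r"[a-z]+")
no_match = one_or_more.match("42")

print(exactly_two)  # ['boat', 'boot', 'baot', 'baat']
print(prefix)       # hello
print(repr(empty))  # ''
print(no_match)     # None
```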

Benjamin

Founder, owner, and sole content creator on RegexLand. Enjoys programming, blogging, and teaching others how to do the same.
