Regex for Spaces

Isn’t it nasty when a piece of text contains multiple consecutive spaces where they shouldn’t be? Or extra spaces at the start or end of a string? Regex can be used to tidy these up in no time.

A regular expression for matching spaces looks for the literal space character / / or its hexadecimal escape /\x20/ in a string. A quantifier such as / {2,}/ matches runs of two or more spaces. In addition, the whitespace character class /\s/ matches any whitespace character, including spaces, tabs, and newlines.

Let’s discuss several of these expressions and see how they’re used.

Matching a Single Space Character

The most logical way of matching a space in regex is to simply use the space character in your expression:

/ /g

However, a literal space can lead to confusing expressions, since it is easy to miss among other characters. For this reason, it can be clearer to write it as an escaped character. The hexadecimal escape for a space (character code 20 in hexadecimal, 32 in decimal) can be used like this:

/\x20/g

The octal escape can also be used for this purpose (note that this won’t work if the u flag is enabled):

/\040/g

The Unicode escape for a space can also be used for this purpose. The four-digit form \u0020 works with or without the Unicode flag u, while the code point form \u{20} requires it. This is used like this:

/\u0020/gu
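To make the equivalence concrete, here is a minimal JavaScript sketch that runs all four forms against the same text. The variable names and the sample string are made up for illustration, and the behavior shown is that of JavaScript's regex engine; other flavors may treat these escapes differently.

```javascript
// Four ways to write "one space" in a JavaScript regex literal.
const literalSpace = / /g;
const hexSpace = /\x20/g;
const octalSpace = /\040/g; // legacy octal escape, only valid without the u flag
const unicodeSpace = /\u0020/gu;

// Hypothetical sample text containing three spaces in total.
const sample = "a b  c";

// Count the matches each pattern finds in the sample.
const spaceCounts = [literalSpace, hexSpace, octalSpace, unicodeSpace].map(
  (pattern) => (sample.match(pattern) || []).length
);
console.log(spaceCounts); // [3, 3, 3, 3]

// The octal form is rejected once the u flag is enabled.
let octalWithUnicodeFlagFails = false;
try {
  new RegExp("\\040", "u");
} catch (error) {
  octalWithUnicodeFlagFails = error instanceof SyntaxError;
}
console.log(octalWithUnicodeFlagFails); // true
```

If a pattern might later gain the u flag, the hexadecimal or Unicode escape is probably the safer choice.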

Matching Multiple Space Characters

Any of the expressions above can be used along with a quantifier to specify the number of spaces to be matched. For example, to match one or more spaces, use the one-or-more quantifier +:

/\x20+/g

This can be very handy in matching a set of consecutive spaces and replacing them with a single space (i.e. cleaning up a body of text).
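As a rough illustration of that clean-up, the sketch below collapses every run of spaces into one. The function name collapseSpaces is hypothetical, not something from a library.

```javascript
// Replace each run of one or more spaces with a single space.
function collapseSpaces(text) {
  return text.replace(/\x20+/g, " ");
}

console.log(collapseSpaces("too   many    spaces")); // "too many spaces"
```

Because the pattern targets only the space character, tabs and line breaks in the text are left untouched.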

To match a specific number of spaces (3 in this case), use an exact quantifier:

/\x20{3}/g

To match a minimum number of spaces (for example, 5 or more), use a range quantifier with no upper bound:

/\x20{5,}/g
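The difference between the two quantifiers is easiest to see on a sample. The sketch below uses a made-up string with one run of three spaces and one run of six; the variable names are illustrative only.

```javascript
const exactlyThree = /\x20{3}/g;
const fiveOrMore = /\x20{5,}/g;

// Hypothetical sample: "a", three spaces, "b", six spaces, "c".
const gapText = "a" + " ".repeat(3) + "b" + " ".repeat(6) + "c";

// {3} matches three-space chunks, so the run of six yields two matches.
const threeSpaceChunks = gapText.match(exactlyThree);
console.log(threeSpaceChunks.length); // 3

// {5,} is greedy and takes the whole run of six as one match.
const longRuns = gapText.match(fiveOrMore);
console.log(longRuns.length, longRuns[0].length); // 1 6
```

Note that an exact quantifier does not check what surrounds the match, so it will also find three-space chunks inside longer runs.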

Matching Spaces at the Start or End of a String

Anchors can be used to match spaces in specific locations. The start-of-string anchor ^ is used to match spaces at the start of a string, while the end-of-string anchor $ is used to match spaces at the end of a string:

/^\x20+/
/\x20+$/

The expressions above use the one-or-more quantifier + to match one or more spaces in this location.
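Put together, the two anchored expressions can strip spaces from both ends of a string. This is a small sketch with a hypothetical helper name, trimSpaces.

```javascript
// Remove leading spaces, then trailing spaces.
function trimSpaces(text) {
  return text.replace(/^\x20+/, "").replace(/\x20+$/, "");
}

console.log(trimSpaces("   hello world  ")); // "hello world"
```

JavaScript's built-in trim() method does a similar job, but it removes every kind of whitespace (tabs and line breaks included), whereas the version above removes only spaces.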

Matching HTML Spaces Using Regex

Although the normal space character works in HTML, other space entities are sometimes used to force a space in a certain location. For example, the non-breaking space entity &nbsp; or its numeric counterpart &#160; can be matched as:

/&nbsp;/g
/&#160;/g

The &ensp; and &emsp; entities insert a gap about two or four normal spaces wide, respectively, and can be matched as follows:

/&ensp;/g
/&emsp;/g

Take note that these expressions will only match the HTML entities in a block of HTML code, but not in the resulting rendered HTML.
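The sketch below shows one way to handle several space entities in a single pass. It assumes the entities of interest are &nbsp;, &#160;, &ensp;, and &emsp;, and the name replaceSpaceEntities is made up for this example.

```javascript
// One alternation covers all four entity spellings.
const spaceEntities = /&(?:nbsp|#160|ensp|emsp);/g;

// Swap each space entity in raw HTML source for a normal space.
function replaceSpaceEntities(html) {
  return html.replace(spaceEntities, " ");
}

console.log(replaceSpaceEntities("one&nbsp;two&#160;three&ensp;four&emsp;five"));
// "one two three four five"
```

As noted above, this works on the HTML source only. Once the entity has been decoded, the text contains the actual character (for example U+00A0 for a non-breaking space), which this pattern does not match.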

Matching Spaces in URLs

Spaces in URLs are usually formatted as %20, and can therefore be matched as follows:

/%20/g
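For instance, the expression can turn encoded spaces back into readable ones. The function name and the example.com URL below are placeholders for illustration.

```javascript
// Replace every %20 sequence with a plain space.
function showEncodedSpaces(url) {
  return url.replace(/%20/g, " ");
}

console.log(showEncodedSpaces("https://example.com/my%20file%20name.txt"));
// "https://example.com/my file name.txt"
```

This only handles %20. To decode every percent-encoded sequence in a URL component, JavaScript's built-in decodeURIComponent() is likely the better tool.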

Matching Spaces as Part of Whitespace Characters

Regex provides character classes that match whitespace characters other than the plain space.

The whitespace character class \s matches any whitespace character, including spaces, tabs, and line breaks.

/\s/g
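To see how the whitespace class differs from a plain space, the sketch below runs both against a made-up string that contains a space, a tab, and a newline. The variable names are illustrative.

```javascript
// Hypothetical sample with one space, one tab, and one newline.
const mixedWhitespace = "a b\tc\nd";

// \s matches all three whitespace characters.
const whitespaceCount = mixedWhitespace.match(/\s/g).length;
console.log(whitespaceCount); // 3

// A literal space matches only the space.
const plainSpaceCount = mixedWhitespace.match(/ /g).length;
console.log(plainSpaceCount); // 1

// Replace any run of whitespace with a single space.
const normalized = mixedWhitespace.replace(/\s+/g, " ");
console.log(normalized); // "a b c d"
```

Use \s when tabs and line breaks should be treated like spaces, and the space-only expressions when they should be preserved.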

Sources

The regular expressions on this page were adapted from a solution presented on Stack Overflow, posted by various users on this question. HTML entities for spaces were summarized on this page and this page.

Benjamin

Founder, owner, and sole content creator on RegexLand. Enjoys programming, blogging, and teaching others how to do the same.