Regex To Allow Not Only Whitespace

A RegexLand reader recently posted this question: which regular expression can be used to match a string that is not only whitespace?

A regex to match a string that does not contain only whitespace characters can be written in two ways. The first uses wildcards (.) and the non-whitespace character set (\S), while the second uses a negative lookahead to reject strings that consist solely of whitespace characters (\s).

To my mind, the simplest regular expression that can do this (without using look-aheads) is:

/.*\S.*/

Let’s dive a little deeper into this expression and see what it does.

Method 1: Using Non-Whitespace Character Set

In regex, whitespace characters include spaces, tabs, carriage returns, newlines, etc. The \s character set will match any of these characters. However, the inverse character set \S (note the capital letter S) will match any character that is not a whitespace character, i.e. one that is not on the list above. This makes it a great starting point for our expression:

/\S/

However, all by itself, this expression will match only a single non-whitespace character and nothing else. But it’s a good starting point since the expression will only match if a non-whitespace character is found.
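As a quick illustration of this single-character behaviour, here is a minimal sketch using Python's re module. The choice of Python and the sample strings are assumptions made for illustration; the article itself does not name a regex flavor.

```python
import re

# \S on its own matches exactly one non-whitespace character.
match = re.search(r"\S", "   hello world  ")
first_char = match.group() if match else None  # 'h'

# A string made up only of whitespace gives no match at all.
blank = re.search(r"\S", " \t\n ")  # None
```

The match succeeds as soon as one non-whitespace character exists, but it covers only that one character.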

Once a non-whitespace character is found, we need to select the content before and after it. To do this, we can add a dot wildcard character . (which will match any character) both before and after the non-whitespace character. We should also add asterisk quantifiers * to these, specifying that the regex engine can match zero or more occurrences of this character.

/.*\S.*/

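The expression will therefore match the entire string if a non-whitespace character is found somewhere in it. If no non-whitespace character is encountered (i.e. if the string contains only whitespace characters), the match will fail. Note that in most flavors the dot does not match line breaks by default, so on multi-line input the match covers only the line that contains the non-whitespace character.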
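The first method can be sketched in Python as follows. The helper name has_content and the test strings are hypothetical, chosen only to show the pattern in use.

```python
import re

# Method 1: at least one non-whitespace character, with anything around it.
NOT_ONLY_WHITESPACE = re.compile(r".*\S.*")

def has_content(text: str) -> bool:
    """Return True if text contains at least one non-whitespace character."""
    return NOT_ONLY_WHITESPACE.search(text) is not None

# The match covers the surrounding whitespace as well as the visible text.
matched = NOT_ONLY_WHITESPACE.search("  hello world  ").group()

results = {
    "padded text": has_content("  hello world  "),  # True
    "only spaces": has_content("    "),              # False
    "tabs and newlines": has_content("\t\n "),       # False
    "empty": has_content(""),                        # False
}
```

If only a yes/no answer is needed, searching for \S alone gives the same result; the surrounding .* matter when the matched text itself is wanted.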

This method works great if you’re using a flavor of regex that does not support lookarounds. Lookarounds are used in the second method.

Method 2: Using a Negative Look-Ahead

The second method is somewhat more involved: it uses a look-ahead to spot whether the line contains only whitespace characters.

/(?!^\s+$)^.*$/m

How does it work?

We’ll use a negative look-ahead in this case. A negative look-ahead will “look ahead” in search of an expression, and only allow matching to continue if the expression is not found at that point in the string. It takes the form (?!x), where x is replaced with the expression that must be “looked-ahead” for.

So we start by checking whether the next character is a whitespace character, using the whitespace character set \s as shown:

/(?!\s)/

However, strings that contain whitespace are acceptable; only strings that contain nothing but whitespace must be rejected. So let’s add a start-of-string anchor ^ and an end-of-string anchor $ before and after the whitespace character set.

/(?!^\s$)/

This will work, but only for a single whitespace character. To prevent matching for multiple consecutive whitespace characters as well, we need to include a one-or-more quantifier + after the whitespace character set.

/(?!^\s+$)/

So far so good. We can now identify strings containing only whitespace characters and halt the matching.
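The effect of the + quantifier inside the look-ahead can be checked with a short Python sketch. It uses re.match, which attempts the pattern at position 0 only, so that the look-ahead is evaluated at the start of the string; the variable names and sample strings are assumptions for illustration.

```python
import re

single_guard = re.compile(r"(?!^\s$)")   # rejects exactly one whitespace character
multi_guard = re.compile(r"(?!^\s+$)")   # rejects any whitespace-only run

# One space: both guards block the match.
one_space_single = single_guard.match(" ")   # None
one_space_multi = multi_guard.match(" ")     # None

# Two spaces: only the version with + blocks the match.
two_spaces_single = single_guard.match("  ")  # empty match, guard did not fire
two_spaces_multi = multi_guard.match("  ")    # None

# Whitespace around real content is allowed; the look-ahead itself consumes nothing.
allowed = multi_guard.match("  a  ")
allowed_span = allowed.span()  # (0, 0)
```

Because the look-ahead is zero-width, a successful match here is empty, which is why the following steps add something to select.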

However, this won’t select anything as the look-ahead only “checks” without selecting. To do that we can add an expression after the look-ahead to specify what must be selected.

In this case, we’d like to select the whole string using the wildcard character ., coupled with a zero-or-more quantifier * to indicate that we’d like to select zero or more characters.

/(?!^\s+$).*/

The expression above will also match empty strings since we’re using a zero-or-more quantifier. If you want to prevent matching empty strings, use the one-or-more quantifier + instead.
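The difference between the two quantifiers on an empty string can be sketched in Python. As before, re.match is used so the attempt starts at position 0, and the variable names are hypothetical.

```python
import re

zero_or_more = re.compile(r"(?!^\s+$).*")
one_or_more = re.compile(r"(?!^\s+$).+")

# An empty string passes the look-ahead (there is no whitespace to find),
# so .* happily matches zero characters while .+ needs at least one.
empty_with_star = zero_or_more.match("")  # matches ""
empty_with_plus = one_or_more.match("")   # None

# Non-empty content is matched by both variants.
text_with_star = zero_or_more.match("abc").group()
text_with_plus = one_or_more.match("abc").group()
```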

To ensure that we select the entire string from front to back, we can add a start-of-string anchor ^ and an end-of-string anchor $ before and after the wildcard character, respectively. The ^ anchor also prevents the engine from skipping past the look-ahead by starting the match at a later position in the string.

/(?!^\s+$)^.*$/

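Finally, to apply this check to every line of a multi-line input, the multiline flag m must be turned on so that ^ and $ match at the start and end of each line. For a single-line string, the expression also works without the flag.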

/(?!^\s+$)^.*$/m

So, in conclusion, the expression above uses a negative lookahead to ensure that the string does not contain only whitespace characters and then selects the entire string. If it finds only whitespace characters, nothing is selected.
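To see the finished expression at work on several lines, here is a minimal Python sketch. The m flag corresponds to re.MULTILINE in Python; the sample text and the names NOT_BLANK_LINE and kept are assumptions made for illustration.

```python
import re

# Method 2 with the multiline flag: ^ and $ match at every line boundary.
NOT_BLANK_LINE = re.compile(r"(?!^\s+$)^.*$", re.MULTILINE)

# Four lines: text, spaces only, tab-indented text, mixed whitespace only.
text = "first line\n   \n\tindented\n \t "

# Collect every line that is not made up solely of whitespace.
kept = [m.group() for m in NOT_BLANK_LINE.finditer(text)]
# kept == ['first line', '\tindented']
```

The whitespace-only lines are skipped entirely, while leading whitespace on a line with real content is preserved in the match.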

Sources

Some regular expressions on this page were adapted from a solution presented on Stack Overflow, posted by Mike Cheel on this question. Many thanks to these sources for their time and effort.

Benjamin

Founder, owner, and sole content creator on RegexLand. Enjoys programming, blogging, and teaching others how to do the same.
