Regex for Base64 Encoded Strings

Base64 encoded strings are widely used on the internet to transfer data across channels that are not well suited to binary data. How does one write a regular expression to validate a string like this?

A regular expression that validates base64 encoded data needs to check for the characters A to Z, a to z, 0 to 9, plus (+), and forward slash (/), with a total count that is a multiple of 4. If the number of these characters is not an exact multiple of 4, the expression must look for one or two equals signs (=) of padding at the end.

An expression that does this is:

/^(?:[A-Za-z\d+/]{4})*(?:[A-Za-z\d+/]{3}=|[A-Za-z\d+/]{2}==)?$/

So let’s find out how it all works.

Typical Format of a Base64 Encoded String

Base64 is a method of encoding binary data into ASCII (printable) characters. If you want to know exactly how the method works, this tutorial is a great start.

Playing around with a Base64 encoder will reveal the following patterns:

  • A base64 string consists of the characters A-Z, a-z, 0-9, and the plus + and forward slash / symbols, arranged in groups of 4.
  • If the number of characters above is not a multiple of 4, the following rules apply:
    • If there is 1 character fewer than a multiple of 4, a single equals sign = is added at the end as padding.
    • If there are 2 characters fewer than a multiple of 4, two equals signs == are added at the end as padding.
    • A string with 3 characters less than a multiple of 4 will never be a result of this encoding method and is therefore not a valid base64 string.
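These rules can be checked empirically. The Python sketch below uses the standard base64 module (a choice made for this example, not something the article prescribes) to encode inputs of increasing length and count the data characters versus the = padding. The helper name padding_profile is hypothetical.

```python
import base64


def padding_profile(num_bytes: int) -> tuple[int, int]:
    """Return (data characters, '=' characters) for num_bytes of encoded input."""
    encoded = base64.b64encode(b"x" * num_bytes).decode("ascii")
    padding = len(encoded) - len(encoded.rstrip("="))
    return len(encoded) - padding, padding


# Every 3 input bytes become 4 characters; a leftover 1 or 2 bytes become
# 2 or 3 data characters plus 2 or 1 '=' signs, never 1 data character.
for n in range(1, 7):
    data_chars, padding = padding_profile(n)
    print(f"{n} byte(s): {data_chars} data characters + {padding} '='")
```

Running it shows the data-character count cycling through remainders 0, 2, and 3 modulo 4, which is why a remainder of 1 is never valid.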

Let’s set up an expression that can test for all this.

Regular Expression for Base64 Encoded Strings

To start with, let’s put all the allowed characters into square brackets. The characters allowed are A-Z, a-z, 0-9, and also the plus + and forward slash / symbols:

/[A-Za-z\d+/]/

The square brackets form a character class, which matches any single one of the characters inside it. The order in which the characters are listed doesn’t matter, and \d is shorthand for 0-9.
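As a quick sanity check, the following Python sketch tests every ASCII character against the class. One assumption to note: it spells out 0-9 instead of \d, because in Python’s re module a str pattern’s \d also matches non-ASCII decimal digits (in JavaScript, \d is only 0-9).

```python
import re
import string

# The article's class, with \d written out as 0-9 for Python.
CHAR_CLASS = re.compile(r"[A-Za-z0-9+/]")

# Collect every ASCII character that the class accepts.
accepted = "".join(chr(c) for c in range(128) if CHAR_CLASS.fullmatch(chr(c)))
expected = set(string.ascii_letters + string.digits + "+/")
print(len(accepted), set(accepted) == expected)

# With \d, a str pattern also accepts ARABIC-INDIC DIGIT THREE (U+0663);
# re.ASCII restricts \d to 0-9 again.
unicode_digit_ok = bool(re.fullmatch(r"[A-Za-z\d+/]", "\u0663"))
ascii_digit_ok = bool(re.fullmatch(r"[A-Za-z\d+/]", "\u0663", flags=re.ASCII))
print(unicode_digit_ok, ascii_digit_ok)
```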

As we established earlier, we want these characters to appear in multiples of 4. So we’ll add the quantifier {4} after the square brackets to start building this quantity. The {4} means we want exactly 4 of these characters.

/[A-Za-z\d+/]{4}/

To indicate that we’d accept any multiple of four, we need to wrap the expression above in a non-capturing group (?:) and add the zero-or-more quantifier * after the group to show that we’ll accept zero or more of these 4-character groups.

/(?:[A-Za-z\d+/]{4})*/

The non-capturing group groups the 4-character block without capturing it for later use (hence the name). Since we never need to refer back to this part on its own, there’s no reason to capture it.

The expression above now matches any run of allowed characters whose length is a multiple of 4. Next we need to handle the cases where the number of characters is not a multiple of 4.
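A small Python sketch of this step, using re.fullmatch so that the pattern must cover the whole test string (0-9 stands in for \d, and the candidate strings are arbitrary examples):

```python
import re

# Zero or more blocks of exactly 4 base64 characters.
BLOCKS = re.compile(r"(?:[A-Za-z0-9+/]{4})*")

for candidate in ["", "TWFu", "TWFuTWFu", "TWF", "TWFuT"]:
    # fullmatch() succeeds only if the pattern consumes the entire candidate.
    print(repr(candidate), BLOCKS.fullmatch(candidate) is not None)
```

Lengths 0, 4, and 8 are accepted; lengths 3 and 5 are rejected, which is exactly the gap the padded endings below have to fill.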

Padding only ever appears at the end of the string, so we’ll append the rest of the expression to what we already have. First we’ll create a separate expression for each case and then combine them at the end.

Character count is 1 less than a multiple of 4

For the case where the number of characters is 1 less than a multiple of 4, let’s begin by specifying that we’ll accept 3 characters from our character class [A-Za-z\d+/]:

/[A-Za-z\d+/]{3}/

The {3} at the end indicates that we’ll accept only 3 characters, no more and no less.

To make up for the one character still needed to reach a multiple of 4, the string needs an equals sign = at the end as padding. We can specify this as follows:

/[A-Za-z\d+/]{3}=/
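To see where this ending comes from, the Python sketch below encodes two bytes (an arbitrary example input) with the standard base64 module and checks the result against the fragment, again with 0-9 in place of \d:

```python
import base64
import re

ONE_PAD_END = re.compile(r"[A-Za-z0-9+/]{3}=")

# 2 input bytes -> 3 data characters + one '='.
encoded = base64.b64encode(b"hi").decode("ascii")
print(encoded, ONE_PAD_END.fullmatch(encoded) is not None)

# Without the '=' the fragment no longer matches.
print(ONE_PAD_END.fullmatch("aGk") is not None)
```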

Character count is 2 less than a multiple of 4

This part works nearly identically to the one we just described. In this case, let’s begin by specifying that we’ll accept 2 characters from the character class [A-Za-z\d+/]:

/[A-Za-z\d+/]{2}/

To make up for the two remaining characters, the string must be padded with two equals signs ==, which we can add as follows:

/[A-Za-z\d+/]{2}==/
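The same check for the two-equals ending: in this Python sketch a single input byte (an arbitrary example) encodes to 2 data characters plus ==, which the fragment accepts, while a single = is rejected.

```python
import base64
import re

TWO_PAD_END = re.compile(r"[A-Za-z0-9+/]{2}==")

# 1 input byte -> 2 data characters + '=='.
encoded = base64.b64encode(b"h").decode("ascii")
print(encoded, TWO_PAD_END.fullmatch(encoded) is not None)

# One '=' is not enough padding for this ending.
print(TWO_PAD_END.fullmatch("aA=") is not None)
```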

Adding it all together

Now that we have all the necessary parts to make up this expression, let’s combine them.

We’ll add the two special end cases after the first part, put them in a non-capturing group (?:), and use the OR symbol | between them:

/(?:[A-Za-z\d+/]{4})*(?:[A-Za-z\d+/]{3}=|[A-Za-z\d+/]{2}==)/

The non-capturing group keeps this part from being captured separately, since we only care about the match as a whole. The OR symbol | shows that we’ll accept either of the two special cases at the end.

However, we also need to add the zero-or-one quantifier ? after the group to make the two special endings optional.

/(?:[A-Za-z\d+/]{4})*(?:[A-Za-z\d+/]{3}=|[A-Za-z\d+/]{2}==)?/
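A Python sketch of the combined body plus optional ending (0-9 in place of \d). It uses fullmatch so the whole sample must match; the sample strings and their expected verdicts are illustrative choices.

```python
import re

BODY_AND_OPTIONAL_END = re.compile(
    r"(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)?"
)

samples = {
    "aGVsbG8h": True,   # multiple of 4, no padding needed
    "aGk=": True,       # 3 data characters + '='
    "aA==": True,       # 2 data characters + '=='
    "aGVsbG8": False,   # 7 characters, padding missing
    "a===": False,      # 1 data character is never valid
}
for text, expected in samples.items():
    print(text, BODY_AND_OPTIONAL_END.fullmatch(text) is not None, expected)
```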

And finally, to ensure that we match only the base64 string with nothing before or after it, we need to add the start-of-string anchor ^ and the end-of-string anchor $ to the start and end of the expression, respectively.

/^(?:[A-Za-z\d+/]{4})*(?:[A-Za-z\d+/]{3}=|[A-Za-z\d+/]{2}==)?$/

And with that, our expression is complete.
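Here is one way the finished expression could be used from Python. This is a sketch, not a reference implementation: the function name is_base64 is hypothetical, 0-9 replaces \d, and it calls fullmatch() rather than match(), because in Python $ also matches just before a trailing newline, so match() would accept "aGk=\n".

```python
import re

BASE64_RE = re.compile(
    r"^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)?$"
)


def is_base64(text: str) -> bool:
    """Format check only: True if text looks like padded standard base64."""
    # fullmatch() must consume the whole string, including any trailing newline.
    return BASE64_RE.fullmatch(text) is not None


print(is_base64("aGVsbG8="))                  # padded, valid format
print(is_base64("aGVsbG8"))                   # padding missing
print(is_base64("aGk=\n"))                    # trailing newline rejected
print(BASE64_RE.match("aGk=\n") is not None)  # match() lets it through
```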

Which Flags to Use

The expression given above validates a complete input as a base64 string. If, however, you want to extract all base64 strings from a larger piece of text, you might want to add the global flag g at the end so that every occurrence of the expression is matched. In this case, you’ll also need to remove the start-of-string ^ and end-of-string $ anchors:

/(?:[A-Za-z\d+/]{4})*(?:[A-Za-z\d+/]{3}=|[A-Za-z\d+/]{2}==)?/g

Since base64 strings are case sensitive and the character class already lists both A-Z and a-z, the case-insensitive flag i is of no use here.
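Python has no g flag; re.finditer plays that role. The sketch below (the helper name find_base64_candidates and the sample text are hypothetical) shows two side effects of dropping the anchors: the pattern matches the empty string at every position, so empty matches are filtered out, and ordinary text made of base64 characters, such as the first four letters of "token", is reported too.

```python
import re

# Unanchored pattern for searching inside larger text (0-9 in place of \d).
BASE64_SEARCH = re.compile(
    r"(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)?"
)


def find_base64_candidates(text: str) -> list[str]:
    # Drop the empty matches the pattern produces between candidates.
    return [m.group() for m in BASE64_SEARCH.finditer(text) if m.group()]


print(find_base64_candidates("token: aGVsbG8= end"))
```

The result contains the genuine "aGVsbG8=" but also the false positive "toke", so extracted candidates usually need a further check.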

Shortcomings of This Expression

Note that the expression above will also accept an empty string as a valid base64 input.
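Whether this matters depends on the use case: RFC 4648 lists the empty string as the encoding of empty input. If you do need to reject it, one option (a hypothetical variant, not part of the original expression) is a (?=.) lookahead that demands at least one character before the rest of the pattern runs. The Python sketch below uses 0-9 in place of \d.

```python
import re

# (?=.) requires at least one character at the start, so "" no longer matches.
NON_EMPTY_BASE64_RE = re.compile(
    r"(?=.)(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)?"
)

print(NON_EMPTY_BASE64_RE.fullmatch("") is None)
print(NON_EMPTY_BASE64_RE.fullmatch("aGk=") is not None)
```

A plain `text != ""` check before running the original expression achieves the same result and may be easier to read.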

Further Steps

Although the expression above can check whether a given string is in base64 format, it cannot confirm that the string decodes to the data you expect, nor can it decode it. That is beyond what a regex can do and is best done in your favourite programming language, such as Python or JavaScript.
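In Python, for example, the standard library can both check and decode in one step. This is a minimal sketch: decode_base64 is a hypothetical helper name, and validate=True makes base64.b64decode raise binascii.Error for characters outside the alphabet instead of silently discarding them. Incorrect padding raises the same error.

```python
import base64
import binascii
from typing import Optional


def decode_base64(text: str) -> Optional[bytes]:
    """Decode padded standard base64, returning None if the input is rejected."""
    try:
        return base64.b64decode(text, validate=True)
    except binascii.Error:
        # Non-alphabet characters or incorrect padding.
        return None


print(decode_base64("aGVsbG8="))   # decodes to b'hello'
print(decode_base64("aGVsbG8"))    # padding missing
print(decode_base64("aGVs bG8="))  # space is not in the alphabet
```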

Sources

The regular expressions on this page were adapted from a solution presented on Stack Overflow, posted by Gumbo on this question. In addition, this tutorial presented a great overview of how base64 works, and the general format of an encoded string.

Benjamin

Founder, owner, and sole content creator on RegexLand. Enjoys programming, blogging, and teaching others how to do the same.

4 thoughts on “Regex for Base64 Encoded Strings”

  1. This is fantastic. Thank you for this. There is one thing I noted, however–you didn’t escape the \ for the literal character matches within your brackets, which leads to an error. When I modified it, it looks like this: ^(?:[A-Za-z\d+\/]{4})*(?:[A-Za-z\d+\/]{3}=|[A-Za-z\d+\/]{2}==)?$ .

    My use case was to validate whenever a user sent an authorization header value that wasn’t base64 encoded, and had to have a preceding “Basic ” This wound up looking like this: ^Basic (?:[A-Za-z\d+\/]{4})*(?:[A-Za-z\d+\/]{3}=|[A-Za-z\d+\/]{2}==)?$

    The only problem I ran into is that a null value is also a match for this regex 🙂

  2. I think it’s missing the case with “3 less than a multiple of 4” isn’t it?
    /[A-Za-z\d+\/]{1}===/

    And IMO the non-capture group are not necessary, thus avoidable for a clearer answer.
    So a more complete regex could be
    ^([A-Za-z\d+\/]{4})*([A-Za-z\d+\/]{3}=|[A-Za-z\d+\/]{2}==|[A-Za-z\d+\/]{1}===)?$

    For base64url, we replace ‘+’ by ‘-‘ and ‘/’ by ‘_’ (see rfc4648 section 5) so you would write it like this:
    ^([\w-]{4})*([\w-]{3}=|[\w-]{2}==|[\w-]{1}===)?$

  3. If one wants to exclude empty strings, it is needed to enforce a first character.

    The expression would thus come up like this:
    ^[A-Za-z\d+\/]([A-Za-z\d+\/]{4})*([A-Za-z\d+\/]{3}|[A-Za-z\d+\/]{2}=|[A-Za-z\d+\/]==|===)$

    For base64url:
    ^\w([\w-]{4})*([\w-]{3}|[\w-]{2}=|[\w-]==|===)$

    Shortcoming: this would not accept not-padded strings (that are compliant with rfc4648)
    https://datatracker.ietf.org/doc/html/rfc4648#section-3.2
    If you need to accept them, it’s way easier, though:
    b64: ^[A-Za-z\d+\/]+$
    b64url: ^[\w-]+$

