Regex for Bitcoin Addresses

What regular expressions can be used to match Bitcoin addresses?

A regular expression for validating legacy Bitcoin addresses must check for a leading 1 or 3, ensure that the address is 27 to 34 alphanumeric characters long, and reject ambiguous characters such as O, 0, I, and l. It should also accept newer Segwit addresses, which start with bc1.

An expression that meets these requirements looks something like this:

/^(?:[13]{1}[a-km-zA-HJ-NP-Z1-9]{26,33}|bc1[a-z0-9]{39,59})$/

Let’s look at the thought process behind this code and describe it in detail.

General Format of a Bitcoin Address

According to this article, legacy bitcoin addresses follow three rules: they start with the number 1 or the number 3, they are 27 to 34 characters long, and they contain only alphanumeric characters, excluding the uppercase letter O, the uppercase letter I, the lowercase letter l, and the number 0.

This last rule ensures that someone typing an address after reading it will not confuse an uppercase letter O for the number 0, or an uppercase letter I for a lowercase letter l, and end up entering a totally different address.

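Newer bitcoin addresses created by Segwit are formatted according to the following criteria: they start with bc1, they contain only lowercase letters and numbers, and they are between 42 and 62 characters long.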


Regular Expression for Legacy Bitcoin Addresses

With the format above, we can start putting together our regular expression. For starters, a legacy Bitcoin address has to start with the number 1 or the number 3:

/[13]{1}/

The square brackets [] allow us to match any one of the characters inside them, in this case either 1 or 3. The curly brackets {} behind the square brackets are a quantifier that states we match exactly 1 character from the square brackets. A pair of square brackets already matches exactly one character by default, so {1} is optional here and only makes the intent explicit.
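As a quick sanity check of how the quantifier behaves, the following Python sketch compares the expression with and without {1}. The use of Python's re module and the names with_quantifier, without_quantifier, and same_result are illustrative assumptions and not part of the original expression.

```python
import re

# The same character class, with and without the explicit {1} quantifier.
with_quantifier = re.compile(r"[13]{1}")
without_quantifier = re.compile(r"[13]")


def same_result(text: str) -> bool:
    """Return True if both patterns agree on whether `text` matches in full."""
    a = with_quantifier.fullmatch(text) is not None
    b = without_quantifier.fullmatch(text) is not None
    return a == b


for sample in ["1", "3", "2", "11", "13", ""]:
    matched = with_quantifier.fullmatch(sample) is not None
    print(repr(sample), matched, same_result(sample))
```

On these samples both patterns agree, which suggests the quantifier is a readability choice rather than a functional one.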

The rest of the bitcoin address can be any alphanumeric character, except uppercase I, lowercase l, uppercase O, or the number 0. Therefore, going through the entire alphanumeric spectrum and ignoring these characters, we can match any of the following: lowercase a to k and m to z, uppercase A to H, J to N, and P to Z, and the numbers 1 to 9.

These ranges constitute all the allowed characters in a bitcoin address. So let’s add them to our expression:

/[13]{1}[a-km-zA-HJ-NP-Z1-9]/

The square brackets again indicate that we can match any one of the characters within them.

Legacy bitcoin addresses can only be 27 to 34 characters long. But remember that we’ve already tested for a 1 or a 3 at the start of the address. That means the remaining part can be between 26 and 33 characters long. Let’s add a quantifier behind the square brackets to specify that.

/[13]{1}[a-km-zA-HJ-NP-Z1-9]{26,33}/

Finally, we need to ensure that we match a bitcoin address without any characters before or after it, since extra characters would render the address useless. Thus we can add a start-of-string character ^ and an end-of-string character $ to the front and back of the expression, respectively.

/^[13]{1}[a-km-zA-HJ-NP-Z1-9]{26,33}$/

This constitutes the entire expression for matching legacy bitcoin addresses.
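To see the legacy expression in use, here is a minimal Python sketch. The function name is_legacy_address and the choice of Python's re module are assumptions made for illustration. The two sample addresses are widely published examples, and the third is a deliberately broken copy.

```python
import re

# The legacy expression from above, written as a Python raw string.
LEGACY_RE = re.compile(r"^[13]{1}[a-km-zA-HJ-NP-Z1-9]{26,33}$")


def is_legacy_address(candidate: str) -> bool:
    """Check the format only; the checksum is not verified here."""
    # fullmatch is used because Python's $ also matches just before a
    # trailing newline, which would let "address\n" slip through.
    return LEGACY_RE.fullmatch(candidate) is not None


print(is_legacy_address("1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa"))  # True
print(is_legacy_address("3J98t1WpEZ73CNmQviecrnyiWrnqRhWNLy"))  # True
print(is_legacy_address("1A1zP1eP5QGefi2DMPTfTL5SLmv7Divf0a"))  # False, contains 0
```

A match here only means the string has the right shape. It says nothing about whether the address exists or has a valid checksum.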

Regular Expression for Segwit Addresses

Now let’s look at matching Segwit Bitcoin addresses.

For starters, all Segwit addresses start with bc1. So let’s specify that.

/bc1/

Notice that we don’t use square brackets here since we want to match the literal sequence bc1 exactly. Square brackets would instead match a single character that is either b, c, or 1.

After this, the address could contain any lowercase letter or number. Segwit addresses are typically written in lowercase, so the expression makes no attempt to exclude ambiguous characters. By using lowercase letters, chances of confusing an uppercase letter I with a lowercase letter l are eliminated, as are chances of confusing the uppercase letter O with the number 0. Strictly speaking, the Bech32 encoding used by Segwit addresses omits the characters 1, b, i, and o after the bc1 prefix, so matching every lowercase letter and number is slightly more permissive than the actual format.

Thus, for our purposes, a Segwit address may contain the lowercase letters a to z and the numbers 0 to 9.

We add these inside square brackets to our expression.

/bc1[a-z0-9]/

From what I could find during my research, Segwit addresses are between 42 and 62 characters long. But since we’ve already matched the first 3 characters as bc1, the remainder of the address should be between 39 and 59 characters long. Adding this as a quantifier:

/bc1[a-z0-9]{39,59}/

This will match any Segwit bitcoin address. To ensure we match only the address and not any other characters in front of or behind it, add a start-of-string character ^ and an end-of-string character $.

/^bc1[a-z0-9]{39,59}$/
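The Segwit expression can be tried out with a short Python sketch. The function name is_segwit_address and its strict parameter are hypothetical. The second pattern, STRICT_SEGWIT_RE, is not from the original expression: it is an optional variant that assumes you want to reject the four characters (1, b, i, and o) that the Bech32 encoding leaves out after the bc1 prefix.

```python
import re

# The Segwit expression from above.
SEGWIT_RE = re.compile(r"^bc1[a-z0-9]{39,59}$")

# Hypothetical stricter variant: same lengths, but the character class
# leaves out 1, b, i and o, which Bech32 does not use after the prefix.
STRICT_SEGWIT_RE = re.compile(r"^bc1[ac-hj-np-z02-9]{39,59}$")


def is_segwit_address(candidate: str, strict: bool = False) -> bool:
    """Check the format only; the Bech32 checksum is not verified."""
    pattern = STRICT_SEGWIT_RE if strict else SEGWIT_RE
    return pattern.fullmatch(candidate) is not None


sample = "bc1qar0srrr7xfkvy5l643lydnw9re59gtzzwf5mdq"
print(is_segwit_address(sample))               # True
print(is_segwit_address(sample, strict=True))  # True
print(is_segwit_address("bc1" + "b" * 39))               # True
print(is_segwit_address("bc1" + "b" * 39, strict=True))  # False
```

The last two lines show the difference: a string made only of the letter b passes the permissive pattern but fails the stricter one.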

Matching Both Types of Addresses

If you’d like to match both address types, you can combine the two expressions into one using a group, delimited with round brackets (), with the two alternatives separated by a pipe character |.

/([13]{1}[a-km-zA-HJ-NP-Z1-9]{26,33}|bc1[a-z0-9]{39,59})/

This will match either a legacy Bitcoin address (before the pipe character) or a Segwit address (after the pipe character).

If we don’t need to extract the bitcoin address, and only need to verify its validity, we can signal that the group should be a non-capturing group by placing ?: at the start of the round brackets.

/(?:[13]{1}[a-km-zA-HJ-NP-Z1-9]{26,33}|bc1[a-z0-9]{39,59})/

This saves a little work, since the regex engine no longer has to store the matched address as a captured group.
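The difference between the two kinds of group is easy to observe in code. The sketch below uses Python's re module, and the names BODY, capturing, and non_capturing are chosen for illustration only.

```python
import re

# The two alternatives, shared by both versions of the group.
BODY = r"[13]{1}[a-km-zA-HJ-NP-Z1-9]{26,33}|bc1[a-z0-9]{39,59}"

capturing = re.compile("(" + BODY + ")")
non_capturing = re.compile("(?:" + BODY + ")")

address = "1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa"

# The capturing group stores the address as group 1.
print(capturing.search(address).groups())      # ('1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa',)

# The non-capturing group stores nothing extra...
print(non_capturing.search(address).groups())  # ()

# ...but the overall match (group 0) is still available.
print(non_capturing.search(address).group(0) == address)  # True
```

Both versions match the same strings. The only difference is whether the group's text is kept separately from the overall match.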

And finally, to ensure nothing is in front of or behind the bitcoin address, we can once again add the start-of-string character ^ and end-of-string character $.

/^(?:[13]{1}[a-km-zA-HJ-NP-Z1-9]{26,33}|bc1[a-z0-9]{39,59})$/

And this is the final expression! This will match both legacy and new Segwit bitcoin addresses!
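As a rough illustration of the final expression at work, the Python sketch below runs it against a handful of inputs. The function name is_bitcoin_address and the sample list are assumptions made for this example.

```python
import re

# The final combined expression from above.
BITCOIN_ADDRESS_RE = re.compile(
    r"^(?:[13]{1}[a-km-zA-HJ-NP-Z1-9]{26,33}|bc1[a-z0-9]{39,59})$"
)


def is_bitcoin_address(candidate: str) -> bool:
    """Return True if the string has the shape of a legacy or Segwit address."""
    return BITCOIN_ADDRESS_RE.fullmatch(candidate) is not None


samples = [
    "1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa",          # legacy, starts with 1
    "3J98t1WpEZ73CNmQviecrnyiWrnqRhWNLy",          # legacy, starts with 3
    "bc1qar0srrr7xfkvy5l643lydnw9re59gtzzwf5mdq",  # Segwit
    "2A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa",          # wrong first character
    "not-an-address",
]

for sample in samples:
    print(sample, is_bitcoin_address(sample))
```

The first three samples pass and the last two fail, since neither alternative accepts a leading 2 or a hyphen.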

Which Flags to Use With This Expression

If you’d like to simply validate a given input as a Bitcoin address, you won’t need to use any flags. Bitcoin addresses are case sensitive, so it would not be correct to use the case-insensitive flag.

If you’re trying to extract all the Bitcoin addresses from a given block of text, you would want to use the global flag g to ensure you match all instances, and not only the first one. If this is the case, you should leave out the start-of-string and end-of-string characters as well.

/(?:[13]{1}[a-km-zA-HJ-NP-Z1-9]{26,33}|bc1[a-z0-9]{39,59})/g
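Extraction might look like the following in Python. Python has no g flag, so this sketch assumes re.findall as the equivalent, since it returns every non-overlapping match. The function name extract_addresses and the sample text are made up for illustration.

```python
import re
from typing import List

# The unanchored expression from above, without ^ and $.
ADDRESS_RE = re.compile(
    r"(?:[13]{1}[a-km-zA-HJ-NP-Z1-9]{26,33}|bc1[a-z0-9]{39,59})"
)


def extract_addresses(text: str) -> List[str]:
    """Return every substring that has the shape of a Bitcoin address."""
    # With a non-capturing group, findall returns the full matches.
    return ADDRESS_RE.findall(text)


text = (
    "Donations: 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa (legacy) or "
    "bc1qar0srrr7xfkvy5l643lydnw9re59gtzzwf5mdq (Segwit)."
)
print(extract_addresses(text))
```

Because nothing anchors the pattern, it can also pick out an address-shaped piece from the middle of a longer alphanumeric string, so extracted results are best treated as candidates.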

Other Considerations

Bitcoin addresses contain checksums that allow an additional check on their validity. However, this cannot be performed using regex alone and must be checked programmatically.
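For legacy addresses, a programmatic check could be sketched as follows. This assumes the standard Base58Check layout (1 version byte, a 20-byte hash, and a 4-byte checksum taken from a double SHA-256 of the first 21 bytes). The function name has_valid_base58_checksum is hypothetical, and the sketch does not cover Segwit addresses, which use a different checksum scheme.

```python
import hashlib

BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
BASE58_INDEX = {char: value for value, char in enumerate(BASE58_ALPHABET)}


def has_valid_base58_checksum(address: str) -> bool:
    """Verify the Base58Check checksum of a legacy address (1... or 3...)."""
    # Decode the Base58 string into one large integer.
    number = 0
    for char in address:
        if char not in BASE58_INDEX:
            return False  # not a Base58 character
        number = number * 58 + BASE58_INDEX[char]

    # Each leading "1" encodes one leading zero byte.
    leading_zeros = len(address) - len(address.lstrip("1"))
    body = number.to_bytes((number.bit_length() + 7) // 8, "big")
    raw = b"\x00" * leading_zeros + body

    # Expect 1 version byte + 20-byte hash + 4-byte checksum.
    if len(raw) != 25:
        return False

    payload, checksum = raw[:21], raw[21:]
    digest = hashlib.sha256(hashlib.sha256(payload).digest()).digest()
    return digest[:4] == checksum


print(has_valid_base58_checksum("1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa"))
print(has_valid_base58_checksum("1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNb"))  # last character altered
```

A reasonable design is to run the regex first as a cheap format filter and only then run the checksum, since the regex alone accepts many strings that are not real addresses.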

Sources

The regular expressions on this page were adapted from solutions posted here and here. In addition, the following articles proved helpful in providing the correct format of a Bitcoin address:

Benjamin

Founder, owner, and sole content creator on RegexLand. Enjoys programming, blogging, and teaching others how to do the same.
