Regex for BBCode

BBCode, short for Bulletin Board Code, is a popular method of marking up text for formatting purposes in online message boards and forums. Even WhatsApp supports a loosely comparable form of lightweight text formatting, although it uses its own symbols rather than bracketed tags. How does one use regular expressions to process BBCode?

A regular expression for BBCode searches for a list of predefined tags enclosed in square brackets […], with possible attributes inside the tags. The list of tags depends on which tags have been implemented by the platform in question. These BBCode tags can then be replaced with valid HTML code if required.

Here’s a simple regular expression that will convert any set of BBCode tags (both opening and closing) to a corresponding HTML tag.

input.replace(/\[(.+).*\](.*?)\[\/\1\]/, "<$1>$2</$1>");

However, it is rarely as simple as the expression above, because:

  • Tags vary widely across platforms
  • They often include attributes and parameters to better define the tag

In this article, we’ll discuss BBCode regex in detail, and work our way up to creating a simple BBCode parser using just regex.

General Format of BBCode

BBCode is generally formatted in the following way:

  • In the simplest BBCode tags, text is enclosed between opening and closing tags, which are indicated by square brackets (e.g. [b]text[/b]). The closing tag has a forward slash after the opening square bracket.
  • Some tags can be expanded to include simple attributes like the size of text (e.g. [size=12]text[/size]). The simple attribute is added inside the opening tag and separated by an equals sign =. It is, however, not repeated in the closing tag.
  • Some BBCode tags can be nested, such as table tags which are used to identify a table (e.g. [table][tr][td]Column1[/td][td]Column2[/td][/tr][/table]).
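The format described above can be sketched as a small pattern that pulls the tag name and the optional attribute out of a single opening tag. The helper name parseOpeningTag and the restriction of tag names to word characters are assumptions made for illustration; individual platforms may allow more.

```javascript
// Hypothetical helper: splits an opening tag such as [size=12] into its parts.
// Assumption: tag names consist of word characters only.
function parseOpeningTag(tag) {
  const match = /^\[(\w+)(?:=([^\]]*))?\]$/.exec(tag);
  if (match === null) {
    return null; // not an opening tag (closing tags start with "/")
  }
  // The attribute group is undefined when no "=value" part is present.
  return { name: match[1], attribute: match[2] ?? null };
}

console.log(parseOpeningTag("[size=12]"));
console.log(parseOpeningTag("[b]"));
```

Closing tags are deliberately rejected here, because the slash is not a word character.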

Let’s discuss how to convert various forms of BBCode into HTML code.

Basic Text Formatting Tags

The basic text formatting BBCode tags consist of the following general options:

  • [b]bold[/b]
  • [i]italic[/i]
  • [u]underline[/u]
  • [s]strikethrough[/s]

These are easily replaced with HTML using the following regular expression.

input.replace(/\[(b|i|u|s)\](.*?)\[\/\1\]/gs, "<$1>$2</$1>");

The global flag g is included at the end of the expression to ensure all instances of this expression are covered, and not just the first one. The dot-all flag s is also included to ensure that the dot character . also matches newlines, since some of these tags might stretch over several lines.
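One thing a single call does not cover is nesting: replace does not rescan text it has already matched, so an inner tag inside an outer one is left untouched on the first pass. A minimal sketch of one way around this, assuming it is acceptable to rerun the same expression until the text stops changing (the function name convertBasicTags is made up for this example):

```javascript
// Hypothetical helper: repeats the replacement until nothing changes,
// so that nested tags such as [b][i]...[/i][/b] are fully converted.
function convertBasicTags(input) {
  const pattern = /\[(b|i|u|s)\](.*?)\[\/\1\]/gs;
  let previous;
  let output = input;
  do {
    previous = output;
    output = output.replace(pattern, "<$1>$2</$1>");
  } while (output !== previous);
  return output;
}

console.log(convertBasicTags("[b]bold and [i]italic[/i][/b]"));
// prints "<b>bold and <i>italic</i></b>"
```

Unclosed tags are simply left as they are, since the expression only matches complete pairs.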

Image Tags

BBCode images are displayed as follows, with the image URL between the two [img] tags:

  • [img]https://regexland/image.png[/img]

This needs to be converted to an HTML image tag with a proper source attribute. We can do this using the following expression:

input.replace(/\[img\](.*?)\[\/img\]/g, "<img src='$1' />");
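Because the captured text is placed directly into an attribute, a URL containing a quote character could break out of the src value. The sketch below shows one possible precaution, passing a function to replace instead of a string. The helper names and the decision to accept only http and https URLs are assumptions for this example, not requirements of BBCode.

```javascript
// Hypothetical helper: escapes characters that are special inside HTML attributes.
function escapeAttribute(value) {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

function convertImageTags(input) {
  return input.replace(/\[img\](.*?)\[\/img\]/g, (match, url) => {
    const trimmed = url.trim();
    // Assumption: only http(s) URLs are acceptable; anything else is left untouched.
    if (!/^https?:\/\//i.test(trimmed)) {
      return match;
    }
    return `<img src='${escapeAttribute(trimmed)}' />`;
  });
}

console.log(convertImageTags("[img]https://regexland/image.png[/img]"));
// prints "<img src='https://regexland/image.png' />"
```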

URL Links Tags

URL links are used to create a hyperlink to a certain website. These come in two flavours:

  • [url]https://regexland.com[/url]
  • [url=https://regexland.com]RegexLand[/url]

The first of these two will create a hyperlink and use the link itself as the anchor text (e.g. https://regexland.com):

input.replace(/\[url\](.*?)\[\/url\]/g, "<a href='$1'>$1</a>");

The second will also create a link but will use the part between the opening and closing tags as the anchor text (e.g. RegexLand):

input.replace(/\[url=(https?:\/\/.+?)\](.*?)\[\/url\]/g, "<a href='$1'>$2</a>");
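The two flavours can also be handled in one pass by making the attribute part optional. This is only a sketch: the name convertUrlTags is hypothetical, and escaping of the URL is left out to keep the example short.

```javascript
function convertUrlTags(input) {
  // Group 1 (optional) is the URL given as an attribute; group 2 is the tag content.
  const pattern = /\[url(?:=(https?:\/\/[^\]]+))?\](.*?)\[\/url\]/g;
  return input.replace(pattern, (match, href, text) => {
    // Without an attribute, the content itself doubles as the link target.
    const target = href !== undefined ? href : text;
    return `<a href='${target}'>${text}</a>`;
  });
}

console.log(convertUrlTags("[url]https://regexland.com[/url]"));
// prints "<a href='https://regexland.com'>https://regexland.com</a>"
console.log(convertUrlTags("[url=https://regexland.com]RegexLand[/url]"));
// prints "<a href='https://regexland.com'>RegexLand</a>"
```

A function is used as the replacement because a plain replacement string cannot choose between $1 and $2 depending on which group matched.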

Quote Tags

Quote tags also come in two shapes, one with an author and another without.

  • [quote]Every moment is a fresh beginning.[/quote]
  • [quote="T.S. Eliot"]Every moment is a fresh beginning.[/quote]

The first is handled by simply replacing the BBCode quote tags with HTML blockquote tags. You can exchange the blockquote tags for any other HTML element you like.

input.replace(/\[quote\](.*?)\[\/quote\]/gs, "<blockquote>$1</blockquote>");

For the second one, we can extract the author and show it in a line below the blockquote (you can style this the way you like it).

input.replace(/\[quote="(.*?)"\](.*?)\[\/quote\]/gs, "<blockquote>$2</blockquote><p>~ $1 ~</p>");
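Forum quotes are often nested, and a lazy match would pair an outer opening tag with the first inner closing tag. One possible approach, sketched here under the assumption that authors are wrapped in straight double quotes, is to forbid a further opening [quote inside the matched content and repeat until the text is stable, so the innermost quote is converted first. The function name convertQuoteTags is invented for this example.

```javascript
function convertQuoteTags(input) {
  // Content may not contain another opening [quote, so the innermost quote matches first.
  const pattern = /\[quote(?:="([^"]*)")?\]((?:(?!\[quote)[\s\S])*?)\[\/quote\]/g;
  let previous;
  let output = input;
  do {
    previous = output;
    output = output.replace(pattern, (match, author, text) => {
      const quote = `<blockquote>${text}</blockquote>`;
      // The author line is only added when an author attribute was present.
      return author === undefined ? quote : `${quote}<p>~ ${author} ~</p>`;
    });
  } while (output !== previous);
  return output;
}

console.log(convertQuoteTags('[quote="A"][quote]inner[/quote]outer[/quote]'));
// prints "<blockquote><blockquote>inner</blockquote>outer</blockquote><p>~ A ~</p>"
```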

List Tags

List tags can be used to create both ordered and unordered lists.

Bulleted lists are usually written in BBCode like this:

[list]
[*] Item 1
[*] Item 2
[*] Item 3
[/list]

This list can be converted to HTML using two expressions. The first expression converts the outer list tags to HTML unordered-list tags:

input.replace(/\[list\](.*?)\[\/list\]/gs, "<ul>$1</ul>");

The second expression replaces all instances of [*] with an HTML list item tag.

input.replace(/\[\*\]/gs, "<li>");

Note that the closing list item tag is optional in HTML, so there is no need to include one here.

Ordered lists are usually indicated by an =1 at the end of the list tag. In this case, we can simply convert the outer list tag to an HTML ordered list tag:

input.replace(/\[list=1\](.*?)\[\/list\]/gs, "<ol>$1</ol>");
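The list expressions can be folded into one function so that [*] markers are only converted when they actually sit inside a list. This is a sketch under the assumption that lists are not nested; the name convertListTags is hypothetical.

```javascript
function convertListTags(input) {
  // Group 1 is "=1" for ordered lists and undefined for bulleted lists.
  return input.replace(/\[list(=1)?\](.*?)\[\/list\]/gs, (match, ordered, body) => {
    const tag = ordered ? "ol" : "ul";
    // Only [*] markers inside a list become list items.
    const items = body.replace(/\[\*\]/g, "<li>");
    return `<${tag}>${items}</${tag}>`;
  });
}

console.log(convertListTags("[list][*] Item 1[*] Item 2[/list]"));
// prints "<ul><li> Item 1<li> Item 2</ul>"
```

A stray [*] outside any list is left as plain text, which the standalone expression would not do.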

Table Tags

In BBCode, tables are usually entered in the following way:

[table]
[tr]
[th]Header 1[/th]
[th]Header 2[/th]
[/tr]
[tr]
[td]Data 1[/td]
[td]Data 2[/td]
[/tr]
[/table]

This can be easily converted to an HTML table by simply converting all the table-related tags to their corresponding HTML tags. This can be done in one go with the following simple expression:

input.replace(/\[(\/?)(table|th|tr|td)\]/gs, "<$1$2>");

This should convert all table [table], table header [th], table row [tr], and table data [td] tags (both opening and closing) to their corresponding HTML counterparts.

Take note that the expression above does not check the validity of the table or the hierarchy of the tags involved. If the writer formats the table structure incorrectly, the table will not display correctly, just like with general HTML.
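If some checking is wanted before converting, the same expression can drive a simple stack-based test that every table tag is closed in the right order. This is a rough sketch with a hypothetical function name; it only checks that tags are balanced, not that cells sit inside rows.

```javascript
function hasBalancedTableTags(input) {
  const stack = [];
  const pattern = /\[(\/?)(table|th|tr|td)\]/g;
  for (const [, slash, name] of input.matchAll(pattern)) {
    if (slash === "") {
      stack.push(name); // opening tag: remember it
    } else if (stack.pop() !== name) {
      return false; // closing tag does not match the most recent opening tag
    }
  }
  // Anything left on the stack was opened but never closed.
  return stack.length === 0;
}

console.log(hasBalancedTableTags("[table][tr][td]a[/td][/tr][/table]")); // prints true
console.log(hasBalancedTableTags("[table][tr][td]a[/tr][/td][/table]")); // prints false
```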

Combining All Expressions

Once we’ve written expressions to cover all the BBCode supported by our platform, we can simply list them one after the other, and starting with the original text as input, pass the output of one expression into the next. In the end, once the original text has been through all the expressions, we’ll get an output that can be displayed as HTML.

var input = "[b]Text[/b] to be [i]converted[/i]";

input = input.replace(/\[(b|i|u|s)\](.*?)\[\/\1\]/gs, "<$1>$2</$1>");
// ...
// Add additional expressions here
// ...
input = input.replace(/\[(\/?)(table|th|tr|td)\]/gs, "<$1$2>");

document.write(input);
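The chain of replacements can also be written as a list of pattern and replacement pairs that is applied in order. The sketch below assumes that any raw HTML in the input should be shown as text and therefore escapes it first; that step, and the names rules and bbcodeToHtml, are additions made for this example rather than part of the approach above.

```javascript
// Each rule pairs a pattern from the article with its replacement string.
const rules = [
  [/\[(b|i|u|s)\](.*?)\[\/\1\]/gs, "<$1>$2</$1>"],
  [/\[img\](.*?)\[\/img\]/g, "<img src='$1' />"],
  [/\[url\](.*?)\[\/url\]/g, "<a href='$1'>$1</a>"],
  [/\[url=(https?:\/\/.+?)\](.*?)\[\/url\]/g, "<a href='$1'>$2</a>"],
  [/\[(\/?)(table|th|tr|td)\]/g, "<$1$2>"],
];

function bbcodeToHtml(input) {
  // Assumption: raw HTML in the input should be displayed as text, so escape it first.
  const escaped = input
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
  // Feed the output of each rule into the next one.
  return rules.reduce(
    (text, [pattern, replacement]) => text.replace(pattern, replacement),
    escaped
  );
}

console.log(bbcodeToHtml("[b]Text[/b] to be [i]converted[/i]"));
// prints "<b>Text</b> to be <i>converted</i>"
```

Keeping the rules in an array makes it easy to add or remove tags per platform without touching the conversion function.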

Matching All BBCode Tags

Sometimes the aim is to select all BBCode tags in a piece of text, perhaps to determine if the text contains BBCode, or to remove all BBCode tags from the text. The following expressions will match all BBCode tags, including ones that may not be supported.

To match all BBCode tags including the text between them, use the following regex code:

/\[(.+?).*?\](.*?)\[\/\1\]/gs

Notice the question mark ? after both the one-or-more + and the zero-or-more * quantifiers. Each question mark makes the preceding quantifier lazy, causing it to match as few characters as possible. This prevents a match from spanning multiple tags.
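The difference is easiest to see on a line with two tag pairs. The snippet below is a small illustration using a fixed [b] tag; the variable names are arbitrary.

```javascript
const text = "[b]one[/b] and [b]two[/b]";

// Greedy: .* runs to the last closing tag, swallowing everything in between.
const greedy = text.match(/\[b\](.*)\[\/b\]/)[1];

// Lazy: .*? stops at the first closing tag.
const lazy = text.match(/\[b\](.*?)\[\/b\]/)[1];

console.log(greedy); // prints "one[/b] and [b]two"
console.log(lazy);   // prints "one"
```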

To match all BBCode tags (opening and closing) without the text in between them, the following regex will do:

/\[\/?(.+?).*?\]/g
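As a sketch of the two uses mentioned earlier, the tag-only expression can strip every tag from a piece of text or test whether any tag is present. The version here writes the optional leading slash as \/? and the function names are hypothetical. Note that any bracketed text, such as a footnote marker like [1], will also count as a tag.

```javascript
// Assumption: a "tag" is any bracketed run, optionally starting with a slash.
const anyTag = /\[\/?(.+?).*?\]/g;

function stripBBCode(input) {
  return input.replace(anyTag, "");
}

function containsBBCode(input) {
  // A fresh, non-global copy avoids the lastIndex state that the g flag adds to test().
  return new RegExp(anyTag.source).test(input);
}

console.log(stripBBCode("[b]Bold[/b] and [url=https://regexland.com]link[/url]"));
// prints "Bold and link"
console.log(containsBBCode("plain text")); // prints false
```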

Sources

An answer posted on Stack Overflow by Matthew Flaschen helped to identify the correct method of replacing with capture groups in JavaScript.

Several articles also provided examples of BBCode.

Benjamin

Founder, owner, and sole content creator on RegexLand. Enjoys programming, blogging, and teaching others how to do the same.