A Regular Expression to match all HTML tags

A regular expression that matches HTML tags together with any surrounding whitespace, and treats a run of adjacent tags as a single match:

/(\s*<[^>]+>\s*)+/

For example, this can be used in Java (or Android) code to split an HTML table and return the text contained in each cell as the elements of an array of strings:

 String[] tablecontents = htmltable.split("(\\s*<[^>]+>\\s*)+");

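This uses String.split(regex), which splits a string around every match of the given regular expression and returns the fragments as an array, which is quite handy. Note that if the input starts with a tag, the first element of the array is an empty string, and that the pattern assumes no ">" appears inside an attribute value or a comment.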
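
As a minimal sketch of how the split behaves, the example below runs the pattern over a small sample table. The class name HtmlTableSplit, the constant TAG_CLUSTER, the helper cellTexts and the sample markup are all made up for illustration. The helper drops empty fragments, such as the one produced when the input begins with a tag.

```java
import java.util.Arrays;

class HtmlTableSplit {
    // The pattern from above, escaped for a Java string literal.
    static final String TAG_CLUSTER = "(\\s*<[^>]+>\\s*)+";

    // Hypothetical helper: splits on tag clusters and drops empty fragments.
    static String[] cellTexts(String html) {
        return Arrays.stream(html.split(TAG_CLUSTER))
                .filter(s -> !s.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // Sample input, assumed for illustration only.
        String htmltable = "<table>\n <tr><td>Apple</td><td>Red fruit</td></tr>\n</table>";

        // Plain split: the leading tag cluster yields an empty first element.
        System.out.println(Arrays.toString(htmltable.split(TAG_CLUSTER)));
        // prints [, Apple, Red fruit]

        // With empty fragments filtered out.
        System.out.println(Arrays.toString(cellTexts(htmltable)));
        // prints [Apple, Red fruit]
    }
}
```

Whitespace inside a cell ("Red fruit") is preserved, because the pattern only consumes whitespace that is directly next to a tag. For anything beyond simple, well-behaved markup, a real HTML parser is likely the safer choice.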
