Considerations around writing efficient regex
A few ideas to keep in mind when writing regex. Please see Using Regular Expressions (regex) in SiteSpect first if you have not already.
Use Character Classes instead of Lazy Wildcards when possible
Let's say you have
<div class="structure prominent" data-myattr="1234">some content</div>
and you want to capture the class(es). The following negated character class is more efficient
than using a lazy wildcard
Or use something like this to match an unknown number of classes, other attributes, and the text in the div tag,
which will be more efficient than using lazy wildcards. A negated character class works well when you know the single character you want to match up to.
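To make the comparison concrete, here is a minimal sketch using Python's `re` module, which is a backtracking engine and so shows the same trade-off. It is an assumption that SiteSpect's engine behaves the same way, and the patterns below are illustrative stand-ins rather than the exact ones from this article.

```python
import re

# The sample markup from the article.
html = '<div class="structure prominent" data-myattr="1234">some content</div>'

# Negated character class: [^"]* consumes every character that is not a
# double quote, then the closing quote matches immediately with no backtracking.
negated = re.compile(r'class="([^"]*)"')

# Lazy wildcard: .*? takes one character at a time and re-tests the closing
# quote after each one, so it needs more steps to reach the same result.
lazy = re.compile(r'class="(.*?)"')

print(negated.search(html).group(1))  # structure prominent
print(lazy.search(html).group(1))     # structure prominent

# Whole div with any number of classes and attributes, plus its text:
# [^>]* stops at the end of the opening tag, [^<]* stops at the next tag.
div_pattern = re.compile(r'<div\b([^>]*)>([^<]*)</div>')
m = div_pattern.search(html)
print(m.group(1).strip())  # class="structure prominent" data-myattr="1234"
print(m.group(2))          # some content
```

Besides needing fewer steps, `[^"]*` can never run past the closing quote, even if the rest of the pattern later fails, so the capture always stays inside the attribute value.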
Don't use Lazy when the last possible match is more than halfway to the end
This comes into play when you want to match from a pattern near the top of the page source, such as
all the way to a pattern near the bottom, such as
In that case, use a greedy regex to reduce unnecessary backtracking.
If you used .*? instead, it would still work but would cause additional backtracking.
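A rough Python sketch of the top-to-bottom case, assuming a hypothetical page where `<head>` stands in for the pattern near the top and `</body>` for the pattern near the bottom. `re.DOTALL` lets `.` match newlines so the match can span the whole page.

```python
import re
import timeit

# Hypothetical page: start marker near the top, end marker near the bottom,
# with a large block of content in between.
page = "<head><title>Demo</title></head>" + "<p>filler</p>\n" * 5000 + "</body></html>"

# Greedy: .* runs to the end of the page, then backs off only a few
# characters until "</body>" matches.
greedy = re.compile(r"<head>.*</body>", re.DOTALL)

# Lazy: .*? grows one character at a time and re-tests "</body>" at every
# position, so it walks almost the whole page in small steps.
lazy = re.compile(r"<head>.*?</body>", re.DOTALL)

greedy_match = greedy.search(page)
lazy_match = lazy.search(page)

# With a single "</body>" on the page, both return the same span.
print(greedy_match.span() == lazy_match.span())  # True

# Rough timing comparison; results vary by engine, machine, and input.
print("greedy:", timeit.timeit(lambda: greedy.search(page), number=50))
print("lazy:  ", timeit.timeit(lambda: lazy.search(page), number=50))
```

The two patterns are only interchangeable here because `</body>` appears once. In general a greedy `.*` matches up to the last occurrence of the end pattern and a lazy `.*?` up to the first, so check which one you actually need before switching.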
How do I see and compare my regex performance in SiteSpect?
What other tools can help?
Other interesting reading